Papers
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: Do manifold-aware distance metrics improve the robustness of dense retrieval systems in low-resource or cross-lingual settings, as measured by MRR@10 on multilingual benchmarks like XQuAD or MLQA? Wikipedia…
Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: Do hybrid retrieval systems combining manifold-aware dense retrieval with sparse retrieval (e.g., BM25) improve robustness against adversarial query perturbations in legal domain QA benchmarks like. Large Language…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the inference efficiency of manifold-aware dense retrieval models compare to baseline DPR models on large-scale passage retrieval tasks (e.g., MS MARCO) when using approximate nearest. The deployment of…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: To what extent do manifold-aware dense retrieval models outperform multi-representation architectures in Recall@1000 on out-of-distribution biomedical QA benchmarks like BioASQ or MedQA when. Brain-Computer…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do manifold-aware distance metrics improve robustness in cross-domain retrieval tasks (e.g., FEVER vs. TriviaQA) when compared to Euclidean/cosine-based retrievers, as measured by exact match. Point clouds…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What are the computational efficiency trade-offs between manifold-aware DPR models (e.g., MA-DPR) and traditional multilingual retrieval models (e.g., mDPR) on large-scale benchmarks like BEIR,. Large Language…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the integration of manifold-aware distance metrics (e.g., MA-DPR) with multilingual models like LaBSE affect cross-lingual retrieval performance on benchmarks like MLQA, compared to cosine. Cross-lingual…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: What is the impact of Vendi-RAG's diversity-quality trade-off on inference latency and token throughput during code generation tasks on the MBPP benchmark? The rapid evolution of large language models…
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: How does Vendi-RAG's iterative diversity optimization affect pass@10 and pass@100 metrics on HumanEval compared to standard dense retrieval baselines? As Large Language Models (LLMs) become increasingly integrated…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the computational efficiency trade-off when using manifold-aware distance metrics in dense retrieval systems for HotpotQA, and how does it compare to the efficiency of standard DPR baselines? Unlike…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How robust is Vendi-RAG's joint optimization process to variations in document redundancy when evaluated on the Natural Questions benchmark, and what trade-offs exist between answer quality and. Large Language…
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: How does the Vendi-RAG framework's iterative optimization impact latency and throughput scalability when applied to the HotpotQA benchmark compared to traditional RAG systems? In this paper, we introduce a new…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How do semantics-guided adversarial perturbations affect the pass@k scores of multimodal code generation models on the HumanEval-X benchmark across diverse programming languages? Unlike previous studies on the…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does adversarial training affect the calibration error of multimodal trajectory prediction models on the Waymo Open Dataset compared to standard maximum likelihood estimation? We introduce Argoverse 2 (AV2) -…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the chain-based retrieval accuracy of Llama-3-8B-128K compare to Qwen-8B and Mistral-8B on HotPotQA when varying the maximum context length from 32K to 128K? In recent years, the input context sizes of…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: What is the impact of varying the number of hops on the robustness of multi-hop retrieval for Llama-3-8B-128K when evaluated on adversarial examples from HotPotQA and SQuAD? Selective state-space models (SSMs)…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the trade-off between retrieval accuracy and latency vary when comparing Tree of Reviews versus chain-based retrieval for Llama-3-8B-128K on SQuAD and HotPotQA when using different embedding…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of variational mixture of experts architectures on inference latency and throughput for multimodal relation extraction compared to dense graph neural networks? Data scarcity is a major…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: Does integrating structural graph priors improve robustness against noisy image-text pairs in zero-shot multimodal information extraction compared to pure attention-based models? In the last few years, the deep…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does graph neural network-based multimodal fusion compare to transformer attention mechanisms in zero-shot entity typing accuracy on social media benchmarks? Deep Residual Networks have recently been shown to…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does LongNav-R1's performance scale with increasing instruction ambiguity complexity on the ValHouse3D benchmark compared to single-turn VLA policies in terms of trajectory deviation and success. In the vision…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the efficiency gain of LongNav-R1 compared to single-turn VLA policies in terms of inference time and compute resources on the RxR-CE benchmark? Embodied AI is widely recognized as a cornerstone of…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the multi-turn RL framework of LongNav-R1 perform on the Room-to-Room (R2R) benchmark compared to single-turn VLA policies in terms of success rate and trajectory deviation metrics? We present Habitat, a…
Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the efficiency trade-off between Oracle-RLAIF and RLHF in terms of inference latency and memory usage when processing noisy spoken queries on the SQuTR benchmark? While large-scale unsupervised language…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the Tree of Reviews framework perform on cross-domain multi-hop reasoning tasks like TriviaQA when compared to linear chain retrieval methods in terms of F1 score and retrieval precision? Large language…