Papers
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: What is the trade-off between representation accuracy and computational cost when applying metapath context convolutions versus standard message passing in deep heterogeneous graph networks? Heterogeneous graph…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do adaptive depth techniques in heterogeneous GNNs compare to static-depth baselines in terms of inference latency and memory footprint on large-scale datasets like Reddit? Heterogeneous graph neural networks…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the structural mismatch between equilibrium graph topologies and efficient message-passing schemes impact the reasoning accuracy of LLM-augmented GNNs on synthetic clique-counting benchmarks? Graph…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: Does metapath-based heterogeneous graph learning improve robustness against adversarial code perturbations on the HumanEval benchmark? Heterogeneous graph neural networks (HGNNs) were proposed for representation…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the inference latency overhead of integrating multi-view graph aggregation versus single-view representations in code generation models? Graph Neural Networks (GNNs) have gained significant attention in…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does metapath context convolution in heterogeneous graph neural networks impact code generation pass@k on HumanEval compared to standard message passing? Heterogeneous graph neural networks (HGNNs) were…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the impact of different sampling strategies (e.g., stratified, random) on the stability of F1-scores for Llama3, Codestral, and Deepseek R1 when evaluated on code vulnerability detection? Large language…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: To what extent does domain adaptation of the retriever component improve NDCG@20 metrics in cross-domain RAG evaluations between general web queries and specialized scientific documents? Batch normalization (BN)…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the alignment of multilingual models trained on M2QA affect their RankC scores when evaluated on domain-specific versus cross-domain adversarial perturbations? Generalization and robustness to input…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the integration of dense retrievers like MA-DPR versus sparse lexical methods impact the factual consistency scores of RAG systems on the MS MARCO dataset? This paper proposes a Question-Answering (QA)…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the memory overhead of MA-DPR's non-linear manifold approximations compare to Euclidean/cosine distance in DPR when deployed on resource-constrained inference hardware like GPUs or TPUs? Dense Passage…
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: How does the RankC metric's performance in assessing cross-lingual robustness vary when applied to multilingual models fine-tuned for specific domains compared to those trained on generalized datasets?…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does fine-tuning multilingual dense retrievers on BEIR-PL impact zero-shot cross-lingual retrieval accuracy compared to SPLADE and BM25 baselines on other low-resource Slavic languages? State-of-the-art…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does the synergistic training approach improve robustness to domain shifts in low-resource languages within the TyDi QA benchmark relative to single-task fine-tuning? Massive false rumors emerging along with…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the computational efficiency (inference latency, FLOPs) of manifold-aware DPR models compare to traditional sparse retrievers (e.g., BM25) when evaluated on domain-shifted benchmarks like. Dense Passage…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the choice of domain-specific contrastive loss (e.g., InfoNCE vs. triplet loss) during fine-tuning affect retrieval accuracy on out-of-domain benchmarks like BEIR or NQ for manifold-aware. Dense Passage…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of manifold-aware distance metrics like MA-DPR on the inference efficiency of dense retrievers when scaling to large-scale document collections (e.g., MS MARCO or BEIR benchmarks)? Dense…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does the use of manifold-aware distance metrics improve retrieval accuracy for out-of-distribution queries in the TriviaQA benchmark, and what is the trade-off in inference latency? Dense Passage…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the performance of MA-DPR on cross-domain OOD queries (e.g., TriviaQA to HotpotQA) compare to other manifold-aware retrieval methods like SimCLR or contrastive learning-based approaches in. Many…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the integration of manifold-aware distance metrics in multi-task dense retrieval models affect model performance on cross-domain benchmarks like BEIR, compared to traditional. Dense Passage Retrieval…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How do multimodal extensions of Llama-2 models compare to text-only versions in self-invoking code generation tasks on HumanEval Pro, measured by pass@1 and pass@k metrics? We introduce self-invoking code…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the comparative performance of retrieval-augmented generation (Vendi-RAG vs. DPR) when evaluated on domain-specific benchmarks (e.g., QuranQA) under adversarial perturbations (e.g., synonym.…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the inference efficiency (measured in tokens per second) of Vendi-RAG with different model sizes (7B vs. 70B) vary when processing noisy retrieval contexts on TriviaQA and WebQuestions,.…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the impact of model size on the robustness of self-invoking code generation in Llama-2 models when evaluated against adversarial perturbations in MBPP Pro benchmarks? Generating high-fidelity and…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: What is the comparative robustness of Vendi-RAG (7B vs. 70B) against misspellings in the query when evaluated on open-domain QA benchmarks like TriviaQA and WebQuestions, measured by precision@k and.…