Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 4727 papers; mean review score 5.83/10; 1462 Zenodo DOIs.
Results 3851–3875 of 4727 entries

Papers

[877]
30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of combining sparse and dense retrieval methods (hybrid retrieval) on the ROUGE-L performance of Vendi-RAG on the ELI5 dataset compared to using each method individually? Large Language Models…

[876]
30 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of varying the number of retrieval rounds (1 to 10) in Vendi-RAG on the accuracy-throughput trade-off when applied to the GSM8K benchmark with FLAN-T5-xxl? Retrieval-augmented generation (RAG)…

[875]
30 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: To what extent does Vendi-RAG's diversity-aware retrieval improve cross-domain generalization performance on the ELI5 benchmark, measured by the accuracy gap between in-domain and out-of-domain.…

[874]
30 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: What is the impact of Vendi-RAG's diversity-aware retrieval on inference efficiency and computational overhead compared to traditional BM25 and dense retrieval methods when processing out-of-domain. The advent of…

[873]
30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do single representation dense retrieval models with manifold-aware distance metrics compare to multi-representation models in terms of Recall@1000 on complex reasoning tasks in the ARC-Challenge. Dense…

[872]
30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the performance of manifold-aware distance metrics in dense passage retrieval scale with increasing context window sizes beyond 512 tokens on Natural Questions and HotpotQA benchmarks? Dense Passage…

[871]
30 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the hierarchical query mechanism in Vendi-RAG affect downstream task performance on code generation accuracy compared to standard RAG architectures? Retrieval-augmented generation (RAG) enhances large…

[870]
30 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of corpus size on answer generation accuracy for Vendi-RAG versus traditional RAG when measured by exact match scores on the NaturalQuestions benchmark? Retrieval-augmented generation (RAG)…

[869]
30 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the inference latency of Contriever and DPR encoders scale with increasing context window sizes up to 2048 tokens on the SQuAD 2.0 benchmark? Open-domain question answering relies on efficient passage…

[868]
30 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of adding adversarial noise during training on the retrieval performance of DPR and Contriever encoders on the MSCOCO captioning benchmark? Dense retrieval is becoming one of the standard…

[867]
30 May 2026. Score: 2.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the retrieval accuracy of Contriever and DPR encoders compare on the TriviaQA benchmark when the context window size is increased to 4096 tokens? The advent of contextualised language models has brought…

[866]
30 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How robust is MA-DPR to noisy or adversarial query-passage pairs compared to standard DPR, as evaluated on adversarial benchmark datasets like HardNQ or Adversarial TriviaQA, using precision@k and. Following the…

[865]
30 May 2026. Score: 5.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of semantics-guided adversarial training on the generalization gap between in-domain and out-of-domain trajectory prediction tasks? Predicting the trajectories of surrounding objects is a…

[864]
30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do adversarially trained trajectory prediction models compare in inference latency and accuracy trade-offs when evaluated on standard autonomous driving planning benchmarks? We introduce a motion forecasting…

[863]
30 May 2026. Score: 4.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the robustness of alignment-weighted DPO scale across LLaMA-2 variants (7B, 13B, 70B) on adversarial TruthfulQA prompts compared to standard DPO alignment? Adversarial robustness of deep learning models…

[862]
30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the inference latency impact of applying alignment-weighted DPO on code generation tasks using HumanEval and MBPP benchmarks? We introduce self-invoking code generation, a new task designed to evaluate the…

[861]
30 May 2026. Score: 2.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does the inference efficiency of sparse multimodal models with varying numbers of experts improve with higher alignment scores on VQAv2 and OK-VQA, and how does this trade-off compare to dense models? Sparse…

[860]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the alignment score (e.g., via RLHF or DPO) of sparse multimodal models with varying numbers of experts correlate with their performance on the OK-VQA benchmark compared to dense models? Background:…

[859]
30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the trade-off between retrieval latency and answer accuracy when scaling the number of hops in Tree of Reviews vs. chain-based retrieval for Llama-3-8B-128K on the HotPotQA and MuSiQue.…

[858]
30 May 2026. Score: 5.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of varying the number of retrieval hops (e.g., 2-hop vs. 3-hop) on the F1 score stability of the Tree of Reviews framework compared to chain-based retrieval in Llama-3-8B-128K when. Multi-hop…

[857]
30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the cross-validation performance of LongNav-R1 vary across different multimodal input modalities when processing long-horizon navigation tasks? Robot vision has greatly benefited from advancements in…

[856]
30 May 2026. Score: 4.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the inference latency of LongNav-R1 compare to single-turn VLA policies when evaluated on the RxR-CE navigation benchmark using standard desktop GPUs? This paper develops LongNav-R1, an end-to-end…

[855]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the Tree of Reviews retrieval framework compare to other tree-based retrieval methods in terms of accuracy and computational overhead when applied to Llama-3-8B models on the MultiHopQA. Multi-hop…

[854]
30 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of varying retrieval-augmentation contexts (e.g., different music metadata sources, retrieval depths) on Llama-3-8B-128K's response accuracy for fact-based versus interpretive. Recent work on…

[853]
30 May 2026. Score: 6.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: Can retrieval-augmented generation (RAG) improve the consistency of Llama-3-8B-128K's responses in multi-track comparative music QA when evaluated using a novel semantic consistency metric across. The advent of…
