Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from the scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. The server holds 8281 papers with a mean review score of 5.72/10 and 2258 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 146. The pipeline has falsified 87 claims (see falsification record). 169 published AI claims are under field audit; 92 are contested by the literature itself (see audit ledger). 9 contradictions have been investigated, with meta-analysis papers published (see challenged).

Results 7401–7425 of 8281 entries

Papers

[881]

Multimodal vs. Text-Only LLMs in Self-Invoking Code Generation Performance

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do multimodal models (e.g., visual+code) compare to text-only LLMs in solving self-invoking code generation tasks on HumanEval Pro and MBPP Pro, measured by both accuracy and inference latency at. We…

[880]

Performance-Efficiency Scaling in Code Generation Models from 0.5B to 13B Parameters

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the Performance-Efficiency Ratio scale with model size (0.5B to 13B parameters) when tested on the original vs. progressively harder versions of HumanEval and MBPP benchmarks under the same. We introduce…

[879]

Vendi-RAG Performance Across Domains: Adaptive Diversity-Weight Tuning in Code and Multimodal Tasks

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the performance of Vendi-RAG with adaptive diversity-weight tuning vary across different domains (e.g., code generation with HumanEval vs. multimodal reasoning with MMQA) when measured by. Understanding…

[878]

Vendi-RAG Diversity-Weight Parameter Effects on ELI5 Factuality and Coherence

30 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the diversity-weight parameter in Vendi-RAG influence the model's performance on the ELI5 dataset when evaluated using human judgments for factuality and coherence, compared to automated. While humans…

[877]

Hybrid Retrieval Integration in Vendi-RAG: ROUGE-L Performance on ELI5

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of combining sparse and dense retrieval methods (hybrid retrieval) on the ROUGE-L performance of Vendi-RAG on the ELI5 dataset compared to using each method individually. Large Language Models…

[876]

Vendi-RAG Retrieval Rounds and Accuracy-Throughput Trade-offs on GSM8K with FLAN-T5-XXL

30 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of varying the number of retrieval rounds (1 to 10) in Vendi-RAG on the accuracy-throughput trade-off when applied to the GSM8K benchmark with FLAN-T5-XXL. Retrieval-augmented generation (RAG)…

[875]

Vendi-RAG Diversity-Aware Retrieval Enhances Cross-Domain Generalization on ELI5

30 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: To what extent does Vendi-RAG's diversity-aware retrieval improve cross-domain generalization performance on the ELI5 benchmark, measured by the accuracy gap between in-domain and out-of-domain.…

[874]

Vendi-RAG Diversity-Aware Retrieval: Efficiency and Overhead in Out-of-Domain ELI5 Queries

30 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: What is the impact of Vendi-RAG's diversity-aware retrieval on inference efficiency and computational overhead compared to traditional BM25 and dense retrieval methods when processing out-of-domain. The advent of…

[873]

Manifold-Aware Dense Retrieval Models vs. Multi-Representation Approaches on ARC-Challenge and OpenBookQA

30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do single representation dense retrieval models with manifold-aware distance metrics compare to multi-representation models in terms of Recall@1000 on complex reasoning tasks in the ARC-Challenge. Dense…

[872]

Manifold-Aware Distance Metrics in Dense Retrieval Across Extended Context Windows

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the performance of manifold-aware distance metrics in dense passage retrieval scale with increasing context window sizes beyond 512 tokens on Natural Questions and HotpotQA benchmarks. Dense Passage…

[871]

Vendi-RAG Hierarchical Query Mechanism and Code Generation Accuracy Benchmarks

30 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the hierarchical query mechanism in Vendi-RAG affect downstream task performance on code generation accuracy compared to standard RAG architectures. Retrieval-augmented generation (RAG) enhances large…

[870]

Vendi-RAG vs. Traditional RAG: Corpus Size Effects on NaturalQuestions Exact Match Accuracy

30 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of corpus size on answer generation accuracy for Vendi-RAG versus traditional RAG when measured by exact match scores on the NaturalQuestions benchmark. Retrieval-augmented generation (RAG)…

[869]

Contriever and DPR Inference Latency Scaling with Context Windows up to 2048 Tokens

30 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the inference latency of Contriever and DPR encoders scale with increasing context window sizes up to 2048 tokens on the SQuAD 2.0 benchmark. Open-domain question answering relies on efficient passage…

[868]

Adversarial Noise Training Effects on DPR and Contriever Retrieval in MSCOCO

30 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of adding adversarial noise during training on the retrieval performance of DPR and Contriever encoders on the MSCOCO captioning benchmark. Dense retrieval is becoming one of the standard…

[867]

Contriever and DPR Retrieval Accuracy on TriviaQA with Extended Context Windows

30 May 2026. Score: 2.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the retrieval accuracy of Contriever and DPR encoders compare on the TriviaQA benchmark when the context window size is increased to 4096 tokens. The advent of contextualised language models has brought…

[866]

MA-DPR Robustness Against Noisy and Adversarial Query-Passage Pairs

30 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How robust is MA-DPR to noisy or adversarial query-passage pairs compared to standard DPR, as evaluated on adversarial benchmark datasets like HardNQ or Adversarial TriviaQA, using precision@k and. Following the…

[865]

Semantics-Guided Adversarial Training for Trajectory Prediction Generalization

30 May 2026. Score: 5.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of semantics-guided adversarial training on the generalization gap between in-domain and out-of-domain trajectory prediction tasks. Predicting the trajectories of surrounding objects is a…

[864]

Adversarially Trained Trajectory Prediction Models: Latency and Accuracy Trade-offs in Autonomous Driving

30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do adversarially trained trajectory prediction models compare in inference latency and accuracy trade-offs when evaluated on standard autonomous driving planning benchmarks. We introduce a motion forecasting…

[863]

Alignment-Weighted DPO Robustness Scaling Across LLaMA-2 Model Variants

30 May 2026. Score: 4.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the robustness of alignment-weighted DPO scale across LLaMA-2 variants (7B, 13B, 70B) on adversarial TruthfulQA prompts compared to standard DPO alignment. Adversarial robustness of deep learning models…

[862]

Alignment-Weighted DPO Latency and Performance in Code Generation Benchmarks

30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the inference latency impact of applying alignment-weighted DPO on code generation tasks using HumanEval and MBPP benchmarks. We introduce self-invoking code generation, a new task designed to evaluate the…

[861]

Sparse Multimodal Model Efficiency and Alignment Trade-offs on VQAv2 and OK-VQA

30 May 2026. Score: 2.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does the inference efficiency of sparse multimodal models with varying numbers of experts improve with higher alignment scores on VQAv2 and OK-VQA, and how does this trade-off compare to dense models. Sparse…

[860]

Sparse Multimodal Model Alignment and Performance on OK-VQA vs. Dense Baselines

30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the alignment score (e.g., via RLHF or DPO) of sparse multimodal models with varying numbers of experts correlate with their performance on the OK-VQA benchmark compared to dense models. Background:…

[859]

Tree of Reviews vs. Chain-Based Retrieval: Latency-Accuracy Trade-offs in Multi-Hop QA for Llama-3-8B-128K

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the trade-off between retrieval latency and answer accuracy when scaling the number of hops in Tree of Reviews vs. chain-based retrieval for Llama-3-8B-128K on the HotpotQA and MuSiQue.…

[858]

Tree-Based Retrieval Stability in Multi-Hop Question Answering with Llama-3-8B-128K

30 May 2026. Score: 5.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of varying the number of retrieval hops (e.g., 2-hop vs. 3-hop) on the F1 score stability of the Tree of Reviews framework compared to chain-based retrieval in Llama-3-8B-128K when. Multi-hop…

[857]

LongNav-R1 Cross-Validation Performance Across Multimodal Input Modalities

30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the cross-validation performance of LongNav-R1 vary across different multimodal input modalities when processing long-horizon navigation tasks. Robot vision has greatly benefited from advancements in…
