Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8265 papers; mean review score 5.72/10; 2247 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 148. 78 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 92 contested by the literature itself (see audit ledger). 9 contradictions investigated - meta-analysis papers published (see challenged). What does this mean?

Results 7226–7250 of 8265 entries

Papers

[1040]

Fine-Tuned Llama-3.1-8B, Mistral-7B, and Qwen3-8B Generalization in Romanized Low-Resource Languages

30 May 2026. Score: 5.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the performance of fine-tuned Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali generalize to other low-resource language variants (e.g., Romanized Hindi or Marathi) when. Romanized Nepali,…

[1039]

Taxonomy-Aligned Fine-Tuning of Codestral for Zero-Shot Vulnerability Repair

30 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does fine-tuning Codestral on taxonomy-aligned vulnerability datasets affect zero-shot repair success rates on Big-Vul compared to fine-tuning on general code corpora. Within the realm of software…

[1038]

Dataset Alignment Effects on Codestral False Positive Rates in SWCC Vulnerability Severity Prediction

30 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of dataset alignment on the false positive rate of Codestral when evaluating vulnerability severity predictions on the SWCC benchmark. Static Application Security Testing (SAST) tools play a…

[1037]

Vendi-RAG Diversity Optimization Enhances FLAN-T5-xl Robustness to Syntactic Distractors

30 May 2026. Score: 3.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Does optimizing the diversity-weight in Vendi-RAG improve FLAN-T5-xl robustness against syntactic distractors in HANS compared to standard relevance-based RAG baselines. Retrieval-augmented generation (RAG)…

[1036]

Multimodal Input Integration Enhances DeepSeek R1 Vulnerability Repair Performance

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the integration of multimodal inputs like AST and control flow graphs affect the vulnerability repair capabilities of DeepSeek R1 compared to Codestral, when evaluated on the Big-Vul dataset. The…

[1035]

Vendi-RAG Diversity-Weight Impact on FLAN-T5-XL Latency and Throughput in ANLI and HANS

30 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the effect of varying Vendi-RAG's diversity-weight on FLAN-T5-xl inference latency and token throughput when evaluated on ANLI and HANS datasets. LLM inference is still evaluated mainly as a model or…

[1034]

Pass@k Degradation in Code Models from HumanEval to HumanEval Pro

30 May 2026. Score: 5.27/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the pass@k metric degrade for code generation models between 1B and 10B parameters when transitioning from standard HumanEval to self-invoking HumanEval Pro tasks under fixed token budgets. We introduce…

[1033]

Vendi-RAG Diversity-Weight Effects on FLAN-T5-xl Performance in Adversarial NLI Benchmarks

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the diversity-weight parameter in Vendi-RAG impact FLAN-T5-xl accuracy and F1-score on the ANLI and HANS adversarial benchmarks. State-of-the-art few-shot learning (FSL) methods leverage prompt-based…

[1032]

Vendi-RAG Diversity-Weight Tuning for Factuality-Coherence Trade-offs on ELI5

30 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does varying the diversity-weight parameter in Vendi-RAG affect the trade-off between factuality and coherence scores on the ELI5 dataset compared to standard RAG baselines. Current evaluation methods for…

[1031]

BM25 and Dense Retriever Hybridization in RAG Pipelines: Latency and Throughput Trade-offs

30 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the effect of combining BM25 and dense retrievers on the inference latency and throughput of RAG pipelines in production environments. Retrieval-Augmented Generation (RAG) enhances Large Language Models…

[1030]

Manifold-Aware vs. Traditional Distance Metrics in Long-Context Passage Retrieval

30 May 2026. Score: 8.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the performance of manifold-aware distance metrics compare to traditional distance metrics (cosine, Euclidean) in dense passage retrieval when evaluated on long-context benchmarks like. Dense Passage…

[1029]

Manifold-Aware Dense Retrieval Robustness Under Out-of-Distribution Query Shifts

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Do manifold-aware dense retrieval models demonstrate improved robustness and stability in Recall@1000 scores under out-of-distribution query shifts in biomedical or legal domain QA benchmarks. Dense Passage…

[1028]

Manifold-Aware Distance Metrics in Cross-Domain and Cross-Lingual Retrieval Performance

30 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do manifold-aware distance metrics perform in cross-domain and cross-lingual retrieval tasks (e.g., FEVER, MLQA) compared to multilingual models like mDPR or LaBSE, particularly when evaluated on. Dense…

[1027]

Manifold-Aware Metrics Enhance Recall in Dense Retrieval for Multi-Hop Reasoning

30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does replacing Euclidean distance with manifold-aware metrics in dense retrieval affect Recall@1000 performance on multi-hop reasoning datasets like HotpotQA compared to standard DPR baselines. Dense Passage…

[1026]

Manifold-Aware Distance Metrics vs. Standard Metrics in Large-Scale Retrieval Systems

30 May 2026. Score: 7.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the computational overhead and throughput impact of manifold-aware distance metrics (MA-DPR) compared to standard distance metrics in large-scale retrieval systems, when scaled to billions of. Dense…

[1025]

Vendi-RAG Diversity-Quality Trade-offs and Pass@k Performance on Code Generation Benchmarks

30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does Vendi-RAG's diversity-quality trade-off impact pass@k metrics on the HumanEval and MBPP code generation benchmarks compared to dense retrieval baselines. Retrieval-augmented generation (RAG) enhances…

[1024]

Vendi-RAG Computational Overhead in Multi-Hop QA Benchmark Performance

30 May 2026. Score: 4.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the computational overhead of Vendi-RAG's iterative joint optimization process compared to traditional RAG, measured in terms of latency and throughput on the MS MARCO passage ranking. Retrieval-augmented…

[1023]

Semantics-Guided Adversarial Perturbations in Cross-Domain Code Generation

30 May 2026. Score: 5.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the impact of semantics-guided adversarial perturbations on the code generation success rates of multimodal models when evaluated on cross-domain programming tasks. Adversarial examples reveal the blind…

[1022]

Adversarial Training Effects on Probabilistic Occupancy Grid Calibration in Urban Driving Models

30 May 2026. Score: 5.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 17 peer-reviewed papers addressing the following research question: What is the impact of adversarial training on the calibration of probabilistic occupancy grid predictions in urban autonomous driving models evaluated on the Waymo Open Dataset. Being able to generate realistic…

[1021]

Token Scheduling Strategies in Sparse vs. Dense Multimodal Models on OK-VQA

30 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20468477

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the impact of token scheduling strategies on the inference throughput and alignment scores of sparse multimodal models versus dense architectures on OK-VQA. Recent advancements in Multimodal Large Language…

[1020]

Tree of Reviews vs. Chain-Based Retrieval Latency on MuSiQue with Llama-3-8B

30 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20468466

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the inference latency of Tree of Reviews compare to chain-based retrieval methods on MuSiQue when scaling retrieval hops from 2 to 4 on Llama-3-8B. Compared to black-box neural networks, logic rules…

[1019]

Llama-3-8B-128K Multi-Hop Retrieval Performance Against Mistral-8B and Qwen-8B

30 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20468458

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the performance of Llama-3-8B-128K compare to other 8B-parameter models like Mistral-8B or Qwen-8B in multi-hop retrieval accuracy on HotPotQA and MuSiQue benchmarks when using chain-based. Prompt…

[1018]

Tree of Reviews and Chain-Based Retrieval Trade-offs in Multi-Hop QA for Llama-3-8B-128K

30 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the impact of varying the number of hops on the trade-off between retrieval accuracy and latency in Tree of Reviews versus chain-based retrieval for Llama-3-8B-128K on SQuAD and HotPotQA.…

[1017]

Graph Neural Networks in Multimodal Fusion for Zero-Shot Long-Horizon Navigation

30 May 2026. Score: 3.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the integration of graph neural networks in multimodal fusion architectures impact zero-shot reasoning accuracy on long-horizon navigation benchmarks compared to attention-based models. Multimodal…

[1016]

LongNav-R1 Robustness to Instruction Ambiguity in RxR-CE Benchmark Trajectories

30 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the robustness of LongNav-R1 to instruction ambiguity on the RxR-CE benchmark compare to standard single-turn VLA policies in terms of trajectory deviation metrics. This paper develops LongNav-R1, an…

« Prev 1 … 288 289 290 291 292 … 331 Next »