Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8290 papers; mean review score 5.73/10; 2267 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 150. 87 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 92 contested by the literature itself (see audit ledger). 9 contradictions investigated, with meta-analysis papers published (see challenged).

Results 7426–7450 of 8290 entries

Papers

[865]

Semantics-Guided Adversarial Training for Trajectory Prediction Generalization

30 May 2026. Score: 5.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of semantics-guided adversarial training on the generalization gap between in-domain and out-of-domain trajectory prediction tasks. Predicting the trajectories of surrounding objects is a…

[864]

Adversarially Trained Trajectory Prediction Models: Latency and Accuracy Trade-offs in Autonomous Driving

30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do adversarially trained trajectory prediction models compare in inference latency and accuracy trade-offs when evaluated on standard autonomous driving planning benchmarks. We introduce a motion forecasting…

[863]

Alignment-Weighted DPO Robustness Scaling Across LLaMA-2 Model Variants

30 May 2026. Score: 4.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the robustness of alignment-weighted DPO scale across LLaMA-2 variants (7B, 13B, 70B) on adversarial TruthfulQA prompts compared to standard DPO alignment. Adversarial robustness of deep learning models…

[862]

Alignment-Weighted DPO Latency and Performance in Code Generation Benchmarks

30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the inference latency impact of applying alignment-weighted DPO on code generation tasks using HumanEval and MBPP benchmarks. We introduce self-invoking code generation, a new task designed to evaluate the…

[861]

Sparse Multimodal Model Efficiency and Alignment Trade-offs on VQAv2 and OK-VQA

30 May 2026. Score: 2.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does the inference efficiency of sparse multimodal models with varying numbers of experts improve with higher alignment scores on VQAv2 and OK-VQA, and how does this trade-off compare to dense models. Sparse…

[860]

Sparse Multimodal Model Alignment and Performance on OK-VQA vs. Dense Baselines

30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the alignment score (e.g., via RLHF or DPO) of sparse multimodal models with varying numbers of experts correlate with their performance on the OK-VQA benchmark compared to dense models. Background:…

[859]

Tree of Reviews vs. Chain-Based Retrieval: Latency-Accuracy Trade-offs in Multi-Hop QA for Llama-3-8B-128K

30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the trade-off between retrieval latency and answer accuracy when scaling the number of hops in Tree of Reviews vs. chain-based retrieval for Llama-3-8B-128K on the HotPotQA and MuSiQue.…

[858]

Tree-Based Retrieval Stability in Multi-Hop Question Answering with Llama-3-8B-128K

30 May 2026. Score: 5.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of varying the number of retrieval hops (e.g., 2-hop vs. 3-hop) on the F1 score stability of the Tree of Reviews framework compared to chain-based retrieval in Llama-3-8B-128K when. Multi-hop…

[857]

LongNav-R1 Cross-Validation Performance Across Multimodal Input Modalities

30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the cross-validation performance of LongNav-R1 vary across different multimodal input modalities when processing long-horizon navigation tasks. Robot vision has greatly benefited from advancements in…

[856]

LongNav-R1 and Single-Turn VLA Inference Latency on RxR-CE Benchmark

30 May 2026. Score: 4.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the inference latency of LongNav-R1 compare to single-turn VLA policies when evaluated on the RxR-CE navigation benchmark using standard desktop GPUs. This paper develops LongNav-R1, an end-to-end…

[855]

Tree of Reviews vs. Tree-Based Retrieval Methods in MultiHopQA for Llama-3-8B

30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the Tree of Reviews retrieval framework compare to other tree-based retrieval methods in terms of accuracy and computational overhead when applied to Llama-3-8B models on the MultiHopQA. Multi-hop…

[854]

Retrieval-Augmentation Context Effects on Llama-3-8B-128K Accuracy in Jamendo-MT-QA

30 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of varying retrieval-augmentation contexts (e.g., different music metadata sources, retrieval depths) on Llama-3-8B-128K's response accuracy for fact-based versus interpretive. Recent work on…

[853]

FAIR-RAG and FARSIQA: Enhancing Llama-3-8B-128K Consistency in Multi-Track Music QA

30 May 2026. Score: 6.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: Can retrieval-augmented generation (RAG) improve the consistency of Llama-3-8B-128K's responses in multi-track comparative music QA when evaluated using a novel semantic consistency metric across. The advent of…

[852]

Oracle-RLAIF Sample Efficiency vs. Supervised Fine-Tuning on RxR-CE nDTW Metrics

30 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does Oracle-RLAIF's sample efficiency compare to traditional supervised fine-tuning when evaluated on the RxR-CE benchmark's nDTW score across different training compute budgets. Recent advances in large…

[851]

Llama-3-8B-128K Performance on Jamendo-MT-QA Against Open-Source LLMs

30 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the performance of Llama-3-8B-128K compare to other open-source LLMs (e.g., Falcon-40B, Mistral-7B) on Jamendo-MT-QA when evaluated using both human annotations and automated metrics like. Recently,…

[850]

Oracle-RLAIF Cross-Lingual Generalization from English to Multilingual Preference Data

30 May 2026. Score: 2.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does Oracle-RLAIF maintain cross-lingual generalization capabilities on RxR-CE when scaling from English-only pretraining to multilingual human preference data. To democratize large language models (LLMs) to…

[849]

Computational Efficiency of VELMA, Flamingo, and PaLI in Vision-Language Benchmarks

30 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20466317

Abstract: This report synthesises findings from 2 peer-reviewed papers addressing the following research question: What is the computational efficiency (inference latency, FLOPs, or energy consumption) of VELMA compared to Flamingo and PaLI when deployed on standard vision-language benchmarks like VQA-v2 or. We explore…

[848]

Multi-Turn Reinforcement Learning in LongNav-R1: Sample Efficiency and Convergence on R2R and RxR Datasets

30 May 2026. Score: 5.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the multi-turn reinforcement learning approach in LongNav-R1 compare to other state-of-the-art RL-based navigation models in terms of sample efficiency and convergence speed on the R2R. We introduce…

[847]

Instruction Complexity Effects on Embodied-R1 and VLA Path Completion in ALFRED

30 May 2026. Score: 7.57/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20466315

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of instruction complexity on the path completion rate of Embodied-R1 compared to 7B and 13B VLAs when evaluated on the ALFRED benchmark for embodied task completion. The rapid evolution…

[846]

Performance Gaps Between 7B and 13B Vision-Language Models in Cross-Domain Object Grounding

30 May 2026. Score: 3.77/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does the performance gap between 7B and 13B VLAs in object grounding persist when evaluated on cross-domain vision-language benchmarks such as LVIS or COCO-Text. We introduce InternVL 2.5, an advanced multimodal…

[845]

Inference Efficiency and Memory Footprint of 3B, 7B, and 13B VLMs in Embodied-R1 LongNav-R1 Tasks

30 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: How does the 3B VLM in Embodied-R1 compare to 7B and 13B VLAs in terms of inference efficiency and memory footprint when evaluated on LongNav-R1 with R2R-CE instructions of varying complexity. The field of fluid…

[844]

13B vs. 7B VLA Models in Zero-Shot Cross-Dataset Generalization on R2R-CE

30 May 2026. Score: 7.27/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: Can 13B VLA models achieve better zero-shot cross-dataset generalization than 7B models on the R2R-CE benchmark when augmented with external multimodal pretraining data. The proliferation of Large Language Models…

[843]

13B vs. 7B VLA Models on R2R-CE Under Noisy and Adversarial Inputs

30 May 2026. Score: 7.23/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the performance of 13B VLA models compare to 7B models on the R2R-CE benchmark when evaluated with multi-stage navigation tasks under noisy or adversarial linguistic inputs. Recently, Multimodal Large…

[842]

Instruction Complexity and Grounding Accuracy in 7B vs. 13B Vision-Language-Action Models

30 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20466294

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the correlation between instruction complexity in LongNav-R1 and the grounding accuracy of 7B vs. 13B VLA models, as measured by entity detection F1 scores on R2R-CE validation splits. Multimodal datasets…

[841]

Alignment Techniques and Robustness in Sparse MoE Models for Code Generation

30 May 2026. Score: 7.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the effects of alignment techniques (e.g., RLHF, constitutional AI) on the robustness of sparse MoE models in self-invoking code generation tasks, measured by accuracy on adversarial. Large Language…
