Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8290 papers; mean review score 5.73/10; 2267 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 150. 87 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 92 contested by the literature itself (see audit ledger). 9 contradictions investigated, with meta-analysis papers published (see challenged).
Results 7426–7450 of 8290 entries

Papers

[865]
30 May 2026. Score: 5.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of semantics-guided adversarial training on the generalization gap between in-domain and out-of-domain trajectory prediction tasks. Predicting the trajectories of surrounding objects is a…

[864]
30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do adversarially trained trajectory prediction models compare in inference latency and accuracy trade-offs when evaluated on standard autonomous driving planning benchmarks. We introduce a motion forecasting…

[863]
30 May 2026. Score: 4.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the robustness of alignment-weighted DPO scale across LLaMA-2 variants (7B, 13B, 70B) on adversarial TruthfulQA prompts compared to standard DPO alignment. Adversarial robustness of deep learning models…

[862]
30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the inference latency impact of applying alignment-weighted DPO on code generation tasks using HumanEval and MBPP benchmarks. We introduce self-invoking code generation, a new task designed to evaluate the…

[861]
30 May 2026. Score: 2.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does the inference efficiency of sparse multimodal models with varying numbers of experts improve with higher alignment scores on VQAv2 and OK-VQA, and how does this trade-off compare to dense models. Sparse…

[860]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the alignment score (e.g., via RLHF or DPO) of sparse multimodal models with varying numbers of experts correlate with their performance on the OK-VQA benchmark compared to dense models. Background:…

[859]
30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the trade-off between retrieval latency and answer accuracy when scaling the number of hops in Tree of Reviews vs. chain-based retrieval for Llama-3-8B-128K on the HotPotQA and MuSiQue.…

[858]
30 May 2026. Score: 5.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of varying the number of retrieval hops (e.g., 2-hop vs. 3-hop) on the F1 score stability of the Tree of Reviews framework compared to chain-based retrieval in Llama-3-8B-128K when. Multi-hop…

[857]
30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the cross-validation performance of LongNav-R1 vary across different multimodal input modalities when processing long-horizon navigation tasks. Robot vision has greatly benefited from advancements in…

[856]
30 May 2026. Score: 4.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the inference latency of LongNav-R1 compare to single-turn VLA policies when evaluated on the RxR-CE navigation benchmark using standard desktop GPUs. This paper develops LongNav-R1, an end-to-end…

[855]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the Tree of Reviews retrieval framework compare to other tree-based retrieval methods in terms of accuracy and computational overhead when applied to Llama-3-8B models on the MultiHopQA. Multi-hop…

[854]
30 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of varying retrieval-augmentation contexts (e.g., different music metadata sources, retrieval depths) on Llama-3-8B-128K's response accuracy for fact-based versus interpretive. Recent work on…

[853]
30 May 2026. Score: 6.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: Can retrieval-augmented generation (RAG) improve the consistency of Llama-3-8B-128K's responses in multi-track comparative music QA when evaluated using a novel semantic consistency metric across. The advent of…

[852]
30 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does Oracle-RLAIF's sample efficiency compare to traditional supervised fine-tuning when evaluated on the RxR-CE benchmark's nDTW score across different training compute budgets. Recent advances in large…

[851]
30 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the performance of Llama-3-8B-128K compare to other open-source LLMs (e.g., Falcon-40B, Mistral-7B) on Jamendo-MT-QA when evaluated using both human annotations and automated metrics like. Recently,…

[850]
30 May 2026. Score: 2.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does Oracle-RLAIF maintain cross-lingual generalization capabilities on RxR-CE when scaling from English-only pretraining to multilingual human preference data. To democratize large language models (LLMs) to…

[849]
30 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20466317

Abstract: This report synthesises findings from 2 peer-reviewed papers addressing the following research question: What is the computational efficiency (inference latency, FLOPs, or energy consumption) of VELMA compared to Flamingo and PaLI when deployed on standard vision-language benchmarks like VQA-v2 or. We explore…

[848]
30 May 2026. Score: 5.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the multi-turn reinforcement learning approach in LongNav-R1 compare to other state-of-the-art RL-based navigation models in terms of sample efficiency and convergence speed on the R2R. We introduce…

[847]
30 May 2026. Score: 7.57/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20466315

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of instruction complexity on the path completion rate of Embodied-R1 compared to 7B and 13B VLAs when evaluated on the ALFRED benchmark for embodied task completion. The rapid evolution…

[846]
30 May 2026. Score: 3.77/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does the performance gap between 7B and 13B VLAs in object grounding persist when evaluated on cross-domain vision-language benchmarks such as LVIS or COCO-Text. We introduce InternVL 2.5, an advanced multimodal…

[845]
30 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: How does the 3B VLM in Embodied-R1 compare to 7B and 13B VLAs in terms of inference efficiency and memory footprint when evaluated on LongNav-R1 with R2R-CE instructions of varying complexity. The field of fluid…

[844]
30 May 2026. Score: 7.27/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: Can 13B VLA models achieve better zero-shot cross-dataset generalization than 7B models on the R2R-CE benchmark when augmented with external multimodal pretraining data. The proliferation of Large Language Models…

[843]
30 May 2026. Score: 7.23/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the performance of 13B VLA models compare to 7B models on the R2R-CE benchmark when evaluated with multi-stage navigation tasks under noisy or adversarial linguistic inputs. Recently, Multimodal Large…

[842]
30 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20466294

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the correlation between instruction complexity in LongNav-R1 and the grounding accuracy of 7B vs. 13B VLA models, as measured by entity detection F1 scores on R2R-CE validation splits. Multimodal datasets…

[841]
30 May 2026. Score: 7.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the effects of alignment techniques (e.g., RLHF, constitutional AI) on the robustness of sparse MoE models in self-invoking code generation tasks, measured by accuracy on adversarial. Large Language…
