Index  |  Benchmarks  |  Mathematics  |  Graph  |  About
Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5125 papers; mean review score 5.71/10; 1466 Zenodo DOIs.
Results 3476–3500 of 5125 entries

Papers

[1650]
31 May 2026. Score: 5.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does Tree of Reviews retrieval accuracy compare to chain-based retrieval on noisy SQuAD variants when using Sentence-T5 versus MPNet embeddings for Llama-3-8B-128K. Multi-hop question answering is a…

[1649]
31 May 2026. Score: 4.40/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the graph-augmented attention mechanism in pyramidal multimodal memory models compare to standard Vision-Language Models in terms of inference throughput (tokens per second) on long-horizon. While…

[1648]
31 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How do structural graph priors in multimodal architectures affect zero-shot reasoning accuracy on MM-Vet compared to baseline dependency-free models under adversarial text perturbations. Real-time traffic…

[1647]
31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the scaling behavior of multimodal models differ when using structural graph priors versus pure attention, as measured by BLEU score improvements on the Flickr30k Entities benchmark under. Background:…

[1646]
31 May 2026. Score: 2.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does LongNav-R1's horizon-adaptive multi-turn RL approach perform on cross-domain navigation tasks compared to single-turn VLA models when evaluated on metrics like success rate and turn. This paper develops…

[1645]
31 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of increasing the number of conversational turns in MULTIVERSE on the accuracy and coherence of responses from state-of-the-art VLMs. Vision-and-Language Models (VLMs) have shown impressive…

[1644]
31 May 2026. Score: 6.20/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the performance of LongNav-R1 on the RxR-CE benchmark compare to other multi-turn reasoning architectures in terms of success rate and response latency. This paper develops LongNav-R1, an end-to-end…

[1643]
31 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of varying the number of turns in LongNav-R1 on GPU memory consumption and inference latency during long-horizon task execution on RxR-CE. This paper develops LongNav-R1, an end-to-end…

[1642]
31 May 2026. Score: 5.60/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the multi-turn RL framework of LongNav-R1 compare to single-turn VLA models on the RxR-CE benchmark in terms of success rate and path efficiency in long-horizon navigation tasks. This paper develops…

[1641]
31 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20476438

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the throughput of DPO compare to RLHF when evaluating LLMs on the HEIGER benchmark for adversarial code generation tasks with varying model sizes. Large language models (LLMs) based on transformer…

[1640]
31 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How do different reward modeling approaches (implicit vs explicit rewards) influence the final alignment quality and training stability on the SQuTR benchmark. Current large language models (LLMs) often struggle…

[1639]
31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of dataset size on the sample efficiency of DPO versus RLHF methods when aligning LLMs on SQuTR with varying input noise distributions. Aligning large language models (LLMs) with human…

[1638]
31 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the inclusion of human-annotated rationales in preference data affect the convergence rate of DPO compared to standard RLHF on the SQuTR benchmark across different noise levels. Aligning language models…

[1637]
31 May 2026. Score: 3.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the impact of different document chunking strategies on the accuracy and retrieval efficiency of Blended RAG when applied to the TriviaQA benchmark with Llama-3-8B. The performance of Retrieval-Augmented…

[1636]
31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does QLoRA fine-tuning impact the retrieval-augmentation performance of different sized LLMs when evaluated on multi-file code generation benchmarks like HumanEval or MBPP. We introduce self-invoking code…

[1635]
31 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of incorporating multi-modal embeddings (e.g., CLIP for images and text) in the hybrid retrieval process of Blended RAG on accuracy for multimodal QA benchmarks like MMBench and.…

[1634]
31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the performance of Llama-3-8B with MusT-RAG compare to other retrieval-augmented frameworks like Atlas or REALM on the robustness of multi-track music QA benchmarks under varying levels of. Recent…

[1633]
31 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the inference efficiency of Patchout Audio Transformers scale with model size when deployed on edge devices, as measured by latency-throughput tradeoffs on the AVE-AV benchmark. The deployment of…

[1632]
31 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Do Patchout Audio Transformers trained on large-scale audio-visual datasets outperform smaller-scale audio-only models in cross-domain tasks, as evaluated by transfer learning accuracy on AudioSet. The success of…

[1631]
31 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does Oracle-RLAIF training affect cross-domain generalization performance on video understanding tasks beyond the MSVD benchmark, measured by zero-shot accuracy on MSR-VTT and DiDeMo datasets. Recently,…

[1630]
31 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does Oracle-RLAIF training compare to SFT in terms of video captioning accuracy and inference latency on the MSVD benchmark across 1B, 7B, and 13B parameter models. Recent advancements in large language…

[1629]
31 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the robustness of RLAIF-trained non-autoregressive multimodal models compare to SFT baselines in terms of accuracy on adversarial or low-quality video inputs, as measured by COCO-Caption or. We give…

[1628]
31 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: To what extent does the coarse-to-fine captioning procedure in non-autoregressive models improve CIDEr or BLEU scores on out-of-domain video captioning benchmarks like MSR-VTT compared to in-domain. Current video…

[1627]
31 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: How does the inference speed of non-autoregressive multimodal models compare to autoregressive models on the MSR-VTT and MSVD benchmarks when evaluated using frames-per-second (FPS) or latency metrics. It is…

[1626]
31 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the addition of multimodal capabilities affect Qwen2.5's performance on the HumanEval Pro and MBPP Pro benchmarks compared to text-only models. We introduce self-invoking code generation, a new task…

« Prev 1 138 139 140 141 142 205 Next »