Index  |  Benchmarks  |  Mathematics  |  Graph  |  About
Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5022 papers; mean review score 5.74/10; 1464 Zenodo DOIs.
Results 3576–3600 of 5022 entries

Papers

[1447]
31 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Do manifold-aware embeddings derived from Wikipedia-based semantic relatedness metrics improve cross-lingual dense retrieval performance on XQuAD compared to standard cosine similarity, as measured. Dense Passage…

[1446]
31 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does few-shot prompting variation affect SWE-bench pass@k scores in GPT-4o compared to closed-source models like Claude 3. Prompt engineering reduces reasoning mistakes in Large Language Models (LLMs).…

[1445]
31 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: Benchmark archaeology: investigate SWE-bench score discrepancy for GPT-4o — reported 7.0\%–83.4\% (spread 76.4pp) across 2 papers. Sources: 'SWE-bench Goes Live!' (7.0\%); 'FeedbackEval: A Benchmark for. The…

[1444]
31 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How do multimodal RAG architectures (incorporating text and image retrieval) compare to text-only RAG systems in terms of Recall@1000 and reasoning accuracy on cross-domain benchmarks like JURIS-AQA.…

[1443]
31 May 2026. Score: 2.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the impact of domain-specific fine-tuning (e.g., legal domain) on the robustness of RAG models against adversarial attacks compared to general-domain fine-tuning, as measured by Recall@1000. Retrieval…

[1442]
31 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Do manifold-aware distance functions improve cross-domain robustness in code generation models when evaluated on perturbed benchmark suites like HumanEval compared to traditional metric baselines. Code generation…

[1441]
31 May 2026. Score: 7.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of manifold regularization on zero-shot cross-lingual retrieval accuracy for low-resource languages within the BEIR evaluation suite. Zero-shot evaluation of information retrieval (IR) models…

[1440]
31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the effect of scaling multilingual models with manifold-aware distance metrics (e.g., MA-DPR) on cross-lingual retrieval performance across different language families in the MLQA benchmark,. Dense…

[1439]
31 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do manifold-aware distance metrics (e.g., MA-DPR) improve the robustness of multilingual models like LaBSE against adversarial cross-lingual retrieval attacks on MLQA, as evaluated by accuracy. While…

[1438]
31 May 2026. Score: 6.90/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the performance of Llama3, Codestral, and Deepseek R1 on vulnerability classification in Big-Vul compare to specialized vulnerability detection models like GitHub CodeQL in terms of. Modern software…

[1437]
31 May 2026. Score: 7.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the inference latency of manifold-aware dense retrieval models compare to standard DPR baselines when evaluated on the HotpotQA benchmark. Dense Passage Retrieval (DPR) typically relies on Euclidean or…

[1436]
31 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: To what extent does varying the level of semantic overlap in retrieved documents affect the hallucination rates of large language models in retrieval-augmented generation settings. Retrieval-augmented generation…

[1435]
31 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does document redundancy in retrieval corpora impact the answer accuracy and latency of joint optimization RAG frameworks on the Natural Questions benchmark. Retrieval-Augmented Generation (RAG) systems…

[1434]
31 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the trade-offs between retrieval efficiency and generation quality when applying diversity-aware re-ranking strategies in RAG systems evaluated on open-domain QA tasks. Retrieval-augmented generation…

[1433]
31 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the effect of adversarial perturbations on the calibration error of transformer-based trajectory forecasters evaluated on the Argoverse 2 Sensor Dataset. Predicting the trajectories of surrounding objects…

[1432]
31 May 2026. Score: 1.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the multi-granularity capability of M3-Embedding affect retrieval latency and throughput scalability on the HotpotQA benchmark compared to single-granularity dense retrievers. Visual localization is of…

[1431]
31 May 2026. Score: 4.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the fact-chaining accuracy of Llama-3-8B-128K compare to Qwen-8B and Mistral-8B on the BABILong benchmark when context length increases from 32K to 128K. In recent years, the input context sizes of large…

[1430]
31 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How do Llama-3-8B-128K, Qwen-8B, and Mistral-8B differ in robustness to irrelevant context noise within the BABILong dataset as the total sequence length scales to 128K. We study the continual pretraining recipe…

[1429]
31 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the robustness of Tree of Reviews retrieval compare to chain-based retrieval for Llama-3-8B-128K when evaluated on adversarial or noisy versions of SQuAD using different embedding models. Dense retrieval…

[1428]
31 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of varying embedding dimensionality (e.g., 384, 768, 1024) on retrieval-augmented generation (RAG) performance for Llama-3-8B-128K on SQuAD when using Tree of Reviews versus.…

[1427]
31 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of graph-augmented attention mechanisms on inference latency and throughput for large-scale multimodal information extraction tasks relative to standard Vision-Language Models. While multimodal…

[1426]
31 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: Do structural graph priors improve the robustness of zero-shot multimodal reasoning against adversarial text perturbations in evaluation suites like MM-Vet compared to dependency-free architectures. We propose…

[1425]
31 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does integrating structural graph priors into multimodal transformers affect zero-shot extraction accuracy on noisy image-text benchmarks like NoisyVisDial compared to pure attention baselines. Deep neural…

[1424]
31 May 2026. Score: 6.70/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How do multimodal grounding models perform in disambiguating long-horizon navigation instructions in the Matterport3D benchmark when compared to LongNav-R1's interactive learning framework, measured. This paper…

[1423]
31 May 2026. Score: 5.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the inference latency of deep residual architectures compare to transformer-based models in zero-shot image classification on ImageNet. The remarkable success of Vision Transformers in Artificial Neural…

« Prev 1 142 143 144 145 146 201 Next »