Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 4351 papers; mean review score 5.87/10; 1389 Zenodo DOIs.

Results 151–175 of 4351 entries

Papers

[4201]

Few-Shot Prompting in Masked vs. Autoregressive Models for Cross-Lingual Named Entity Recognition

6 June 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does few-shot prompting performance of masked language models compare to autoregressive models on cross-lingual named entity recognition benchmarks for low-resource languages. 6 claims were extracted from…

[4200]

Frontier Language Models Performance on GPQA Diamond and Reasoning Benchmarks

6 June 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Which frontier language models achieve highest scores on GPQA Diamond Humanity Last Exam and difficult reasoning benchmarks v10. 0 claims were extracted from source literature; 0 were independently verified…

[4199]

THaMES Evaluation Pipelines and Multimodal Model Robustness to Factual Caption Errors

6 June 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the impact of THaMES evaluation pipelines on the robustness of multimodal models against factually incorrect captions in the ScienceQA dataset. 5 claims were extracted from source literature; 0 were…

[4198]

Frontier Large Language Models in Mathematical Reasoning and Scientific Knowledge Benchmarks

6 June 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: Comprehensive comparison of frontier large language models on mathematical reasoning code generation and scientific knowledge v10. 0 claims were extracted from source literature; 0 were independently verified…

[4197]

State-of-the-Art Large Language Model Performance on Reasoning Benchmarks

6 June 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the state-of-the-art large language model results on reasoning benchmarks published recently v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved…

[4196]

Language Models and Human Experts on Professional Knowledge Benchmarks

6 June 2026. Score: 3.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do language models compare to human experts on professional knowledge and science benchmarks v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4195]

Scaling Laws and Logical Reasoning in DeepSeek-V3 with MoE and MLA Architectures

6 June 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the effect of model size on language model performance on logical reasoning tasks v10. 14 claims were extracted from source literature; 2 were independently verified against retrieved documents. An…

[4194]

Prompting Strategies for Maximizing Language Model Accuracy on Graduate-Level Science Questions

6 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What prompting strategies maximize language model accuracy on graduate-level science questions v10. 13 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4193]

Extended Thinking Time Improves Language Model Accuracy in Competition-Level Mathematics

6 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does extended thinking time affect language model accuracy on competition-level mathematics v10. 12 claims were extracted from source literature; 2 were independently verified against retrieved documents. An…

[4192]

Synthetic Training Data Enhancements in Language Model Mathematical Reasoning

6 June 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does synthetic training data improve language model performance on mathematical reasoning benchmarks v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved…

[4191]

Quantization Impact on Reasoning Capabilities in Large Language Models

6 June 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does model quantization affect reasoning capability in large language models v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An automated…

[4190]

Language Models in Multi-Hop Scientific Reasoning: A Systematic Synthesis

6 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How do language models handle multi-hop reasoning chains in scientific question answering v10. 11 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4189]

Instruction Fine-Tuning Boosts Language Model Mathematical Problem-Solving Accuracy

6 June 2026. Score: 6.10/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the effect of instruction fine-tuning on language model mathematical problem-solving accuracy v10. 9 claims were extracted from source literature; 4 were independently verified against retrieved…

[4188]

Training Strategies for Language Model Generalization in Mathematical Reasoning

6 June 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What training strategies improve language model generalization to novel mathematical reasoning problems v10. 5 claims were extracted from source literature; 0 were independently verified against retrieved…

[4187]

Open-Source vs. Proprietary Language Models on Coding Benchmarks V10

6 June 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the comparative performance of open-source language models versus proprietary models on coding benchmarks v10. 10 claims were extracted from source literature; 2 were independently verified against…

[4186]

Frontier Language Model Failures in Abstract Mathematical Reasoning

6 June 2026. Score: 6.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the failure modes of frontier language models on abstract mathematical reasoning v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4185]

Retrieval-Augmented Language Models in Knowledge-Intensive Task Performance

6 June 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does retrieval augmentation improve language model performance on knowledge-intensive tasks v10. 8 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4184]

Multimodal Language Models in Visual Mathematical and Scientific Reasoning Benchmarks

6 June 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do multimodal language models perform on visual mathematical and scientific reasoning v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4183]

Reinforcement Learning from Human Feedback Enhances Language Model Mathematical Reasoning

6 June 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does reinforcement learning from human feedback improve language model mathematical reasoning v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4182]

Language Models in Formal Theorem Proving and Mathematical Verification Tasks

6 June 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do language models perform on formal theorem proving and mathematical verification tasks v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4181]

Pretraining Data Quality and Its Impact on Language Model Reasoning Performance

6 June 2026. Score: 3.40/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does pretraining data quality affect language model reasoning benchmark performance v10. 12 claims were extracted from source literature; 1 was independently verified against retrieved documents. An automated…

[4180]

Language Models for Competition-Level Software Engineering Problem Solving

6 June 2026. Score: 7.30/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What techniques enable language models to solve competition-level software engineering problems v10. 8 claims were extracted from source literature; 6 were independently verified against retrieved documents. An…

[4179]

Architectural Innovations Enhancing Transformer Performance in Multi-Step Logical Reasoning

6 June 2026. Score: 6.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What architectural innovations improve transformer performance on multi-step logical reasoning v10. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4178]

Emergent Reasoning Capabilities in Transformers at Scale: A Multi-Study Synthesis

6 June 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the relationship between model scale and emergent reasoning capabilities in transformers v10. 8 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4177]

Test-Time Compute Scaling and Adaptive Token-Level Reasoning in Language Models

6 June 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does test-time compute scaling improve language model performance on reasoning benchmarks v10. 19 claims were extracted from source literature; 5 were independently verified against retrieved documents. An…

« Prev 1 … 5 6 7 8 9 … 175 Next »