Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. The index holds 5545 papers, with a mean review score of 5.64/10 and 1499 Zenodo DOIs.

Results 1176–1200 of 5545 entries

Papers

[4370]

Open-Source vs. Proprietary Language Models on Coding Benchmarks V17

7 June 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the comparative performance of open-source language models versus proprietary models on coding benchmarks v17. 0 claims were extracted from source literature; 0 were independently verified against…

[4369]

Quantization Impact on Reasoning Performance in Reinforcement-Learned Large Language Models

7 June 2026. Score: 4.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does model quantization affect reasoning capability in large language models v17. 16 claims were extracted from source literature; 3 were independently verified against retrieved documents. An automated…

[4368]

Retrieval-Augmented Language Models for Knowledge-Intensive Task Performance

7 June 2026. Score: 5.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does retrieval augmentation improve language model performance on knowledge-intensive tasks v17. 11 claims were extracted from source literature; 3 were independently verified against retrieved documents. An…

[4367]

Training Strategies for Language Model Generalization in Mathematical Reasoning

7 June 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What training strategies improve language model generalization to novel mathematical reasoning problems v17. 15 claims were extracted from source literature; 1 was independently verified against retrieved…

[4366]

Multimodal Language Models in Visual Mathematical and Scientific Reasoning

7 June 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do multimodal language models perform on visual mathematical and scientific reasoning v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4365]

Parameter Scale and Algorithmic Reasoning Success in LLM-ProS Benchmark

7 June 2026. Score: 6.70/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the correlation between model parameter scale and success rates on algorithmic reasoning tasks in the LLM-ProS dataset. 14 claims were extracted from source literature; 6 were independently verified…

[4364]

Chain-of-Thought Prompting Improves Large Language Model Accuracy on ICPC Problems

7 June 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does chain-of-thought prompting impact the accuracy of large language models on ICPC World Finals problems compared to direct code generation. 7 claims were extracted from source literature; 2 were…

[4363]

Procedural Pretraining Effects on Model Alignment and Benchmark Performance

7 June 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does pretraining on procedural data influence alignment metrics like toxicity and helpfulness in models evaluated on benchmarks like TruthfulQA and HELM. 7 claims were extracted from source literature; 1 was…

[4362]

Frontier Language Model Failures in Abstract Mathematical Reasoning

6 June 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the failure modes of frontier language models on abstract mathematical reasoning v17. 16 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4361]

Techniques for Solving Competition-Level Software Engineering Problems with Language Models

6 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What techniques enable language models to solve competition-level software engineering problems v17. 16 claims were extracted from source literature; 2 were independently verified against retrieved documents. An…

[4360]

Pretraining Data Quality and Its Impact on Language Model Reasoning Performance

6 June 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does pretraining data quality affect language model reasoning benchmark performance v17. 15 claims were extracted from source literature; 3 were independently verified against retrieved documents. An…

[4359]

Instruction Fine-Tuning Improves Language Model Mathematical Problem-Solving Accuracy

6 June 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the effect of instruction fine-tuning on language model mathematical problem-solving accuracy v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved…

[4358]

Reinforcement Learning from Human Feedback Enhances Language Model Mathematical Reasoning

6 June 2026. Score: 4.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does reinforcement learning from human feedback improve language model mathematical reasoning v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4357]

Language Models in Formal Theorem Proving and Mathematical Verification Tasks

6 June 2026. Score: 7.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 20 peer-reviewed papers addressing the following research question: How do language models perform on formal theorem proving and mathematical verification tasks v17. 15 claims were extracted from source literature; 9 were independently verified against retrieved documents. An…

[4356]

Scaling Laws of Chain-of-Thought Reasoning in Large Language Models

6 June 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the scaling laws for chain-of-thought reasoning in large language models v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An automated…

[4355]

Emergent Reasoning Capabilities in Transformers at Varying Model Scales

6 June 2026. Score: 3.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the relationship between model scale and emergent reasoning capabilities in transformers v17. 13 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4354]

Test-Time Compute Scaling and Its Impact on Language Model Reasoning Performance

6 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does test-time compute scaling improve language model performance on reasoning benchmarks v17. 13 claims were extracted from source literature; 1 was independently verified against retrieved documents. An…

[4353]

Sparse Mixture-of-Experts vs. Dense Transformers in Mathematical Reasoning Benchmarks

6 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How do sparse mixture-of-experts models compare to dense transformers on mathematical reasoning v17. 18 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4352]

EXAONE-3.5 Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of EXAONE-3.5 on reasoning mathematics coding and language understanding tasks. 0 claims were extracted from source literature; 0 were independently verified against…

[4351]

DeepSeek-7B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of DeepSeek-7B on reasoning mathematics coding and language understanding tasks. 0 claims were extracted from source literature; 0 were independently verified against…

[4350]

DeepSeek-14B Benchmark Performance Across Reasoning Mathematics and Language Tasks

6 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of DeepSeek-14B on reasoning mathematics coding and language understanding tasks. 20 claims were extracted from source literature; 2 were independently verified against…

[4349]

Claude-3-Haiku Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 6.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Claude-3-Haiku on reasoning mathematics coding and language understanding tasks. 0 claims were extracted from source literature; 0 were independently verified against…

[4348]

DeepSeek-VL Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of DeepSeek-VL on reasoning mathematics coding and language understanding tasks. 12 claims were extracted from source literature; 2 were independently verified against…

[4347]

State-of-the-Art Large Language Model Performance on Reasoning Benchmarks

6 June 2026. Score: 6.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What are the state-of-the-art large language model results on reasoning benchmarks published recently v16. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents.…

[4346]

Language Models vs. Human Experts on Professional Knowledge and Science Benchmarks

6 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How do language models compare to human experts on professional knowledge and science benchmarks v16. 14 claims were extracted from source literature; 1 was independently verified against retrieved documents. An…
