Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5765 papers; mean review score 5.63/10; 1553 Zenodo DOIs.

Results 1376–1400 of 5765 entries

Papers

[4390]

Taxonomy of AI Techniques for Solving Competition-Level Software Engineering Problems

7 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What techniques enable language models to solve competition-level software engineering problems v18. 14 claims were extracted from source literature; 1 was independently verified against retrieved documents. An…

[4389]

Sparse Mixture-of-Experts vs. Dense Transformers in Mathematical Reasoning Benchmarks

7 June 2026. Score: 3.40/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do sparse mixture-of-experts models compare to dense transformers on mathematical reasoning v18. 16 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4388]

Language Models in Formal Theorem Proving and Mathematical Verification Tasks

7 June 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do language models perform on formal theorem proving and mathematical verification tasks v18. 10 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4387]

Reinforcement Learning from Human Feedback Enhances Language Model Mathematical Reasoning

7 June 2026. Score: 1.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does reinforcement learning from human feedback improve language model mathematical reasoning v18. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4386]

Architectural Innovations Enhancing Transformer Performance in Multi-Step Logical Reasoning

7 June 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What architectural innovations improve transformer performance on multi-step logical reasoning v18. 15 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4385]

Emergent Reasoning in Transformers: Scaling Laws and Capability Thresholds

7 June 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the relationship between model scale and emergent reasoning capabilities in transformers v18. 13 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4384]

Test-Time Compute Scaling and Language Model Performance on Reasoning Benchmarks

7 June 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does test-time compute scaling improve language model performance on reasoning benchmarks v18. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4383]

Scaling Laws of Chain-of-Thought Reasoning in Large Language Models

7 June 2026. Score: 4.23/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the scaling laws for chain-of-thought reasoning in large language models v18. 20 claims were extracted from source literature; 1 was independently verified against retrieved documents. An automated…

[4382]

Policy-Gradient Reinforcement Learning Outperforms PPO in Non-Ideal Scenario Robustness

7 June 2026. Score: 5.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Do policy-gradient RL methods improve robustness scores on non-ideal scenario datasets relative to PPO-trained baseline models. 14 claims were extracted from source literature; 4 were independently verified…

[4381]

State-of-the-Art Large Language Model Performance on Reasoning Benchmarks

7 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the state-of-the-art large language model results on reasoning benchmarks published recently v17. 14 claims were extracted from source literature; 1 was independently verified against retrieved…

[4380]

Frontier Large Language Models in Mathematical Reasoning, Code Generation, and Scientific Knowledge

7 June 2026. Score: 3.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Comprehensive comparison of frontier large language models on mathematical reasoning code generation and scientific knowledge v17. 7 claims were extracted from source literature; 0 were independently verified…

[4379]

Language Model Performance Across Varying Context Lengths in Multi-Document Reasoning and Summarization

7 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does context length affect language model performance on multi-document reasoning and summarization v17. 10 claims were extracted from source literature; 0 were independently verified against retrieved…

[4378]

Frontier Language Models Leading GPQA Diamond and Reasoning Benchmark Performance

7 June 2026. Score: 5.93/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: Which frontier language models achieve highest scores on GPQA Diamond Humanity Last Exam and difficult reasoning benchmarks v17. 0 claims were extracted from source literature; 0 were independently verified…

[4377]

Perplexity and Downstream Reasoning Performance in Language Models

7 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the relationship between language model perplexity and downstream reasoning task performance v17. 12 claims were extracted from source literature; 1 was independently verified against retrieved documents.…

[4376]

Scaling Laws for Language Model Performance in Logical Reasoning Tasks

7 June 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the effect of model size on language model performance on logical reasoning tasks v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4375]

Language Models vs. Human Experts on Professional Knowledge Benchmarks

7 June 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do language models compare to human experts on professional knowledge and science benchmarks v17. 11 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4374]

Synthetic Training Data Enhances Language Model Performance in Mathematical Reasoning

7 June 2026. Score: 4.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does synthetic training data improve language model performance on mathematical reasoning benchmarks v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved…

[4373]

Prompting Strategies for Maximizing Language Model Accuracy on Graduate-Level Science Questions

7 June 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What prompting strategies maximize language model accuracy on graduate-level science questions v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4372]

Extended Thinking Time Improves Language Model Accuracy in Competition-Level Mathematics

7 June 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does extended thinking time affect language model accuracy on competition-level mathematics v17. 9 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4371]

Language Models and Multi-Hop Reasoning in Scientific Question Answering

7 June 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20575687

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How do language models handle multi-hop reasoning chains in scientific question answering v17. 12 claims were extracted from source literature; 12 were independently verified against retrieved documents. An…

[4370]

Open-Source vs. Proprietary Language Models on Coding Benchmarks V17

7 June 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the comparative performance of open-source language models versus proprietary models on coding benchmarks v17. 0 claims were extracted from source literature; 0 were independently verified against…

[4369]

Quantization Impact on Reasoning Performance in Reinforcement-Learned Large Language Models

7 June 2026. Score: 4.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does model quantization affect reasoning capability in large language models v17. 16 claims were extracted from source literature; 3 were independently verified against retrieved documents. An automated…

[4368]

Retrieval-Augmented Language Models for Knowledge-Intensive Task Performance

7 June 2026. Score: 5.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does retrieval augmentation improve language model performance on knowledge-intensive tasks v17. 11 claims were extracted from source literature; 3 were independently verified against retrieved documents. An…

[4367]

Training Strategies for Language Model Generalization in Mathematical Reasoning

7 June 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What training strategies improve language model generalization to novel mathematical reasoning problems v17. 15 claims were extracted from source literature; 1 was independently verified against retrieved…

[4366]

Multimodal Language Models in Visual Mathematical and Scientific Reasoning

7 June 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do multimodal language models perform on visual mathematical and scientific reasoning v17. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

« Prev 1 … 54 55 56 57 58 … 231 Next »