Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. The index lists 6142 papers, with a mean review score of 5.55/10 and 1558 Zenodo DOIs.

Results 1976–2000 of 6142 entries

Papers

[4167]

Learning Rate Annealing Effects on Adversarial Robustness in Open-Source Code Models

6 June 2026. Score: 5.27/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of learning rate annealing on the robustness of open-source code models when evaluated on adversarial examples from the LiveCodeBench dataset, measured by pass@k scores and. 17 claims were…

[4166]

Synthetic Data Realism Effects on Video Encoder Robustness in K-Nearest Neighbors Classification

6 June 2026. Score: 4.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of different levels of synthetic data realism (e.g., motion capture fidelity, rendering quality) on the robustness of video encoder features for k-nearest neighbors classification,. 0 claims…

[4165]

MathCoder2 Pretraining Enhances Adversarial Robustness in Sub-3B Math Models

6 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does the MathCoder2 pretraining approach improve robustness against adversarial perturbations in competition-level math problems for models under 3B parameters. 17 claims were extracted from source literature; 2…

[4164]

Synthetic Gesture Video Features in K-Nearest Neighbors vs. Random Forests for Gesture Recognition

6 June 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the performance of k-nearest neighbors classification using features from synthetic gesture videos compare to random forests when evaluated on real-world gesture recognition benchmarks like. 0 claims were…

[4163]

Few-Shot Prompting with Masked Language Models vs. Large Autoregressive Models for Low-Resource Clinical Named Entity Recognition

6 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does few-shot prompting with lightweight masked language models compare to large autoregressive models on low-resource clinical named entity recognition benchmarks. 13 claims were extracted from source…

[4162]

Alignment Techniques and Robustness in Frontier LLMs on HLCE Benchmark

6 June 2026. Score: 1.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do different alignment techniques (e.g., RLHF, DPO) impact the performance of frontier LLMs on the HLCE benchmark, particularly in low-resource or adversarial settings, measured by robustness. 8 claims were…

[4161]

Scaling Laws of Model Size and Performance on the HLCE Benchmark

6 June 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the correlation between model size (parameter count) and performance on the HLCE benchmark, and does this scaling law hold for models trained with mixed-domain datasets, as measured by. 10 claims were…

[4160]

Continued Pretraining on Model-Translated Mathematical Code and MATH Benchmark Performance

6 June 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does continued pretraining on model-translated mathematical code affect small decoder-only models' accuracy on the MATH benchmark compared to standard mathematical text pretraining. 18 claims were extracted…

[4159]

Frontier Large Language Models in Mathematical Reasoning Code Generation and Scientific Knowledge

6 June 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: Comprehensive comparison of frontier large language models on mathematical reasoning code generation and scientific knowledge v9. 0 claims were extracted from source literature; 0 were independently verified…

[4158]

Parameter Count and Pass@k Performance in Open-Source Code Models on LiveCodeBench

6 June 2026. Score: 3.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the correlation between parameter count and pass@k scores for open-source code models across varying difficulty levels in the LiveCodeBench dataset. 16 claims were extracted from source literature; 0 were…

[4157]

Frontier Language Models on GPQA Diamond and Reasoning Benchmarks v9 Performance

6 June 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Which frontier language models achieve highest scores on GPQA Diamond, Humanity's Last Exam, and difficult reasoning benchmarks v9. 12 claims were extracted from source literature; 1 was independently verified…

[4156]

Synthetic-Real Domain Gaps and Feature Representation Degradation in Video Encoders

6 June 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: To what extent do domain gaps between synthetic and real-world video data degrade the feature representation quality of video encoders in k-nearest neighbors classification tasks. 0 claims were extracted from…

[4155]

Continued Pretraining on Mathematical Corpora Enhances Adversarial Robustness in Small Decoder-Only Models

6 June 2026. Score: 5.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: Does continued pretraining on mathematical corpora improve robustness against adversarial perturbations in competition-level math problems for small decoder-only models. 0 claims were extracted from source…

[4154]

Language Model Perplexity and Downstream Reasoning Task Performance Correlation

6 June 2026. Score: 3.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the relationship between language model perplexity and downstream reasoning task performance v9. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents.…

[4153]

Language Models vs. Human Experts on Professional Knowledge and Science Benchmarks

6 June 2026. Score: 6.60/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do language models compare to human experts on professional knowledge and science benchmarks v9. 15 claims were extracted from source literature; 7 were independently verified against retrieved documents. An…

[4152]

State-of-the-Art Large Language Model Performance on Reasoning Benchmarks

6 June 2026. Score: 6.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the state-of-the-art large language model results on reasoning benchmarks published recently v9. 12 claims were extracted from source literature; 6 were independently verified against retrieved…

[4151]

Language Model Performance on Multi-Document Reasoning and Summarization Across Context Lengths

6 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does context length affect language model performance on multi-document reasoning and summarization v9. 17 claims were extracted from source literature; 0 were independently verified against retrieved…

[4150]

Scaling Laws of Language Model Performance in Logical Reasoning Tasks

6 June 2026. Score: 7.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the effect of model size on language model performance on logical reasoning tasks v9. 0 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4149]

Prompting Strategies for Maximizing Language Model Accuracy on Graduate-Level Science Questions

6 June 2026. Score: 3.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What prompting strategies maximize language model accuracy on graduate-level science questions v9. 15 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4148]

Synthetic Training Data Enhances Language Model Performance in Mathematical Reasoning

6 June 2026. Score: 3.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does synthetic training data improve language model performance on mathematical reasoning benchmarks v9. 0 claims were extracted from source literature; 0 were independently verified against retrieved…

[4147]

Extended Thinking Time Improves Language Model Accuracy in Competition-Level Mathematics

6 June 2026. Score: 6.93/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 19 peer-reviewed papers addressing the following research question: How does extended thinking time affect language model accuracy on competition-level mathematics v9. 14 claims were extracted from source literature; 9 were independently verified against retrieved documents. An…

[4146]

Training Strategies for Language Model Generalization in Mathematical Reasoning

6 June 2026. Score: 3.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What training strategies improve language model generalization to novel mathematical reasoning problems v9. 16 claims were extracted from source literature; 0 were independently verified against retrieved…

[4145]

Language Models and Multi-Hop Reasoning in Scientific Question Answering

6 June 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How do language models handle multi-hop reasoning chains in scientific question answering v9. 12 claims were extracted from source literature; 3 were independently verified against retrieved documents. An…

[4144]

Frontier Language Model Failures in Abstract Mathematical Reasoning

6 June 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the failure modes of frontier language models on abstract mathematical reasoning v9. 12 claims were extracted from source literature; 0 were independently verified against retrieved documents. An…

[4143]

Open-Source vs. Proprietary Language Models on Coding Benchmarks V9

6 June 2026. Score: 3.40/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the comparative performance of open-source language models versus proprietary models on coding benchmarks v9. 13 claims were extracted from source literature; 1 was independently verified against…
