Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 6335 papers; mean review score 5.54/10; 1581 Zenodo DOIs.

Results 2226–2250 of 6335 entries

Papers

[4110]

Vamba-10B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 5.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Vamba-10B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…

[4109]

VideoChat-Flash-7B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of VideoChat-Flash-7B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 1 was independently verified…

[4108]

LLaVA-Video-72B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 5.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LLaVA-Video-72B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…

[4107]

LLaVA-v1.6-7B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 6.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of llava-v1.6-7b on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…

[4106]

InstructionBlip-7B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 6.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of InstructionBlip-7b on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…

[4105]

LLaVA-v1.6-Mistral-7B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of llava-v1.6-mistral-7b on reasoning, mathematics, coding, and language understanding tasks? 17 claims were extracted from source literature; 2 were independently verified…

[4104]

GPT-5-Mini Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-5-mini on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…

[4103]

LLaVA-OneVision-72B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LLaVA-OneVision-72B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…

[4102]

Qwen-VL-2B Benchmark Performance Across Reasoning Mathematics Coding and Language Tasks

6 June 2026. Score: 4.57/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Qwen-VL-2B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 1 was independently verified against…

[4101]

XGLM Model Scaling in Zero-Shot Cross-Lingual Educational Dialogue Act Classification

6 June 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the performance of XGLM models (564M vs. 1.7B) compare in zero-shot cross-lingual transfer for educational dialogue act classification on under-resourced languages like Indonesian versus. 0 claims were…

[4100]

Fine-Tuning Mistral-7B on Musical Text Reduces Hallucinations in Long-Context RAG

6 June 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does fine-tuning Mistral-7B on domain-specific musical text affect its hallucination rates compared to base models when evaluated on long-context RAG benchmarks? 0 claims were extracted from source…

[4099]

Dense vs. Sparse Retrieval Throughput in Phi-3-Mini Long-Context Generation

6 June 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the throughput impact of dense versus sparse retrieval on Phi-3-mini's response generation time when evaluated on long-context benchmarks, measured in tokens per second? 12 claims were extracted from…

[4098]

Code-Based Self-Verification Effects on Phi-3-Mini and Mistral-7B GSM-Symbolic Accuracy Under Adversarial Perturbations

6 June 2026. Score: 2.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the accuracy of Phi-3-mini and Mistral-7B-v0.1 on GSM-Symbolic change when code-based self-verification is applied to adversarially perturbed instances across multiple languages? 12 claims were extracted…

[4097]

Hybrid Retrieval Reduces Hallucinations in Mistral-7B Across Legal and Scientific Domains

6 June 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the hybrid retrieval approach perform in mitigating hallucinations in Mistral-7B when applied to domain-specific benchmarks beyond religious texts, such as legal or scientific corpora,. 10 claims were…

[4096]

Differentially Private LoRA Fine-Tuning and GSM8K Reasoning Accuracy in Mistral-7B

6 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does differentially private LoRA fine-tuning affect the GSM8K reasoning accuracy of Mistral-7B compared to full-model private SGD? 10 claims were extracted from source literature; 1 was independently verified…

[4095]

Factual Consistency Trade-offs in 7B vs 3.8B Models with RAG Retrieval

6 June 2026. Score: 5.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does reducing parameter count from 7B to 3.8B affect factual consistency scores on the HaluEval benchmark when using identical RAG retrieval contexts? 0 claims were extracted from source literature; 0 were…

[4094]

Adapter-Based Fine-Tuning and Adversarial Transferability Across Languages in PAWS-X

6 June 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of adapter-based fine-tuning on the transferability of adversarial examples across languages in the PAWS-X benchmark for XLM-R base models? 11 claims were extracted from source literature; 0…

[4093]

Differentially Private Adapters for Safe and Utility-Preserving NLP Alignment

6 June 2026. Score: 7.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: Can differentially private adapter methods maintain alignment safety scores on ToxicChat while preserving utility on standard NLP benchmarks? 0 claims were extracted from source literature; 0 were independently…

[4092]

Token-Level Precision in Long-Context Code Completion: Mistral 7B with Sliding Window vs. Standard Attention

6 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the token-level precision of code completion differ between Mistral 7B with sliding window attention and standard attention mechanisms when processing inputs longer than 32k tokens on. 15 claims were…

[4091]

Kimi Delta Attention Zero-Shot Reasoning Accuracy on Long-Context Pile Subsets

6 June 2026. Score: 6.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: Does Kimi Delta Attention maintain comparable zero-shot reasoning accuracy to full attention on long-context subsets of the Pile benchmark? 0 claims were extracted from source literature; 0 were independently…

[4090]

Gemini 1.5 Pro Performance Degradation on Qasper with Context Position Shifts

6 June 2026. Score: 8.07/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20569256

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: How does the performance of Gemini 1.5 Pro on the Qasper dataset degrade as the position of relevant information shifts from the beginning to the middle versus the end of a 500k-token context window? 8 claims were…

[4089]

Tex-9K Texture Diversity and Zero-Shot Robustness in Multimodal Anomaly Detection

6 June 2026. Score: 6.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: To what extent does the diversity of the Tex-9K texture library improve the robustness of multimodal anomaly detection models against varying background textures and lighting conditions in zero-shot. 0 claims…

[4088]

AnomalyPainter vs. CLIP-Based Zero-Shot Detectors Under Industrial Lighting Shifts

6 June 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the comparative performance of AnomalyPainter's vision-language synergy against standard CLIP-based zero-shot detectors when evaluated on industrial benchmarks with domain-shifted lighting? 16 claims were…

[4087]

Visual Context Integration Enhances Human Alignment in Code Generation Benchmarks

6 June 2026. Score: 6.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does incorporating visual context in code training data improve alignment with human intent in code generation benchmarks? 0 claims were extracted from source literature; 0 were independently verified against…

[4086]

DiffCoT vs Traditional Chain-of-Thought in Mathematical Reasoning at Scale

6 June 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the diffusion-styled CoT framework (DiffCoT) compare in mathematical reasoning accuracy to traditional CoT methods when scaled to different model sizes (e.g., 7B vs. 30B parameters), as. 0 claims were…
