Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8261 papers; mean review score 5.72/10; 2243 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 145. 78 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 100 contested by the literature itself (see audit ledger). 9 contradictions investigated, with meta-analysis papers published (see challenged).

Results 7201–7225 of 8261 entries

Papers

[1061]

Cross-Model Robustness Metrics in Qwen3-235B and Llama2-70B Under Adversarial Code Generation Attacks

31 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20469612

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: How do cross-model robustness metrics vary for Qwen3-235B versus Llama2-70B when subjected to adversarial attacks on code generation tasks. The emergence of Transformer-based Large Language Models (LLMs) has…

[1060]

Distributionally Robust Optimization Enhances Metric Alignment in Vision-Language Segmentation

31 May 2026. Score: 4.93/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: To what extent do distributionally robust optimization techniques improve the alignment of Dice score and Hausdorff distance metrics with human evaluation in vision-language segmentation models. A joint…

[1059]

Multimodal Context Scaling and DeepSeek-R1 Robustness in Iterative Code Repair

31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the scaling of multimodal context (varying the ratio of text to diagram information) affect the robustness of DeepSeek-R1's iterative code repair performance across different programming […]. Code repair is…

[1058]

Synthetic Segmentation Metrics and Human Rater Agreement in Vision-Language Medical Imaging Models

31 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the correlation between synthetic segmentation metrics and human rater agreement differ when replacing pure visual encoders with vision-language models in multimodal medical image benchmarks. Determining…

[1057]

FP8 and INT4 Quantized Llama-3.1-70B Throughput and Accuracy on A100 and H100 GPUs

31 May 2026. Score: 8.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20469587

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the throughput difference between FP8 and INT4 quantized Llama-3.1-70B on HumanEval when deployed on A100 vs. H100 GPUs, and is the accuracy degradation consistent across both hardware […]. Large language…

[1056]

INT4 Quantization Effects on Zero-Shot Code Generation in Llama-3.1 Variants

31 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does INT4 quantization impact the zero-shot code generation performance of Llama-3.1-70B compared to smaller variants (e.g., 8B) on HumanEval, and does the trade-off scale with model size. Recent progress in…

[1055]

DeepSeek-R1, CodeLlama, and WizardCoder Robustness on Out-of-Distribution MLOps Tasks

31 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How robust are the code adaptation capabilities of DeepSeek-R1, CodeLlama, and WizardCoder when evaluated on out-of-distribution MLOps tasks, and how do their performance metrics (e.g., pass@k, […]). This paper…

[1054]

DeepSeek-R1, CodeLlama, and WizardCoder Latency-Accuracy Trade-offs in Few-Shot Code Generation

31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the inference latency of DeepSeek-R1 compare to CodeLlama and WizardCoder when performing few-shot code generation on HumanEval-V, and what is the accuracy trade-off at different latency […]. Large language…

[1053]

Dynamic Code Execution Traces and Static Analysis in LLaVul for Vulnerability Classification

31 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the integration of dynamic code execution traces with static analysis visualizations in LLaVul impact its vulnerability classification accuracy on the Big-Vul dataset compared to static-only […]. Increasing…

[1052]

Instruction Tuning with Security-Specific Code Examples Enhances Llama3 Zero-Shot Vulnerability Detection

31 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does instruction tuning with security-specific code examples affect Llama3's zero-shot vulnerability detection accuracy on Big-Vul compared to general code instruction tuning. One of the most impressive…

[1051]

Codestral Model Size and Inference Latency in Big-Vul Severity Classification

31 May 2026. Score: 3.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the correlation between model size and inference latency for Codestral when performing severity-level classification on C and C++ code in Big-Vul. Context: Traditional software security analysis methods…

[1050]

Codestral Parameter Scaling and Robustness to Obfuscated Code in Vulnerability Detection

31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: Does increasing Codestral's parameter count improve robustness against obfuscated code variants in vulnerability detection benchmarks compared to smaller variants. As large language models (LLMs) are increasingly…

[1049]

Scaling Codestral from 7B to 33B Parameters Reduces False Positives in Vulnerability Detection

31 May 2026. Score: 7.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does scaling Codestral from 7B to 33B parameters affect false positive rates in vulnerability detection across the Big-Vul dataset. Software vulnerabilities can cause numerous problems, including crashes,…

[1048]

Retrieval-Augmented Prompting vs. Fine-Tuning Performance Scaling in DeepSeek-V3 for Cross-Language Vulnerability Detection

31 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the performance gap between retrieval-augmented prompting and fine-tuning scale when evaluating DeepSeek-V3 on cross-language vulnerability datasets beyond the C/C++ focus of Big-Vul. As Large Language…

[1047]

31 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20469489

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: To what extent does the semantic similarity metric used for retrieving few-shot examples impact the false positive rate of DeepSeek-V3 on the Big-Vul benchmark compared to random example selection. Deep…

[1046]

DeepSeek-V3 Loss-Free Balancing and Performance Stability Across Programming Languages

31 May 2026. Score: 7.20/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: Does the Loss-Free Balancing strategy in DeepSeek-V3 maintain consistent performance stability across different programming languages in the GPQA Diamond domain when evaluated using the MBPP benchmark. We present…

[1045]

DeepSeek R1 Token Reasoning and Latency Trade-offs in Big-Vul Classification

31 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the trade-off between model size and inference latency for Llama3, Codestral, and DeepSeek-R1 when classifying software vulnerabilities in the Big-Vul dataset. This study investigates the performance of…

[1044]

Cross-Language Generalization of Llama3, Codestral, and DeepSeek-R1 in Vulnerability Detection

31 May 2026. Score: 7.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How do Llama3, Codestral, and DeepSeek-R1 compare in cross-language generalization for vulnerability detection when fine-tuned on a subset of Big-Vul and evaluated on unseen programming languages. Large Language…

[1043]

DeepSeek-V3 Scaling and Consistency on Out-of-Distribution Reasoning Benchmarks

30 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the correlation between model scaling and consistency metrics for DeepSeek-V3 when evaluated on out-of-distribution reasoning benchmarks. Recently, there has been high demand for deploying DeepSeek-R1 and V3…

[1042]

Scaling Dataset Size Effects on Pass@1 Accuracy in Romanized Nepali Fine-Tuning

30 May 2026. Score: 3.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of varying dataset sizes (e.g., 1K, 5K, 10K samples) on the pass@1 accuracy of fine-tuned Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B for Romanized Nepali tasks, and how does this […]. Romanized…

[1041]

DeepSeek-V3 Parameter Scaling and Accuracy Variance on GPQA Diamond Under Distribution Shifts

30 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does increasing parameter count from 7B to 33B in DeepSeek-V3 affect accuracy variance on GPQA Diamond under synthetic distribution shifts. In electronic trading markets, limit order books (LOBs) provide…

[1040]

Fine-Tuned Llama-3.1-8B, Mistral-7B, and Qwen3-8B Generalization in Romanized Low-Resource Languages

30 May 2026. Score: 5.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the performance of fine-tuned Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali generalize to other low-resource language variants (e.g., Romanized Hindi or Marathi) when […]. Romanized Nepali,…

[1039]

Taxonomy-Aligned Fine-Tuning of Codestral for Zero-Shot Vulnerability Repair

30 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does fine-tuning Codestral on taxonomy-aligned vulnerability datasets affect zero-shot repair success rates on Big-Vul compared to fine-tuning on general code corpora. Within the realm of software…

[1038]

Dataset Alignment Effects on Codestral False Positive Rates in SWCC Vulnerability Severity Prediction

30 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of dataset alignment on the false positive rate of Codestral when evaluating vulnerability severity predictions on the SWCC benchmark. Static Application Security Testing (SAST) tools play a…

[1037]

Vendi-RAG Diversity Optimization Enhances FLAN-T5-xl Robustness to Syntactic Distractors

30 May 2026. Score: 3.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Does optimizing the diversity-weight in Vendi-RAG improve FLAN-T5-xl robustness against syntactic distractors in HANS compared to standard relevance-based RAG baselines. Retrieval-augmented generation (RAG)…
