Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5998 papers; mean review score 5.58/10; 1557 Zenodo DOIs.

Results 2601–2625 of 5998 entries

Papers

[3398]

Throughput and Token Generation Speed Trade-offs in Answer-Then-Check vs RLHF Alignment

4 June 2026. Score: 4.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do Answer-Then-Check and RLHF alignment methods differ in terms of throughput degradation and token generation speed on Tulu 3 and Deepseek R1 during adversarial prompting. 12 claims were extracted from…

[3397]

Retrieval-Augmented Generation Effects on Vision-Language Model Robustness Against Multimodal Jailbreaks

4 June 2026. Score: 5.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does Retrieval-Augmented Generation impact the robustness of vision-language models against multimodal jailbreak attacks on benchmarks like MM-SafetyBench compared to non-RAG baselines. 0 claims were…

[3396]

DeepSeek R1 and Tulu 3 Memory and Latency in Multi-Turn Code Reasoning on RTX 4090

4 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the memory footprint and latency of Deepseek R1 compare to Tulu 3 during multi-turn code reasoning tasks on RTX 4090 hardware. 9 claims were extracted from source literature; 0 were independently…

[3395]

DeepSeek R1 and Tulu 3 Pass@1 Accuracy on HumanEval Under Single-GPU Constraints

4 June 2026. Score: 5.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: What is the difference in pass@1 accuracy between Deepseek R1 and Tulu 3 on the HumanEval benchmark when restricted to single consumer-grade GPU inference. 0 claims were extracted from source literature; 0 were…

[3394]

Tulu 3 and Mistral 7B Refusal Accuracy on HarmBench Safety-Critical Prompts

4 June 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the difference in refusal accuracy between Tulu 3 and Mistral 7B on safety-critical prompts defined in the HarmBench framework. 0 claims were extracted from source literature; 0 were independently…

[3393]

Tulu 3 Latency Scaling Across Context Lengths in Complex Reasoning Tasks

4 June 2026. Score: 3.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the latency scaling curve of Tulu 3 vary across different context lengths during complex reasoning tasks compared to base Llama 3.1 models. 9 claims were extracted from source literature; 0 were…

[3392]

Robustness of Llama3.1-8B and Mistral 7B Against Powertrain Adversarial Perturbations

4 June 2026. Score: 3.40/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the robustness of Llama3.1-8B against domain-specific adversarial perturbations in powertrain data compare to Mistral 7B across varying context window sizes. 8 claims were extracted from source…

[3391]

Synthetic Training Diversity Enhances DeepSeek Coder Robustness to Code Perturbations

4 June 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: Does increasing the diversity of synthetic training samples improve DeepSeek Coder's resistance to semantic-preserving code perturbations as measured by accuracy drop on the MBPP benchmark. 14 claims were…

[3390]

Training Data Diversity and Cross-Language Generalization in Code Llama for Memory Corruption Detection

4 June 2026. Score: 5.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of training data diversity on the cross-language generalization capability of Code Llama for detecting memory corruption vulnerabilities in C++ and Rust. 0 claims were extracted from source…

[3389]

Code Llama Zero-Shot Vulnerability Detection Accuracy on Big-Vul with Synthetic and Real-World Fine-Tuning

4 June 2026. Score: 4.30/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the zero-shot vulnerability detection accuracy of Code Llama on the Big-Vul dataset compare when fine-tuned on synthetic CVE data versus real-world exploit repositories. 12 claims were extracted from…

[3388]

Robustness of Code Llama Models Trained on Synthetic vs. Standard Vulnerability Datasets

4 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the robustness of Code Llama models trained on synthetic code vulnerability augmented datasets compare to those trained on standard Big-Vul subsets when evaluated on adversarial code. 16 claims were…

[3387]

Synthetic-Vulnerable Code Ratios and Their Effect on Code Llama's Benchmark Performance

4 June 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the impact of varying the ratio of synthetic-vulnerable to standard code samples in the training data on Code Llama's performance on HumanEval and MBPP code generation benchmarks, measured by. 15 claims…

[3386]

Retrieval-Augmented Generation Enhances Mistral 7B Robustness in Adversarial Time-Series Tasks

4 June 2026. Score: 5.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does retrieval-augmented generation improve the robustness of Mistral 7B against adversarial perturbations in time-series data compared to instruction-tuned variants. 0 claims were extracted from…

[3385]

Retrieval-Augmented Generation Impact on False Positives in Malicious Python Code Detection

4 June 2026. Score: 4.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does Retrieval-Augmented Generation affect the false positive rates of Llama 3.1 compared to Mistral 7B when classifying obfuscated Python code in malicious package detection. 0 claims were extracted from…

[3384]

Llama3 and DeepSeek R1 F1-Score Comparison on Big-Vul Under Obfuscation Generalization

4 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the F1-score of Llama3 compare to Deepseek R1 on the Big-Vul dataset when evaluating generalization to unseen obfuscation techniques after fine-tuning on adversarial samples. 14 claims were extracted…

[3383]

Mistral 7B and Llama 3.1 Inference Performance in Multi-Agent RAG for PyPI Security Analysis

4 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the difference in inference latency and token throughput between Llama 3.1 and Mistral 7B when integrated into a multi-agent system for analyzing PyPI package security with RAG. 15 claims were extracted…

[3382]

Retrieval-Augmented Generation and Chain-of-Thought Prompting Latency in Open-Weight LLMs for Time-Series Data

4 June 2026. Score: 5.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the impact of retrieval-augmented generation versus chain-of-thought prompting on the inference latency and token throughput of open-weight LLMs when processing high-frequency time-series data. 0 claims…

[3381]

Retrieval-Augmented Generation and Chain-of-Thought Prompting Robustness to Distribution Shifts in Zero-Shot Fault Detection

4 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do retrieval-augmented generation and chain-of-thought prompting differ in their robustness to distribution shifts when applied to zero-shot fault detection in cyber-physical systems using. 16 claims were…

[3380]

Fairness Metrics and Communication Compression in Federated Code Generation Fine-Tuning

4 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the correlation between communication compression ratios and fairness metrics in federated fine-tuning of code generation models. 15 claims were extracted from source literature; 0 were independently…

[3379]

Multimodal Contrastive Learning for Robust Vulnerability Detection in Adversarial Code-Comment Pairs

4 June 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: To what extent does multimodal contrastive learning (as proposed in MultiVul) improve the robustness of vulnerability detection models like Llama3 and Deepseek R1 when tested on adversarially. 11 claims were…

[3378]

Llama 3.1 and Mistral 7B Robustness Under Distribution Shifts in Energy Forecasting

4 June 2026. Score: 5.43/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do Llama 3.1 and Mistral 7B differ in robustness scores against distribution shifts when transferring from synthetic battery datasets to real-world renewable energy forecasting tasks. 17 claims were extracted…

[3377]

Llama3 and DeepSeek R1 Inference Efficiency for Vulnerability Detection Under Adversarial Noise

4 June 2026. Score: 3.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the inference efficiency of Llama3 compare to Deepseek R1 for vulnerability detection on the Big-Vul dataset when evaluated under varying levels of adversarial noise, measured in terms of. 0 claims were…

[3376]

Scalability and Accuracy Trade-offs in Llama3 and DeepSeek-R1 for Vulnerability Detection

4 June 2026. Score: 5.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How scalable are Llama3 and Deepseek R1 in detecting vulnerabilities in the Big-Vul dataset when trained with MultiVul's multimodal approach, as measured by the throughput (samples per second) and. 11 claims were…

[3375]

Llama 3.1 and Mistral 7B Throughput Degradation in Chain-of-Thought Reasoning on Energy Grid Data

4 June 2026. Score: 5.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the relative inference throughput degradation of Llama 3.1 versus Mistral 7B when performing chain-of-thought reasoning on structured energy grid data. 0 claims were extracted from source literature; 0…

[3374]

Cross-Domain Transferability of Llama3.1 and Mistral 7B with RAG in Energy Forecasting and Anomaly Detection

4 June 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does cross-domain transferability affect the performance of Llama3.1 and Mistral 7B with RAG when fine-tuned on battery datasets and applied to renewable energy forecasting versus power grid. 11 claims were…

« Prev 1 … 103 104 105 106 107 … 240 Next »