Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5429 papers; mean review score 5.65/10; 1474 Zenodo DOIs.
Results 3226–3250 of 5429 entries

Papers

[2204]
1 June 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does W4A4 quantization affect the HumanEval pass@1 score of Llama-2-7B compared to INT8 quantization while maintaining real-time inference latency on NVIDIA H100 GPUs? Reducing the latency and model size has…

[2203]
1 June 2026. Score: 7.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do CodeT5 models perform in cross-domain code completion tasks (e.g., Python to Java) compared to domain-specialized models, and what metrics (e.g., BLEU, accuracy) best capture these differences? Benchmark…

[2202]
1 June 2026. Score: 2.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the trade-off between real-time vulnerability classification accuracy and throughput compare between CodeT5 models and other state-of-the-art code language models (e.g., CodeGen, CodeGPT)? Many ML-based…

[2201]
1 June 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the robustness of DeepSeek R1 to interference in LoRA-based fine-tuning compare to full fine-tuning when evaluated on the MBPP benchmark in terms of accuracy and latency? Recently, the instruction-tuning…

[2200]
1 June 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the scalability of CodeT5-based vulnerability detection in IDE environments when processing incremental code changes versus full-file analysis, as measured by latency per code edit and. In the rapidly…

[2199]
1 June 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the integration of CodeT5-based vulnerability detection into IDE environments compare to standalone processing in terms of token-level latency and GPU memory utilization when evaluated on. Texture…

[2198]
1 June 2026. Score: 2.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the impact of imperfect orthogonality in LoRA-based fine-tuning on the inference throughput of Codestral when evaluated on the HumanEval benchmark? Pre-training Large Language Models (LLMs) on web-scale…

[2197]
1 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the choice of spreading factor (SF) in LoRa modulation affect the F1-score performance of Llama3 on QuixBugs when using QLoRA fine-tuning compared to full fine-tuning? Pre-training Large Language Models…

[2196]
1 June 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the trade-off between inference efficiency (latency, throughput) and F1-score performance for Llama3, Codestral, and DeepSeek R1 when deployed for vulnerability detection across multiple. This study…

[2195]
1 June 2026. Score: 2.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the choice of fine-tuning hyperparameters impact the cross-language generalization performance of Llama3, Codestral, and DeepSeek R1 on Big-Vul, as measured by F1-score gaps between seen and. Anomaly…

[2194]
1 June 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the diversity-weight parameter in Vendi-RAG influence the alignment of FLAN-T5-xl outputs with human preferences on the TruthfulQA benchmark compared to BM25 retrieval? Retrieval-augmented generation…

[2193]
1 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of varying the diversity-weight parameter in Vendi-RAG on the accuracy of FLAN-T5-xl for code generation tasks in the HumanEval benchmark compared to BM25 retrieval? Retrieval-augmented…

[2192]
1 June 2026. Score: 6.60/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does the choice between full-graph and mini-batch training pipelines affect the robustness of Graph Neural Networks against adversarial perturbations in control flow graphs used for security analysis? Malware…

[2191]
1 June 2026. Score: 2.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: To what extent do system-level optimizations for mini-batch GNN training improve inference throughput and memory efficiency when deploying multimodal vulnerability detectors on resource-constrained. Graph Neural…

[2190]
1 June 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the memory consumption and latency of DeepSeek R1 and Codestral compare when using IceCache's KV-cache management against traditional on-GPU KV-cache in autoregressive generation tasks with. Key-Value…

[2189]
1 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does mini-batch training impact the convergence speed and final accuracy of Graph Neural Networks for code vulnerability detection compared to full-graph training on large-scale software. Full-graph and…

[2188]
1 June 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the trade-off between throughput and accuracy when deploying DeepSeek R1 and Codestral with IceCache's external vector database-based KV-cache on resource-constrained hardware for code. Key-Value (KV)…

[2187]
1 June 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the robustness of vulnerability detection models trained on code property graphs with integrated commit messages vary against adversarial code perturbations compared to models using only. Deep Neural…

[2186]
1 June 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the effect of using convolutional neural networks versus graph neural networks on the inference efficiency and detection accuracy when processing code property graphs augmented with commit. The increasing…

[2185]
1 June 2026. Score: 6.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does incorporating natural language commit messages into code property graph representations impact the F1-score of vulnerability detection models on the Big-Vul dataset compared to graph-only. A commit…

[2184]
1 June 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does training on synthetic data from Claude 2 improve the cross-domain robustness of small language models on adversarial NLI datasets compared to ChatGPT-3.5-Turbo? Natural Language Inference…

[2183]
1 June 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the inference latency of large pre-trained video encoders change when fine-tuned on synthetic gesture data versus human-annotated datasets across varying batch sizes? In this work, we explore the…

[2182]
1 June 2026. Score: 5.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: To what extent does training on synthetic video data impact the zero-shot cross-domain generalization accuracy of multimodal video-language models compared to models trained on real-world annotations? In this…

[2181]
1 June 2026. Score: 4.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of synthetic data source quality on the inference efficiency and throughput of small language models trained for natural language inference tasks? The evolution of Generative Pre-trained…

[2180]
1 June 2026. Score: 4.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the reasoning accuracy of small language models on NLI benchmarks change when fine-tuned on synthetic data from ChatGPT-3.5-Turbo compared to data from ChatGPT-4? Large Language Models (LLMs) have…
