Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 4606 papers; mean review score 5.86/10; 1460 Zenodo DOIs.

Results 3926–3950 of 4605 entries

Papers

[680]

Cross-Task Fine-Tuning for Robust CWE Detection in Python Code Under Adversarial Perturbations

30 May 2026. Score: 0.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does domain adaptation via cross-task fine-tuning affect the robustness of SLMs in detecting CWEs in Python code under adversarial perturbations compared to a baseline of pre-trained LLMs. A joint measurement…

[679]

On-Device vs. Cloud Deployment Trade-offs for SLMs and LLMs in CWE Detection for Private Python Codebases

30 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the trade-off between inference throughput and pass@1 accuracy for SLMs vs. LLMs in CWE detection tasks on private Python codebases when deployed on-device vs. in cloud environments. Large Language Models…

[678]

Scaling Human Preference Alignment in LLaMA-70B with PowerInfer Threshold Adjustment

30 May 2026. Score: 2.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the alignment of LLaMA-70B with human preferences via PowerInfer's dynamic threshold adjustment scale with model size, as measured by accuracy on MBPP and the degree of preference divergence. Aligning…

[677]

Robustness of MORL-Based Preference Alignment in PowerInfer Across Programming Languages

30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the robustness of MORL-based preference alignment in PowerInfer when evaluated across diverse programming languages beyond Python (e.g., JavaScript, Java) using the HumanEval benchmark. Fine-grained…

[676]

Dynamic Hot Neuron Threshold Adjustment in PowerInfer for LLaMA-70B Inference Efficiency

30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the dynamic hot neuron threshold adjustment in PowerInfer compare to fixed threshold methods in terms of inference latency and memory efficiency when applied to LLaMA-70B on MBPP Python. Large Language…

[675]

Federated and Centralized Learning for Malware Detection under Obfuscation and Adversarial Attacks

30 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the detection accuracy of federated learning models compare to centralized deep neural networks when evaluated on the AndroZoo benchmark with varying levels of code obfuscation and. This work investigates…

[674]

Federated Learning Malware Detection Robustness Under Aggregation-Targeted Adversarial Attacks

30 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How robust are federated learning-based malware detection models to adversarial attacks targeting the aggregation process, measured by the degradation in F1-score when subjected to gradient poisoning. This work…

[673]

Federated Learning Throughput Scalability in Distributed IoT Malware Detection

30 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the throughput scalability of federated learning frameworks like FEDetect when increasing the number of client devices in a distributed IoT malware detection setting. This work investigates the…

[672]

Federated Transfer Learning for Cross-Domain Malware Detection in IoT Devices

30 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: To what extent does domain adaptation via federated transfer learning improve model generalization in malware detection when trained on N-BaIoT and evaluated on unseen IoT device types, measured by. This work…

[671]

Differential Privacy Trade-offs in Federated Malware Detection for Heterogeneous IoT Networks

30 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the integration of differential privacy in federated learning-based malware detection models affect the trade-off between model accuracy and communication efficiency, measured by F1-score. This work…

[670]

NIASM Hybrid Approach Performance in Cross-Lingual Factual Consistency Benchmarks

30 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the NIASM hybrid approach perform on cross-lingual factual consistency (F1 score) compared to monolingual fine-tuning in multilingual models like Bloom and Llama-2 on the XSUM and CNN/DM. In an era…

[669]

NIASM Framework Enhances Inference Efficiency on Low-Resource Hardware for Long-Form Summarization

30 May 2026. Score: 7.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: To what extent does the NIASM framework improve inference efficiency (tokens/sec) compared to baseline models like Vicuna-13B and Baichuan-2 when deployed on low-resource hardware for long-form. Customized…

[668]

Token Misalignment Threshold Effects on Hallucination Rates in Vicuna-13B and Baichuan-2

30 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does varying the TAE token misalignment threshold affect the hallucination rates of Vicuna-13B and Baichuan-2 across different domains in the FactCC and HalluEval benchmarks. Large language models (LLMs) have…

[667]

Alignment Score Sensitivity of Baichuan 2 and Vicuna-13B under Token Misalignment Constraints

30 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the alignment score sensitivity of Baichuan 2 and Vicuna-13B compare when evaluated on multimodal benchmarks with varying degrees of token misalignment under constrained inference budgets. Multimodal LLMs…

[666]

Scaling Model Size and Syntax Error Reduction in CoT-Generated Code for BigCodeBench

30 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does model size scaling (e.g., 7B vs. 13B vs. 30B parameters) correlate with syntax error reduction in CoT-generated code for structured data tasks on BigCodeBench. Large language models (LLMs) have demonstrated…

[665]

Scaling Performance of Code Llama and Code Llama-Python on BigCodeBench Across Model Sizes

30 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the performance of Code Llama and Code Llama - Python models scale with increasing model size (7B to 70B parameters) on BigCodeBench tasks measuring cross-library function composition,. We release Code…

[664]

Fine-Tuned Codestral-7B and Llama3-70B Cross-Domain Generalization in Security Vulnerability Classification

30 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the cross-domain generalization accuracy of fine-tuned Codestral-7B versus Llama3-70B on unseen programming languages beyond Python for security vulnerability classification. Many ML-based approaches have…

[663]

Semantic Literature Retrieval and Code Context Engineering for Multi-File Project Accuracy

30 May 2026. Score: 2.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of combining semantic literature retrieval (Elicit) with code-focused context engineering on the accuracy of generated code for niche domains in multi-file projects, measured by. Large Language…

[662]

Split Computing Partitioning Strategies and Throughput in Llama3-70B vs. Codestral-34B for Code Generation

30 May 2026. Score: 6.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the effect of model partitioning strategies in split computing on the throughput of Llama3-70B versus Codestral-34B for code generation tasks on HumanEval-hard. We introduce SIMCOPILOT, a benchmark that…

[661]

Long-Context Gemini 1.5 Pro vs Retrieval-Augmented Llama3-70B in Multi-File Code Vulnerability Classification

30 May 2026. Score: 7.63/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20454208

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: How does the performance of Gemini 1.5 Pro with an 8M context window compare to Llama3-70B with retrieval augmentation in classifying vulnerabilities on the CodeXGLUE security subset when the input. Large Language…

[660]

Robustness of Retrieval-Augmented Vulnerability Classifiers Under Adversarial Inputs

30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20454191

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the robustness of vulnerability classification models like Gemini 1.5 Pro and Llama3-70B with retrieval augmentation vary when presented with adversarial or noisy inputs in the CodeXGLUE. We release Code…

[659]

Gemini 1.5 Pro and Llama3-70B Inference Efficiency in Retrieval-Augmented Code Vulnerability Classification

30 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20454155

Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: What is the inference efficiency difference between Gemini 1.5 Pro and Llama3-70B with retrieval augmentation when processing large-scale security vulnerability classification tasks on the CodeXGLUE. Large…

[658]

Multimodal Fine-Tuned Small Language Models vs. Large Multimodal LLMs in CWE Detection Accuracy

30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20454015

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How do small language models (SLMs) fine-tuned with multimodal context compare to larger multimodal LLMs in terms of CWE detection accuracy and alignment metrics on the extended Big-Vul dataset. In this paper, we…

[657]

SecLM Model Size and Inference Throughput Trade-offs on Edge and Cloud Devices

30 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20454004

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: What is the trade-off between model size and inference throughput for SecLM variants fine-tuned with multimodal inputs, as measured by latency comparisons on edge devices versus cloud infrastructure. Probably no…

[656]

Maximum Context Length Capabilities of Mistral-Large-2 Across Studies

30 May 2026. Score: 7.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the maximum context length that Mistral-Large-2 can handle. We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms…

« Prev 1 … 156 157 158 159 160 … 185 Next »