Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8301 papers; mean review score 5.73/10; 2276 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 149. 97 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 84 contested by the literature itself (see audit ledger). 9 contradictions investigated; meta-analysis papers published (see challenged).

Results 7626–7650 of 8301 entries

Papers

[676]

Dynamic Hot Neuron Threshold Adjustment in PowerInfer for LLaMA-70B Inference Efficiency

30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the dynamic hot neuron threshold adjustment in PowerInfer compare to fixed threshold methods in terms of inference latency and memory efficiency when applied to LLaMA-70B on MBPP Python. Large Language…

[675]

Federated and Centralized Learning for Malware Detection under Obfuscation and Adversarial Attacks

30 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the detection accuracy of federated learning models compare to centralized deep neural networks when evaluated on the AndroZoo benchmark with varying levels of code obfuscation and. This work investigates…

[674]

Federated Learning Malware Detection Robustness Under Aggregation-Targeted Adversarial Attacks

30 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How robust are federated learning-based malware detection models to adversarial attacks targeting the aggregation process, measured by the degradation in F1-score when subjected to gradient poisoning. This work…

[673]

Federated Learning Throughput Scalability in Distributed IoT Malware Detection

30 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the throughput scalability of federated learning frameworks like FEDetect when increasing the number of client devices in a distributed IoT malware detection setting. This work investigates the…

[672]

Federated Transfer Learning for Cross-Domain Malware Detection in IoT Devices

30 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: To what extent does domain adaptation via federated transfer learning improve model generalization in malware detection when trained on N-BaIoT and evaluated on unseen IoT device types, measured by. This work…

[671]

Differential Privacy Trade-offs in Federated Malware Detection for Heterogeneous IoT Networks

30 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the integration of differential privacy in federated learning-based malware detection models affect the trade-off between model accuracy and communication efficiency, measured by F1-score. This work…

[670]

NIASM Hybrid Approach Performance in Cross-Lingual Factual Consistency Benchmarks

30 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the NIASM hybrid approach perform on cross-lingual factual consistency (F1 score) compared to monolingual fine-tuning in multilingual models like Bloom and Llama-2 on the XSUM and CNN/DM. In an era…

[669]

NIASM Framework Enhances Inference Efficiency on Low-Resource Hardware for Long-Form Summarization

30 May 2026. Score: 7.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: To what extent does the NIASM framework improve inference efficiency (tokens/sec) compared to baseline models like Vicuna-13B and Baichuan-2 when deployed on low-resource hardware for long-form. Customized…

[668]

Token Misalignment Threshold Effects on Hallucination Rates in Vicuna-13B and Baichuan-2

30 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does varying the TAE token misalignment threshold affect the hallucination rates of Vicuna-13B and Baichuan-2 across different domains in the FactCC and HalluEval benchmarks. Large language models (LLMs) have…

[667]

Alignment Score Sensitivity of Baichuan 2 and Vicuna-13B under Token Misalignment Constraints

30 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the alignment score sensitivity of Baichuan 2 and Vicuna-13B compare when evaluated on multimodal benchmarks with varying degrees of token misalignment under constrained inference budgets. Multimodal LLMs…

[666]

Scaling Model Size and Syntax Error Reduction in CoT-Generated Code for BigCodeBench

30 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does model size scaling (e.g., 7B vs. 13B vs. 30B parameters) correlate with syntax error reduction in CoT-generated code for structured data tasks on BigCodeBench. Large language models (LLMs) have demonstrated…

[665]

Scaling Performance of Code Llama and Code Llama-Python on BigCodeBench Across Model Sizes

30 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the performance of Code Llama and Code Llama - Python models scale with increasing model size (7B to 70B parameters) on BigCodeBench tasks measuring cross-library function composition,. We release Code…

[664]

Fine-Tuned Codestral-7B and Llama3-70B Cross-Domain Generalization in Security Vulnerability Classification

30 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the cross-domain generalization accuracy of fine-tuned Codestral-7B versus Llama3-70B on unseen programming languages beyond Python for security vulnerability classification. Many ML-based approaches have…

[663]

Semantic Literature Retrieval and Code Context Engineering for Multi-File Project Accuracy

30 May 2026. Score: 2.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of combining semantic literature retrieval (Elicit) with code-focused context engineering on the accuracy of generated code for niche domains in multi-file projects, measured by. Large Language…

[662]

Split Computing Partitioning Strategies and Throughput in Llama3-70B vs. Codestral-34B for Code Generation

30 May 2026. Score: 6.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the effect of model partitioning strategies in split computing on the throughput of Llama3-70B versus Codestral-34B for code generation tasks on HumanEval-hard. We introduce SIMCOPILOT, a benchmark that…

[661]

Long-Context Gemini 1.5 Pro vs Retrieval-Augmented Llama3-70B in Multi-File Code Vulnerability Classification

30 May 2026. Score: 7.63/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20454208

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: How does the performance of Gemini 1.5 Pro with an 8M context window compare to Llama3-70B with retrieval augmentation in classifying vulnerabilities on the CodeXGLUE security subset when the input. Large Language…

[660]

Robustness of Retrieval-Augmented Vulnerability Classifiers Under Adversarial Inputs

30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20454191

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the robustness of vulnerability classification models like Gemini 1.5 Pro and Llama3-70B with retrieval augmentation vary when presented with adversarial or noisy inputs in the CodeXGLUE. We release Code…

[659]

Gemini 1.5 Pro and Llama3-70B Inference Efficiency in Retrieval-Augmented Code Vulnerability Classification

30 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20454155

Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: What is the inference efficiency difference between Gemini 1.5 Pro and Llama3-70B with retrieval augmentation when processing large-scale security vulnerability classification tasks on the CodeXGLUE. Large…

[658]

Multimodal Fine-Tuned Small Language Models vs. Large Multimodal LLMs in CWE Detection Accuracy

30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20454015

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How do small language models (SLMs) fine-tuned with multimodal context compare to larger multimodal LLMs in terms of CWE detection accuracy and alignment metrics on the extended Big-Vul dataset. In this paper, we…

[657]

SecLM Model Size and Inference Throughput Trade-offs on Edge and Cloud Devices

30 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20454004

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: What is the trade-off between model size and inference throughput for SecLM variants fine-tuned with multimodal inputs, as measured by latency comparisons on edge devices versus cloud infrastructure. Probably no…

[656]

Maximum Context Length Capabilities of Mistral-Large-2 Across Studies

30 May 2026. Score: 7.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the maximum context length that Mistral-Large-2 can handle. We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms…

[655]

Quantization Trade-offs in SecLM-Fine-Tuned Llama3 for Edge Text Classification

30 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20453852

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the impact of quantization techniques (e.g., 4-bit, 8-bit) on the inference efficiency and accuracy of SecLM-fine-tuned Llama3 for text classification tasks on edge devices with limited. The rapid…

[654]

Inference Latency Scaling of Mistral-Large-2 on MBPP Code Completion Tasks

30 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the inference latency of Mistral-Large-2 scale with input sequence length on MBPP code completion tasks. We release Code Llama, a family of large language models for code based on Llama 2 providing…

[653]

Mistral-Large-2 Reasoning Accuracy on GSM8K vs. 7B Parameter Models

30 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20453720

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the reasoning accuracy of Mistral-Large-2 on GSM8K compared to other 7B parameter models. Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential…

[652]

Scaling Laws of Model Size and Training Data in Mistral-Large-2 LiveCodeBench Performance

30 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of model size and training data on Mistral-Large-2's LiveCodeBench performance, and how does it scale with increasing parameter count. In this report, we introduce Qwen2.5, a comprehensive…
