Index |  Research ▾  |  Verification ▾  | About
Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8301 papers; mean review score 5.73/10; 2276 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 149. 97 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 84 contested by the literature itself (see audit ledger). 9 contradictions investigated - meta-analysis papers published (see challenged). What does this mean?
Results 7626–7650 of 8301 entries

Papers

[676]
30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the dynamic hot neuron threshold adjustment in PowerInfer compare to fixed threshold methods in terms of inference latency and memory efficiency when applied to LLaMA-70B on MBPP Python. Large Language…

[675]
30 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the detection accuracy of federated learning models compare to centralized deep neural networks when evaluated on the AndroZoo benchmark with varying levels of code obfuscation and. This work investigates…

[674]
30 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How robust are federated learning-based malware detection models to adversarial attacks targeting the aggregation process, measured by the degradation in F1-score when subjected to gradient poisoning. This work…

[673]
30 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the throughput scalability of federated learning frameworks like FEDetect when increasing the number of client devices in a distributed IoT malware detection setting. This work investigates the…

[672]
30 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: To what extent does domain adaptation via federated transfer learning improve model generalization in malware detection when trained on N-BaIoT and evaluated on unseen IoT device types, measured by. This work…

[671]
30 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the integration of differential privacy in federated learning-based malware detection models affect the trade-off between model accuracy and communication efficiency, measured by F1-score. This work…

[670]
30 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the NIASM hybrid approach perform on cross-lingual factual consistency (F1 score) compared to monolingual fine-tuning in multilingual models like Bloom and Llama-2 on the XSUM and CNN/DM. In an era…

[669]
30 May 2026. Score: 7.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: To what extent does the NIASM framework improve inference efficiency (tokens/sec) compared to baseline models like Vicuna-13B and Baichuan-2 when deployed on low-resource hardware for long-form. Customized…

[668]
30 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does varying the TAE token misalignment threshold affect the hallucination rates of Vicuna-13B and Baichuan-2 across different domains in the FactCC and HalluEval benchmarks. Large language models (LLMs) have…

[667]
30 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the alignment score sensitivity of Baichuan 2 and Vicuna-13B compare when evaluated on multimodal benchmarks with varying degrees of token misalignment under constrained inference budgets. Multimodal LLMs…

[666]
30 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does model size scaling (e.g., 7B vs. 13B vs. 30B parameters) correlate with syntax error reduction in CoT-generated code for structured data tasks on BigCodeBench. Large language models (LLMs) have demonstrated…

[665]
30 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the performance of Code Llama and Code Llama - Python models scale with increasing model size (7B to 70B parameters) on BigCodeBench tasks measuring cross-library function composition,. We release Code…

[664]
30 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the cross-domain generalization accuracy of fine-tuned Codestral-7B versus Llama3-70B on unseen programming languages beyond Python for security vulnerability classification. Many ML-based approaches have…

[663]
30 May 2026. Score: 2.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of combining semantic literature retrieval (Elicit) with code-focused context engineering on the accuracy of generated code for niche domains in multi-file projects, measured by. Large Language…

[662]
30 May 2026. Score: 6.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the effect of model partitioning strategies in split computing on the throughput of Llama3-70B versus Codestral-34B for code generation tasks on HumanEval-hard. We introduce SIMCOPILOT, a benchmark that…

[661]
30 May 2026. Score: 7.63/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20454208

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: How does the performance of Gemini 1.5 Pro with an 8M context window compare to Llama3-70B with retrieval augmentation in classifying vulnerabilities on the CodeXGLUE security subset when the input. Large Language…

[660]
30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20454191

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the robustness of vulnerability classification models like Gemini 1.5 Pro and Llama3-70B with retrieval augmentation vary when presented with adversarial or noisy inputs in the CodeXGLUE. We release Code…

[659]
30 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20454155

Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: What is the inference efficiency difference between Gemini 1.5 Pro and Llama3-70B with retrieval augmentation when processing large-scale security vulnerability classification tasks on the CodeXGLUE. Large…

[658]
30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20454015

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How do small language models (SLMs) fine-tuned with multimodal context compare to larger multimodal LLMs in terms of CWE detection accuracy and alignment metrics on the extended Big-Vul dataset. In this paper, we…

[657]
30 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20454004

Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: What is the trade-off between model size and inference throughput for SecLM variants fine-tuned with multimodal inputs, as measured by latency comparisons on edge devices versus cloud infrastructure. Probably no…

[656]
30 May 2026. Score: 7.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the maximum context length that Mistral-Large-2 can handle. We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms…

[655]
30 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20453852

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the impact of quantization techniques (e.g., 4-bit, 8-bit) on the inference efficiency and accuracy of SecLM-fine-tuned Llama3 for text classification tasks on edge devices with limited. Abstract The rapid…

[654]
30 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the inference latency of Mistral-Large-2 scale with input sequence length on MBPP code completion tasks. We release Code Llama, a family of large language models for code based on Llama 2 providing…

[653]
30 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20453720

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the reasoning accuracy of Mistral-Large-2 on GSM8K compared to other 7B parameter models. Large Language Models (LLMs) have demonstrated remarkable versatility in recent years, offering potential…

[652]
30 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of model size and training data on Mistral-Large-2's LiveCodeBench performance, and how does it scale with increasing parameter count. In this report, we introduce Qwen2.5, a comprehensive…

« Prev 1 304 305 306 307 308 333 Next »