Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8299 papers; mean review score 5.73/10; 2274 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 149. 97 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 84 contested by the literature itself (see audit ledger). 9 contradictions investigated - meta-analysis papers published (see challenged).
Results 7601–7625 of 8299 entries

Papers

[699]
30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456525

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does 4-bit versus 8-bit quantization affect the HumanEval pass@1 scores of code generation models. The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial…

[698]
30 May 2026. Score: 9.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456474

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do memory-efficient multimodal architectures perform relative to LLaVA-NeXT on long-context video understanding tasks within the Video-MME benchmark. We introduce phi-3-mini, a 3.8 billion parameter language…

[697]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the inference latency of Gemini 1.5 Flash compare to LLaVA-NeXT on the Video-MME benchmark when constrained to 24GB VRAM. In this work, we present a novel method to tackle the token generation challenge…

[696]
30 May 2026. Score: 9.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456346

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the score on the MT-bench change for Phi-3-mini versus Llama 3 70B when evaluated on code generation tasks involving long-context reasoning spanning 100K tokens. We introduce phi-3-mini, a 3.8 billion…

[695]
30 May 2026. Score: 8.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456246

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the performance of Deepseek R1 and Codestral compare on Qiskit-based quantum code generation tasks when evaluated using the Qiskit HumanEval benchmark with varying levels of quantum circuit. As Large…

[694]
30 May 2026. Score: 5.57/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of task-specific fine-tuning on the throughput and accuracy of small language models compared to large models in code generation benchmarks such as HumanEval. Large Language Models (LLMs) have…

[693]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How robust is GADT3 to adversarial attacks on graph structure and node features compared to traditional supervised GAD methods, measured using the AUC-ROC score on perturbed datasets. Real-time traffic prediction…

[692]
30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the memory efficiency of LLaVA-UHD scale with image resolution (e.g., 1024x1024 to 8192x8192) compared to dense inference in Visual-LLM benchmarks like LVIS. Visual encoding constitutes the basis of…

[691]
30 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does neuron activation sparsity correlate with reasoning task accuracy degradation when models are pruned to cold neurons only in PowerInfer's inference pipeline. Activation sparsity offers a compelling route…

[690]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of varying the number of homophily-guided self-supervision steps in GADT3 on its inference efficiency and detection accuracy across different graph domains. Graph Anomaly Detection (GAD) has…

[689]
30 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does PowerInfer's neuron activation sparsity optimization affect inference latency when scaling from LLaMA-33B to LLaMA-70B across different consumer GPU memory configurations. This paper introduces…

[688]
30 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the accuracy of GADT3 compare to other state-of-the-art cross-domain graph anomaly detection models on standard graph benchmarks like Reddit and Twitter datasets. Anomaly detection is defined as…

[687]
30 May 2026. Score: 4.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the accuracy trade-off between dense and quantized LLaVA-UHD models on the PopVQA benchmark when processing images with varying aspect ratios (e.g., 16:9 vs. 9:16). We investigate the behaviour of quantum…

[686]
30 May 2026. Score: 6.03/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the inference latency of quantized LLaVA-UHD compare to LLaVA-1.5 when processing ultra-high-resolution images (e.g., 4K) across multimodal benchmarks like MMBench or SEED-Bench. The advent of real-time…

[685]
30 May 2026. Score: 7.13/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the scaling behavior of quantization-aware training vary across different LLaVA model versions on multimodal reasoning benchmarks. Large Language Models (LLMs) have drawn a lot of attention due to their…

[684]
30 May 2026. Score: 2.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does activation-aware weight quantization affect LLaVA-1.5 performance on the GQA benchmark compared to standard post-training quantization methods. We present LLaVA-OneVision-1.5, a novel family of Large…

[683]
30 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the PowerInfer hot neuron activation threshold parameter impact inference latency and accuracy trade-offs for LLaMA-33B and LLaMA-70B on the HumanEval code generation benchmark. Deploying local AI…

[682]
30 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20455667

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the forecasting accuracy of Llama3 compare to domain-specific models like Prophet or ARIMA when evaluated on high-frequency renewable energy time-series data (e.g., minute-level solar power. This study…

[681]
30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the pass@1 accuracy of fine-tuned LLaMA-70B on MBPP Python function synthesis compare to CodeGen/CodeLlama when evaluated under the same dynamic hot neuron threshold settings in PowerInfer. We benchmark…

[680]
30 May 2026. Score: 0.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does domain adaptation via cross-task fine-tuning affect the robustness of SLMs in detecting CWEs in Python code under adversarial perturbations compared to a baseline of pre-trained LLMs. A joint measurement…

[679]
30 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the trade-off between inference throughput and pass@1 accuracy for SLMs vs. LLMs in CWE detection tasks on private Python codebases when deployed on-device vs. in cloud environments. Large Language Models…

[678]
30 May 2026. Score: 2.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the alignment of LLaMA-70B with human preferences via PowerInfer's dynamic threshold adjustment scale with model size, as measured by accuracy on MBPP and the degree of preference divergence. Aligning…

[677]
30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the robustness of MORL-based preference alignment in PowerInfer when evaluated across diverse programming languages beyond Python (e.g., JavaScript, Java) using the HumanEval benchmark. Fine-grained…

[676]
30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the dynamic hot neuron threshold adjustment in PowerInfer compare to fixed threshold methods in terms of inference latency and memory efficiency when applied to LLaMA-70B on MBPP Python. Large Language…

[675]
30 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the detection accuracy of federated learning models compare to centralized deep neural networks when evaluated on the AndroZoo benchmark with varying levels of code obfuscation and. This work investigates…
