Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8299 papers; mean review score 5.73/10; 2274 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 149. 97 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 84 contested by the literature itself (see audit ledger). 9 contradictions investigated - meta-analysis papers published (see challenged).

Results 7601–7625 of 8299 entries

Papers

[699]

Quantization Impact on HumanEval Performance in Code Generation Models

30 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456525

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does 4-bit versus 8-bit quantization affect the HumanEval pass@1 scores of code generation models. The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial…

[698]

Memory-Efficient Multimodal Architectures vs. LLaVA-NeXT in Long-Context Video Understanding

30 May 2026. Score: 9.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456474

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do memory-efficient multimodal architectures perform relative to LLaVA-NeXT on long-context video understanding tasks within the Video-MME benchmark. We introduce phi-3-mini, a 3.8 billion parameter language…

[697]

Gemini 1.5 Flash and LLaVA-NeXT Inference Latency on Video-MME Under 24GB VRAM Constraints

30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the inference latency of Gemini 1.5 Flash compare to LLaVA-NeXT on the Video-MME benchmark when constrained to 24GB VRAM. In this work, we present a novel method to tackle the token generation challenge…

[696]

Phi-3-Mini and Llama 3 70B MT-Bench Performance on Long-Context Code Generation

30 May 2026. Score: 9.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456346

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the score on the MT-bench change for Phi-3-mini versus Llama 3 70B when evaluated on code generation tasks involving long-context reasoning spanning 100K tokens. We introduce phi-3-mini, a 3.8 billion…

[695]

DeepSeek R1 and Codestral Performance on Qiskit Quantum Code Generation Benchmarks

30 May 2026. Score: 8.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20456246

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the performance of DeepSeek R1 and Codestral compare on Qiskit-based quantum code generation tasks when evaluated using the Qiskit HumanEval benchmark with varying levels of quantum circuit. As Large…

[694]

Task-Specific Fine-Tuning Effects on Small vs. Large Language Models in Code Generation

30 May 2026. Score: 5.57/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of task-specific fine-tuning on the throughput and accuracy of small language models compared to large models in code generation benchmarks such as HumanEval. Large Language Models (LLMs) have…

[693]

GADT3 Robustness to Adversarial Attacks on Graph Structure and Node Features

30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How robust is GADT3 to adversarial attacks on graph structure and node features compared to traditional supervised GAD methods, measured using the AUC-ROC score on perturbed datasets. Real-time traffic prediction…

[692]

Memory Efficiency Scaling of LLaVA-UHD Across High-Resolution Visual-LLM Benchmarks

30 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the memory efficiency of LLaVA-UHD scale with image resolution (e.g., 1024x1024 to 8192x8192) compared to dense inference in Visual-LLM benchmarks like LVIS. Visual encoding constitutes the basis of…

[691]

Cold Neuron Pruning and Reasoning Accuracy in PowerInfer Inference Pipelines

30 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does neuron activation sparsity correlate with reasoning task accuracy degradation when models are pruned to cold neurons only in PowerInfer's inference pipeline. Activation sparsity offers a compelling route…

[690]

Homophily-Guided Self-Supervision Steps in GADT3: Efficiency and Accuracy Trade-offs Across Graph Domains

30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of varying the number of homophily-guided self-supervision steps in GADT3 on its inference efficiency and detection accuracy across different graph domains. Graph Anomaly Detection (GAD) has…

[689]

PowerInfer Sparsity Optimization and Inference Latency in LLaMA Scaling Across GPU Configurations

30 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does PowerInfer's neuron activation sparsity optimization affect inference latency when scaling from LLaMA-33B to LLaMA-70B across different consumer GPU memory configurations. This paper introduces…

[688]

GADT3 Accuracy in Cross-Domain Graph Anomaly Detection Benchmarks

30 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the accuracy of GADT3 compare to other state-of-the-art cross-domain graph anomaly detection models on standard graph benchmarks like Reddit and Twitter datasets. Anomaly detection is defined as…

[687]

Dense vs. Quantized LLaVA-UHD Accuracy on PopVQA Across Image Aspect Ratios

30 May 2026. Score: 4.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the accuracy trade-off between dense and quantized LLaVA-UHD models on the PopVQA benchmark when processing images with varying aspect ratios (e.g., 16:9 vs. 9:16). We investigate the behaviour of quantum…

[686]

Quantized LLaVA-UHD and LLaVA-1.5 Inference Latency on Ultra-High-Resolution Multimodal Benchmarks

30 May 2026. Score: 6.03/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the inference latency of quantized LLaVA-UHD compare to LLaVA-1.5 when processing ultra-high-resolution images (e.g., 4K) across multimodal benchmarks like MMBench or SEED-Bench. The advent of real-time…

[685]

Quantization-Aware Training Scaling in LLaVA Models for Multimodal Reasoning

30 May 2026. Score: 7.13/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the scaling behavior of quantization-aware training vary across different LLaVA model versions on multimodal reasoning benchmarks. Large Language Models (LLMs) have drawn a lot of attention due to their…

[684]

Activation-Aware Weight Quantization in LLaVA-1.5 on the GQA Benchmark

30 May 2026. Score: 2.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does activation-aware weight quantization affect LLaVA-1.5 performance on the GQA benchmark compared to standard post-training quantization methods. We present LLaVA-OneVision-1.5, a novel family of Large…

[683]

PowerInfer Hot Neuron Threshold Effects on LLaMA-33B and LLaMA-70B Inference Trade-offs

30 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the PowerInfer hot neuron activation threshold parameter impact inference latency and accuracy trade-offs for LLaMA-33B and LLaMA-70B on the HumanEval code generation benchmark. Deploying local AI…

[682]

Llama3 and Domain-Specific Models for High-Frequency Renewable Energy Forecasting

30 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20455667

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the forecasting accuracy of Llama3 compare to domain-specific models like Prophet or ARIMA when evaluated on high-frequency renewable energy time-series data (e.g., minute-level solar power. This study…

[681]

Fine-Tuned LLaMA-70B vs. CodeGen and CodeLlama on MBPP Pass@1 Accuracy Under PowerInfer Thresholds

30 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the pass@1 accuracy of fine-tuned LLaMA-70B on MBPP Python function synthesis compare to CodeGen/CodeLlama when evaluated under the same dynamic hot neuron threshold settings in PowerInfer. We benchmark…

[680]

Cross-Task Fine-Tuning for Robust CWE Detection in Python Code Under Adversarial Perturbations

30 May 2026. Score: 0.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does domain adaptation via cross-task fine-tuning affect the robustness of SLMs in detecting CWEs in Python code under adversarial perturbations compared to a baseline of pre-trained LLMs. A joint measurement…

[679]

On-Device vs. Cloud Deployment Trade-offs for SLMs and LLMs in CWE Detection for Private Python Codebases

30 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the trade-off between inference throughput and pass@1 accuracy for SLMs vs. LLMs in CWE detection tasks on private Python codebases when deployed on-device vs. in cloud environments. Large Language Models…

[678]

Scaling Human Preference Alignment in LLaMA-70B with PowerInfer Threshold Adjustment

30 May 2026. Score: 2.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the alignment of LLaMA-70B with human preferences via PowerInfer's dynamic threshold adjustment scale with model size, as measured by accuracy on MBPP and the degree of preference divergence. Aligning…

[677]

Robustness of MORL-Based Preference Alignment in PowerInfer Across Programming Languages

30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the robustness of MORL-based preference alignment in PowerInfer when evaluated across diverse programming languages beyond Python (e.g., JavaScript, Java) using the HumanEval benchmark. Fine-grained…

[676]

Dynamic Hot Neuron Threshold Adjustment in PowerInfer for LLaMA-70B Inference Efficiency

30 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the dynamic hot neuron threshold adjustment in PowerInfer compare to fixed threshold methods in terms of inference latency and memory efficiency when applied to LLaMA-70B on MBPP Python. Large Language…

[675]

Federated and Centralized Learning for Malware Detection under Obfuscation and Adversarial Attacks

30 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the detection accuracy of federated learning models compare to centralized deep neural networks when evaluated on the AndroZoo benchmark with varying levels of code obfuscation and. This work investigates…
