Index  |  Benchmarks  |  Mathematics  |  Graph  |  About
Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5181 papers; mean review score 5.70/10; 1466 Zenodo DOIs.
Results 3451–3475 of 5181 entries

Papers

[1731]
31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of stochastic layer fusion strategies on out-of-distribution robustness scores for federated models evaluated on DomainBed wilds benchmarks. We propose HeroCrystal, a novel privacy-preserving…

[1730]
31 May 2026. Score: 4.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does FedQuad's performance compare to existing federated learning methods (e.g., FedAvg, FedProx) in terms of F1-score and false positive rate when evaluated on standardized intrusion detection. Anomaly…

[1729]
31 May 2026. Score: 2.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the impact of adaptive client sampling strategies on the convergence rate and generalization performance of FedQuad when applied to cross-domain federated learning scenarios with varying. This paper…

[1728]
31 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20478838

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: What is the impact of different federated learning aggregation strategies (FedAvg, FedProx, SCAFFOLD) on model alignment and robustness to non-IID data distributions when combined with compressive. Federated…

[1727]
31 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20478823

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the integration of compressive sensing techniques with OTA-FL in massive MIMO systems affect the convergence rate and final model accuracy compared to traditional FL methods, as measured on. Federated…

[1726]
31 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the one-to-many relationship defense strategy impact zero-shot image-text retrieval accuracy on MS-COCO under simultaneous multimodal adversarial perturbations compared to standard. Pre-trained…

[1725]
31 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: Does the training stability improvement from the gating mechanism in Gated Sparse Attention translate to better few-shot reasoning capabilities on mathematical programming tasks compared to dense. The…

[1724]
31 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does Gated Sparse Attention affect inference throughput and memory consumption during long-context code generation on HumanEval-Long compared to standard sparse attention variants. Modern large language models…

[1723]
31 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does integrating dynamic facial affect embeddings into multimodal code generation models impact accuracy on the HumanEval benchmark under simulated shifting user preference distributions. Understanding and…

[1722]
31 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the throughput-accuracy trade-off of MQuant's post-training quantization on LLaVA compare to structured pruning methods when evaluated on multimodal reasoning tasks. Post-training quantization (PTQ) of…

[1721]
31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Does full static quantization disproportionately reduce robustness against adversarial visual perturbations in multimodal models as measured by accuracy drops on the AdvBench suite. Multimodal large language…

[1720]
31 May 2026. Score: 4.93/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does dynamic problem difficulty adaptation in self-invoking code generation benchmarks affect model robustness compared to static benchmarks using pass@k accuracy across varying complexities. We introduce…

[1719]
31 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the degradation in code generation performance for quantized multimodal models when evaluated on the MultiPL-E benchmark. Background: AI-driven prediction algorithms have the potential to enhance…

[1718]
31 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the trade-off between inference time efficiency and solution correctness when comparing uniform alignment versus branch-aware preference alignment in cross-domain self-invoking code. Repeated sampling…

[1717]
31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Can fine-tuning on self-invoking code generation tasks improve the cross-domain generalization of multimodal LLMs, as measured by performance differences on MBPP Pro versus HumanEval Pro benchmarks. We introduce…

[1716]
31 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: To what extent does activation-aware zero-shot quantization improve robustness to domain shift in multimodal LLM visual grounding tasks compared to standard post-training quantization on RefCOCO+. Multimodal…

[1715]
31 May 2026. Score: 2.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does Q-shaping compare to reward shaping in terms of sample efficiency and convergence speed when applied to code generation tasks using the DS-1000 benchmark across multiple programming languages. Q-shaping…

[1714]
31 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Can multimodal language models trained with combined verbal and non-verbal reward signals maintain their pass@k scores when evaluated on cross-domain code generation benchmarks. In natural human-to-human…

[1713]
31 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How do multimodal models perform on HumanEval-V compared to text-only models when evaluated with accuracy metrics on diagram-based reasoning tasks. Understanding and reasoning over diagrams is a fundamental aspect…

[1712]
31 May 2026. Score: 1.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the impact of gradient accumulation on critical batch size thresholds for 1B-parameter models during pre-training on diagram-based coding tasks. Training large-scale models under given resources requires…

[1711]
31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the critical batch size scaling affect pass@1 accuracy on HumanEval+ for WizardCoder and DeepSeek-R1 when trained with mixed precision versus full precision. Training large-scale models under given…

[1710]
31 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How robust are multimodal code models with visual encoders to adversarial attacks on visual inputs in code generation tasks, compared to text-only models evaluated on HumanEval-V. Understanding and reasoning over…

[1709]
31 May 2026. Score: 4.87/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the throughput comparison of multimodal code models with visual encoders versus text-only models when evaluated on the HumanEval-V benchmark. Understanding and reasoning over diagrams is a fundamental…

[1708]
31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the integration of visual diagram encoders in multimodal code models impact the accuracy of code generation tasks on HumanEval-V compared to text-only baselines like CodeLlama. Recent multimodal large…

[1707]
31 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the robustness gap between LLaVul and fine-tuned SLMs on adversarially perturbed code samples from the Devign benchmark, measured in terms of accuracy drop under input obfuscation. Detecting toxic content…

« Prev 1 137 138 139 140 141 208 Next »