Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5242 papers; mean review score 5.69/10; 1467 Zenodo DOIs.

Results 3401–3425 of 5242 entries

Papers

[1842]

Horizon-Adaptive Mechanisms in LongNav-R1 Boost Out-of-Distribution Navigation Success

31 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does the horizon-adaptive mechanism in LongNav-R1 improve success rates on out-of-distribution navigation instructions compared to standard fine-tuned VLA models. Language models (LMs) possess a…

[1841]

LongNav-R1 Horizon-Adaptive Multi-Turn RL for Long-Horizon Navigation Efficiency

31 May 2026. Score: 6.53/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the horizon-adaptive multi-turn RL approach in LongNav-R1 compare to other RL-based navigation frameworks like PointGoalRL in terms of sample efficiency and convergence speed when trained on. This paper…

[1840]

Multi-Turn RL vs Single-Turn VLA Inference Latency on RxR-CE Benchmark

31 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the inference latency of LongNav-R1's multi-turn RL policy compare to single-turn VLA baselines on the RxR-CE benchmark when measured in tokens per second. This paper develops LongNav-R1, an end-to-end…

[1839]

Multi-Stage Validation Frameworks for Stable and Accurate Reward Signals in Code Generation

31 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Can a multi-stage validation framework with progressively complex unit tests (e.g., HumanEval, MBXP) improve the accuracy of reward signals while maintaining training stability in code generation. Current large…

[1838]

Combining Implicit and Explicit Reward Signals for Robust LLM-Generated Code Across Languages

31 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: To what extent does combining implicit and explicit reward signals from unit tests improve the robustness of LLM-generated code across different programming languages on the MultiPL-E benchmark. Current large…

[1837]

Dynamic Reward Scaling in Unit Test-Based Reward Modeling for Code Generation Alignment

31 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the use of dynamic reward scaling in unit test-based reward modeling affect the trade-off between alignment quality and inference efficiency in code generation tasks on the SQuTR benchmark. Current large…

[1836]

Robustness of DPO and RLHF Alignment Under Varying Dataset Sizes in Multimodal Reasoning

31 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of dataset size on the robustness of DPO versus RLHF alignment methods when evaluated on multimodal reasoning benchmarks with corrupted image-text pairs. This paper studies the alignment…

[1835]

Difficulty-Based Preference Data Selection Enhances Long-Context Reasoning Efficiency and Alignment

31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Does difficulty-based preference data selection improve inference efficiency and alignment quality on long-context reasoning benchmarks compared to standard RLHF pipelines. Aligning large language models (LLMs)…

[1834]

Rationale-Augmented Preference Data Enhances DPO Model Robustness to Adversarial Prompts

31 May 2026. Score: 5.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the inclusion of rationales in preference data influence the robustness of DPO-trained models to adversarial prompts, measured by accuracy on the AdversarialQA benchmark across different.…

[1833]

Human-Annotated Rationales in Preference Data and DPO Alignment Performance on MMLU

31 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the integration of human-annotated rationales in preference data impact the alignment performance of DPO on the MMLU benchmark compared to standard RLHF, measured by accuracy across. Aligning language…

[1832]

Sliding Window Attention Effects on GitHub Copilot Inference Efficiency and Accuracy

31 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20480595

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the impact of sliding window attention on the inference efficiency of GitHub Copilot in generating large-scale code snippets, and how does this trade-off between speed and accuracy compare to. Synthetic…

[1831]

Fine-Tuning on Adversarial GLUE for Gradient and Attention Attribution Stability

31 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: Does fine-tuning on Adversarial GLUE datasets improve the stability of gradient-based attribution methods compared to attention-based methods under perturbed inputs. Adversarial perturbations are noise-like…

[1830]

Integrated Decision Gradients and Attention Rollout Correlation Under Adversarial Attacks in GLUE

31 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the correlation between Integrated Decision Gradients and Attention Rollout attribution consistency vary across different adversarial attack types in the Adversarial GLUE benchmark. Deep neural networks…

[1829]

Adversarial Attack Strategies on Graph-Based NIDS and Their Latency Impacts Across Datasets

31 May 2026. Score: 6.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How do different adversarial attack strategies on graph structure affect the inference latency of GNN-based NIDS models when evaluated using the UNSW-NB15 dataset compared to models trained on the. Deep neural…

[1828]

Integrated Decision Gradients and Attention Rollout Robustness Under Adversarial Text Perturbations

31 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the robustness of Integrated Decision Gradients compare to Attention Rollout in maintaining feature attribution consistency under adversarial text perturbations across standard NLP benchmark. Large-scale…

[1827]

Head-Tail-Aware KL Divergence Scaling in LLM Distillation and Alignment Metrics

31 May 2026. Score: 3.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does head-tail-aware KL divergence scaling affect alignment metrics in large language models compared to standard KL divergence during distillation. Standard Knowledge Distillation (KD) compresses Large…

[1826]

Gradient Masking Effects on GNN-Based NIDS Robustness Against Structural Adversarial Attacks

31 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of gradient masking techniques on the robustness of GNN-based NIDS models against structural adversarial attacks as measured by the AUC-ROC score on the KDD Cup 99 dataset. We identify…

[1825]

Multimodal vs. Unimodal Graph Neural Networks on Noisy Spatio-Temporal Benchmarks

31 May 2026. Score: 6.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How do multimodal models perform on spatio-temporal graph datasets with synthetic noise compared to unimodal graph neural networks in terms of inference throughput, as evaluated on benchmarks like. In order to…

[1824]

Adversarial Perturbations and Feature Attribution Consistency in Integrated Gradients vs Attention Rollout

31 May 2026. Score: 5.27/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent do adversarial perturbations in input text degrade the consistency of feature attribution maps generated by Integrated Gradients compared to Attention Rollout. Attribution algorithms are frequently…

[1823]

Impact of Synthetic Noise on Transformer Reasoning in Spatio-Temporal Graph Tasks

31 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of varying levels of synthetic noise on the reasoning capabilities of transformer-based language models when fine-tuned on spatio-temporal graph datasets, as measured by accuracy. Dynamic Graph…

[1822]

Self-Mutual Learning vs. Teacher-Only Baselines in Spatio-Temporal Graph Inference Efficiency

31 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the self-mutual learning approach compare to teacher-only baselines in inference efficiency on spatio-temporal graph datasets when evaluated using standard graph neural network benchmarks. Knowledge…

[1821]

Multimodal Input Integration in LLM Code Generation: Accuracy and Latency on HumanEval Pro and MBPP Pro

31 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the integration of multimodal inputs (e.g., diagrams or UML representations) in self-invoking code generation tasks affect the accuracy (pass@1) and latency of LLMs compared to text-only. We introduce…

[1820]

Domain-Specific Fine-Tuning (E.G., Python Vs. Javascript) Performance On Gpt-4O'S Code Generation Robustness As

31 May 2026. Score: 4.40/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does domain-specific fine-tuning (e.g., Python vs. JavaScript) affect GPT-4o's code generation robustness as measured by HumanEval+ test suite accuracy. Large Language Models (LLMs) have demonstrated…

[1819]

Model Size and HumanEval Score Stability Across Evaluation Protocols

31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the correlation between model size (1B–175B parameters) and HumanEval score stability across different evaluation protocols (e.g., deterministic vs. probabilistic sampling). Large language models (LLMs)…

[1818]

Multi-Task Fine-Tuning Effects on Code Generation Robustness Across Problem Complexities

31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the effect of multi-task fine-tuning (e.g., combining HumanEval Pro with MBPP Pro) on model robustness (measured by pass@k) in self-invoking code generation tasks across different problem. We introduce…

« Prev 1 … 135 136 137 138 139 … 210 Next »