Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5125 papers; mean review score 5.71/10; 1466 Zenodo DOIs.

Results 3501–3525 of 5125 entries

Papers

[1625]

Qwen2.5 Performance on HumanEval Pro and MBPP Pro Across Finetuning Dataset Sizes

31 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the finetuning dataset size impact the performance of Qwen2.5 on the HumanEval Pro and MBPP Pro benchmarks compared to models with smaller pretraining datasets. We introduce self-invoking code…

[1624]

Sliding Window vs. Full Attention in Long-Sequence Code Generation Accuracy

31 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the impact of sliding window attention on code generation accuracy compared to full attention mechanisms in long-sequence programming benchmarks. GitHub Copilot, an extension for the Visual Studio Code…

[1623]

RxR and R2R Agent Performance on ALFRED Long-Horizon Navigation Tasks

31 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the performance of RxR-trained agents compare to those trained on Room-to-Room (R2R) when evaluated on the ALFRED benchmark for long-horizon language-grounded navigation tasks. Existing Vision-Language…

[1622]

Mixed-Precision Quantization Trade-offs in Multimodal Models: Efficiency vs. Reasoning Accuracy

31 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the trade-off between inference efficiency (latency/throughput) and reasoning accuracy when applying mixed-precision quantization to multimodal models like InternLM on benchmarks such as MMMU.…

[1621]

Sliding Window Attention Impact on Long-Context LLM Inference Efficiency

31 May 2026. Score: 6.83/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does sliding window attention affect inference latency and memory usage when processing context lengths exceeding 32K tokens in LLM reasoning tasks. The quadratic compute and memory costs of global…

[1620]

Multilingual VLN Agents on RxR and R2R: Cross-Lingual Transfer Learning Performance

31 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: Do multilingual VLN agents trained on RxR demonstrate improved cross-lingual transfer learning capabilities when evaluated on the Room-to-Room (R2R) dataset for English and non-English instructions.…

[1619]

Mixed-Precision Quantization Effects on InternLM Multimodal Reasoning Performance

31 May 2026. Score: 3.07/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does mixed-precision quantization (e.g., 4-bit vs. 8-bit) affect the performance of quantized InternLM models on multimodal reasoning benchmarks like MMBench and ITP compared to the LLaVA. Reducing the…

[1618]

Quantization-Aware Training and Post-Training Quantization Effects on LLM Mathematical Reasoning

31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does quantization-aware training (QAT) impact the reasoning capabilities of large language models (LLMs) on mathematical benchmarks compared to post-training quantization (PTQ) when evaluated on. Post-training…

[1617]

Dynamic Expert Sharing in Sparse MoE Models for Efficient Code Generation

31 May 2026. Score: 4.40/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Does increasing the number of experts in sparse MoE models improve inference efficiency (throughput) while maintaining pass@1 accuracy on self-invoking code generation tasks as benchmarked on. Among parallel…

[1616]

Sparse MoE and Dense Transformer Performance Gaps in Self-Invoking Code Generation

31 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the performance gap between sparse MoE models and dense transformers on self-invoking code generation tasks vary when evaluated on MBPP Pro compared to HumanEval Pro. We introduce self-invoking code…

[1615]

RxR-Trained Agents with Tryout Controller in Unseen Complex Environments

31 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the path efficiency of RxR-trained agents with the tryout controller scale with increasing complexity of unseen environments (e.g., larger maps, more obstacles) compared to R2R-trained agents. Eccentric…

[1614]

RLHF Alignment Effects on Multimodal Model Accuracy in Self-Invoking Code Generation

31 May 2026. Score: 4.73/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of RLHF alignment on the pass@1 accuracy of multimodal models (e.g., text-to-code) compared to text-only models in solving self-invoking code generation tasks on HumanEval Pro. We introduce…

[1613]

RxR-Trained Agents with Tryout Controller Outperform Benchmarks in Path Efficiency

31 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the path efficiency of RxR-trained agents with the tryout controller compare to agents trained with other navigation benchmarks (e.g., ALFRED, Room-Across-Room) when evaluated on unseen. We introduce…

[1612]

Language Model Backbone Size Effects on RxR Agent Path Efficiency and Communication Success

31 May 2026. Score: 4.67/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of varying the size of the language model backbone on the path efficiency and communication success rate of RxR-trained agents in the R2R benchmark. Large language models (LLMs) have achieved…

[1611]

Wan2.1 I2V-14B with LoRA Adaptation Performance on Out-of-Domain Cinematic Scenes

31 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does Wan2.1 I2V-14B with LoRA adaptation perform on out-of-domain cinematic scenes (e.g., sci-fi) compared to its performance on historical scenes, as evaluated by CLIP-based metrics like FID. We present a…

[1610]

RxR and R2R Trained Agents on ALFRED: A Comparative Performance Analysis

31 May 2026. Score: 3.77/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the relative performance of RxR-trained agents versus R2R-trained agents on the ALFRED benchmark for task and language grounding in realistic indoor environments. We introduce Room-Across-Room (RxR), a…

[1609]

LoRA Rank Dimension Effects on Temporal Consistency in Wan2.1 I2V-14B Video Generation

31 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of varying LoRA rank dimensions (e.g., 4, 8, 16) on the temporal consistency scores of Wan2.1 I2V-14B as measured by the FVD (Fréchet Video Distance) benchmark. We present a practical pipeline…

[1608]

Multi-Agent Deep Reinforcement Learning Communication Efficiency on SCAN Benchmark Tasks

31 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the communication efficiency of MADRL agents scale with the number of agents when evaluated on the SCAN benchmark for natural language grounding tasks. Communication is an effective mechanism for…

[1607]

Tryout Controller Enhances Robustness in RxR-Trained Agents on R2R Benchmark

31 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: Does the tryout controller mechanism in RxR-trained agents improve robustness to ambiguous natural language instructions when evaluated on the Room-to-Room (R2R) benchmark with a focus on instruction. This report…

[1606]

LoRA Rank Optimization in Wan2.1 I2V-14B and Its Effects on Video Generation Metrics

31 May 2026. Score: 4.60/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does adjusting the LoRA rank in Wan2.1 I2V-14B impact the FVD (Fréchet Video Distance) and KID (Kernel Inception Distance) scores on benchmarks like UCF-101 or Kinetics-400 compared to full. We present a…

[1605]

LoRA Trade-offs in Video Diffusion Models: Latency and Temporal Consistency Across Hardware

31 May 2026. Score: 6.60/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the trade-off between inference latency and temporal consistency (measured by TSSIM or LPRO) when applying LoRA to video diffusion models like Make-A-Video or AnimateDiff across different. We present a…

[1604]

Parameter-Efficient Fine-Tuning and Temporal Stability in Text-to-Video Generation Models

31 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Do parameter-efficient fine-tuning methods like LoRA in text-to-video generation models achieve comparable temporal stability metrics (e.g., FVD-128, FID-128) to full fine-tuning when evaluated on. We present a…

[1603]

Adversarial Training Methods and Trade-offs in Diffusion Model Generation Quality and Sampling Efficiency

31 May 2026. Score: 3.00/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do different adversarial training methods affect the trade-off between generation quality and sampling efficiency in large-scale diffusion model deployment. Predicting the trajectories of surrounding objects…

[1602]

Low-Rank Adapter Tuning in LoRA for Cross-Domain Video Generation Performance

31 May 2026. Score: 6.60/10. Verification: L2, Source-grounded claims.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of reducing adapter rank in LoRA on the MotionScore benchmark for video generation tasks, particularly when evaluated on cross-domain generalization (e.g., historical vs. sci-fi. We present a…

[1601]

Directional Preference Alignment and RLHF in Sequential Recommendation Diversity Metrics

31 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the Directional Preference Alignment (DPA) framework compare to traditional RLHF in terms of recommendation diversity metrics (e.g., coverage, novelty) on sequential recommendation. Recent studies have…
