Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8265 papers; mean review score 5.72/10; 2247 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 148. 78 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 92 contested by the literature itself (see audit ledger). 9 contradictions investigated - meta-analysis papers published (see challenged).
Results 7251–7275 of 8265 entries

Papers

[1015]
30 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How robust is the Tree of Reviews framework to noisy or adversarially perturbed documents in multi-hop QA compared to flat retrieval methods, measured by F1 score on TriviaQA. Multi-hop question answering is a…

[1014]
30 May 2026. Score: 6.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the computational overhead of the Tree of Reviews framework relative to linear chain retrieval methods when scaled to 1000+ context documents on the WebQuestionsSP benchmark. Multi-hop question answering…

[1013]
30 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the robustness of Oracle-RLAIF compare to reinforcement learning from human feedback (RLHF) when evaluated on the DiDeMo benchmark's retrieval accuracy under noisy or adversarial input. Spoken query…

[1012]
30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does retrieval depth in RAG-augmented Llama-3-8B impact answer accuracy on multi-track comparative music QA benchmarks versus single-track datasets. Recent advancements in Large language models (LLMs) have…

[1011]
30 May 2026. Score: 3.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the performance of cross-lingual alignment methods like Oracle-RLAIF on the RxR-CE benchmark compare to standard SFT baselines across low-resource language subsets. Reinforcement Learning from Human…

[1010]
30 May 2026. Score: 1.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the performance gap between Oracle-RLAIF and traditional supervised fine-tuning (SFT) when evaluated on the downstream task of video captioning using the MSVD benchmark's CIDEr score under. Recent…

[1009]
30 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the inference efficiency (throughput, latency) of Llama-3-8B-128K compare to Mistral-7B and Falcon-40B when deployed in a real-time software engineering evaluation pipeline. We introduce Mistral 7B v0.1,…

[1008]
30 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the efficiency tradeoff between 7B and 13B InternVL models in terms of inference latency and memory usage when deployed on edge devices with quantized weights. Quantized neural networks are well known for…

[1007]
30 May 2026. Score: 5.10/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the sample efficiency and convergence speed of reinforcement learning-based VLN models trained on RxR compare to those trained on R2R when scaling instruction complexity and language. This report…

[1006]
30 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the correlation between prompt length and task completion accuracy for Embodied-R1 compared to smaller VLA models in embodied navigation tasks. This paper develops LongNav-R1, an end-to-end multi-turn…

[1005]
30 May 2026. Score: 5.27/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the task completion accuracy of small-scale 3B multimodal policies scale relative to 7B and 13B models when faced with increasing instruction complexity in embodied navigation environments. This paper…

[1004]
30 May 2026. Score: 3.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the effect of Constitutional AI training on the pass@1 scores of sparse MoE models when evaluated against adversarial prompts in code synthesis tasks. Mixture-of-Experts (MoE) networks promise favorable…

[1003]
30 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Do 13B VLA models with multimodal pretraining demonstrate better zero-shot reasoning capabilities on the MM-ReAct benchmark compared to 7B models when evaluated using Exact Match accuracy. Web-crawled pretraining…

[1002]
30 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does RLHF alignment impact the adversarial robustness of sparse MoE models on code generation benchmarks like HumanEval Pro compared to dense architectures. We introduce self-invoking code generation, a new…

[1001]
30 May 2026. Score: 5.53/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does adversarial robustness training (e.g., R-LPIPS) improve the cross-domain generalization of Wan2.1 I2V-14B when fine-tuned with LoRA on unseen video synthesis benchmarks beyond. We present a…

[1000]
30 May 2026. Score: 6.70/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the LoRA rank selection (e.g., 4, 8, 16) in Wan2.1 I2V-14B affect its inference efficiency (latency, throughput) on human video synthesis tasks while maintaining comparable FVD and LPIPS. Deep…

[999]
30 May 2026. Score: 5.27/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the choice of LoRA rank in Wan2.1 I2V-14B influence its ability to preserve temporal consistency (measured via FVD) versus perceptual quality (measured via R-LPIPS) in long-form human video. We present a…

[998]
30 May 2026. Score: 7.30/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the scaling of LoRA rank in multimodal diffusion transformers affect memory footprint and generation speed relative to full parameter fine-tuning on downstream video tasks. Large models represent a…

[997]
30 May 2026. Score: 7.60/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20467928

Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the trade-off between inference latency and video quality metrics (e.g., FVD, CLIP score) when applying low-rank adaptation to the Wan2.1 14B model for edge deployment. The identification of genetically…

[996]
30 May 2026. Score: 6.70/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the comparative performance of joint latent space compression versus specialized video latent models on text-to-video generation accuracy measured by CLIP score and motion consistency metrics. …

[995]
30 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20467924

Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the integration of W.A.L.T's causal encoder design with Flamingo's visual tokenizer impact inference latency and downstream video captioning performance on ActivityNet when compared to. Video description…

[994]
30 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the quantitative trade-off between NDCG@10 recommendation accuracy and RLHF alignment scores when jointly modeling short-term and long-term user preferences using instruction-tuned LLMs. The…

[993]
30 May 2026. Score: 5.93/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: To what extent does scaling the number of Indonesian video-text training samples in MSVD-Indonesian affect the zero-shot cross-lingual transfer performance of Flamingo on non-Indonesian video. Multimodal learning…

[992]
30 May 2026. Score: 7.20/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What metrics (e.g., BLEU, CIDEr, METEOR) demonstrate the robustness of Indonesian video-text models like MSVD-Indonesian when fine-tuned with PaLI versus Flamingo on MSRVTT, and how does this compare. While…

[991]
30 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20467889

Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the degradation in out-of-distribution robustness for video encoders pretrained on synthetic datasets when evaluated on diverse human motion benchmarks. Deep convolutional neural networks have performed…
