Assignee Research: Index of Papers

[446]

How does the sample efficiency of multi-turn RL for long-horizon VLN-CE tasks compare to imitation learning ba

29 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439535

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[445]

Does layer-wise score aggregation improve SuperGLUE task accuracy over last-layer baselines when evaluated on

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439531

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate…

[444]

What is the impact of routing strategy choice on the throughput and accuracy trade-off when scaling sparse MoE

29 May 2026. Score: 9.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439529

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[443]

How does COCO-DR's zero-shot recall@5 on NQ and TriviaQA compare to supervised dense retrievers like DPR and C

29 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439523

Abstract: Effective information retrieval (IR) from vast datasets relies on advanced techniques to extract relevant information in response to queries. Recent advancements in dense retrieval have showcased remarkable efficacy compared to traditional sparse retrieval methods. To further enhance retrieval performance, knowledge…

[442]

Can AlphaX framework be adapted to improve sample efficiency in neural architecture search by incorporating un

29 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439521

Abstract: Neural Architecture Search (NAS) has shown great success in automating the design of neural networks, but the prohibitive amount of computations behind current NAS methods requires further investigations in improving the sample efficiency and the network evaluation cost to get better results in a shorter time. In…

[441]

Can AlphaX framework be adapted to improve sample efficiency in neural architecture search by incorporating un

29 May 2026. Score: 6.73/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data.…

[440]

How does the choice of LoRA rank in cross-attention layers influence the trade-off between FVD and LPIPS score

29 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We present W.A.L.T, a transformer-based approach for photorealistic video generation via diffusion modeling. Our approach has two key design decisions. First, we use a causal encoder to jointly compress images and videos within a unified latent space, enabling training and generation across modalities. Second, for…

[439]

What is the impact of model size reduction techniques on the effectiveness of music information retrieval syst

29 May 2026. Score: 7.23/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: A large amount of information exists in reviews written by users. This source of information has been ignored by most of the current recommender systems while it can potentially alleviate the sparsity problem and improve the quality of recommendations. In this paper, we present a deep model to learn item properties…

[438]

What is the impact of cross-view structural consistency modeling on the performance of UNAGI for multi-view cl

29 May 2026. Score: 9.23/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439491

Abstract: In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma…

[437]

What is the impact of quantized inference (4-bit vs 8-bit) on the throughput and task-specific F1 scores of co

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439445

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[436]

How does the auxiliary-loss-free load balancing strategy in DeepSeek-V3 affect expert utilization diversity an

29 May 2026. Score: 7.20/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in…

[435]

How does the accuracy of Gemini 1.5 Pro on the MMMU benchmark compare to MoE-LLaVA and dense LLaVA-1.5 when ev

29 May 2026. Score: 7.87/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439435

Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting…

[434]

What is the inference throughput (tokens per second) of Gemini 1.5 Flash versus LLaVA-NeXT on the Video-MME be

29 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439420

Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come…

[433]

Can a mixture-of-experts (MoE) routing strategy with adaptive sparsity improve both throughput (FPS) and per-c

29 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This review comprehensively investigates the current state and emerging trends of autonomous vehicle terrain detection and segmentation. By systematically reviewing literature from various databases, this study outlines the evolution of detection and segmentation techniques from traditional computer vision methods to…

[432]

What is the comparative performance drop in Video-MME accuracy for MoE models (e.g., Mixtral 8x22B) versus den

29 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being…

[431]

What is the trade-off between mIoU and latency for DDRNet23-slim versus DeepLabV3+ on the RUGD dataset under v

29 May 2026. Score: 6.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Offroad autonomous vehicles (OAVs) are becoming increasingly popular for navigating challenging environments in agriculture, military, and exploration applications. These vehicles face unique challenges, such as unpredictable terrain, dynamic obstacles, and varying environmental conditions. Therefore, it is…

[430]

How does the choice between nucleus sampling and temperature scaling affect pass@k scores and character-level

29 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[429]

What is the trade-off between token sparsity levels (e.g., 10%, 30%, 50%) in masked autoencoders and segmentat

29 May 2026. Score: 6.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: In recent years unmanned aerial vehicles (UAVs) have emerged as a popular and cost-effective technology to capture high spatial and temporal resolution remote sensing (RS) images for a wide range of precision agriculture applications, which can help reduce costs and environmental impacts by providing detailed…

[428]

What is the impact of domain-specific data augmentation on cross-modal alignment scores in multimodal LLMs eva

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439332

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This…

[427]

Can data augmentation techniques improve few-shot learning performance of vision-language models on standard b

29 May 2026. Score: 5.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[426]

How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Interse

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439318

Abstract: Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which has been overwhelmingly dominated by CNNs, recently has significantly revolutionized. However, the computational cost and memory requirement renders these methods unsuitable on the mobile…

[425]

What is the optimal negative sampling ratio for domain-agnostic QA performance across different model scales

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20439310

Abstract: Systematic reviews and meta-analyses are essential to summarize evidence relating to efficacy and safety of health care interventions accurately and reliably. The clarity and transparency of these reports, however, is not optimal. Poor reporting of systematic reviews diminishes their value to clinicians, policy…

[424]

To what extent does incorporating unanswerable questions through negative sampling techniques improve performa

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available…

[423]

To what extent does the integration of secure multi-party computation protocols affect the zero-shot text clas

29 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Transformer models (e.g., Bert and GPT) have shown their dominance in machine learning tasks. Many cloud companies have begun to provide services based on Transformer models, examples include translation and text-speech conversion. However, such a service inevitably requires access to the client's data, which might…

[422]

What is the impact of privacy-preserving representations on inference latency when scaling model sizes from 7B

29 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…