Assignee Research: Index of Papers

[32]

Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-ho…

27 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20412206

Abstract: Multi-hop question answering is a knowledge-intensive complex problem. Large Language Models (LLMs) use their Chain-of-Thought (CoT) capability to reason through complex problems step by step, and retrieval augmentation can effectively alleviate factual errors caused by outdated and unknown knowledge in LLMs. Recent works…

[31]

Learning From Failure: Integrating Negative Examples when Fine-tuning Large Lang…

27 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20412097

Abstract: Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines. However, LLMs are optimized for language generation instead of tool use during training or alignment, limiting their effectiveness as agents. To resolve this problem, previous…

[30]

AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixt…

27 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411949

Abstract: Multimodal Mixture-of-Experts (MoE) models offer a promising path toward scalable and efficient large vision-language systems. However, existing approaches rely on rigid routing strategies (typically activating a fixed number of experts per token), ignoring the inherent heterogeneity in semantic importance across…

[29]

Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization

27 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411788

Abstract: Mixture-of-Experts (MoE) architectures enable conditional computation by routing inputs to multiple expert subnetworks and are often motivated as a mechanism for scaling large language models. In this project, we instead study MoE behavior in an image classification setting, focusing on predictive performance, expert…

[28]

NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411786

Abstract: In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Out-of-distribution detection identifies whether an input stems from an unseen distribution, while open-world recognition flags such inputs to ensure the system remains robust as ever-emerging, previously…

[27]

CAT: Content-Adaptive Image Tokenization

27 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411770

Abstract: Most existing image tokenizers encode images into a fixed number of tokens or patches, overlooking the inherent variability in image complexity. To address this, we introduce Content-Adaptive Tokenizer (CAT), which dynamically adjusts representation capacity based on the image content and encodes simpler images into…

[26]

GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned…

27 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411590

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[25]

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning

27 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims.

Abstract: Mainstream parameter-efficient fine-tuning (PEFT) methods, such as LoRA or Adapter, project a model's hidden states to a lower dimension, allowing pre-trained models to adapt to new data through this low-rank bottleneck. However, PEFT tasks involving multiple modalities, like vision-language (VL) tasks, require not…

[24]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411378

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[23]

ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411364

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[22]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[21]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[20]

ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching

27 May 2026. Score: 2.67/10. Verification: L1, Literature synthesis.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[19]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[18]

BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20410568

Abstract: Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without…

[17]

MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Hate speech detection on Chinese social networks presents distinct challenges, particularly due to the widespread use of cloaking techniques designed to evade conventional text-based detection systems. Although large language models (LLMs) have recently improved hate speech detection capabilities, the majority of…

[16]

Scaling Laws for Native Multimodal Models

27 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20410359

Abstract: Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit…

[15]

An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic…

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[14]

Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, a…

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409932

Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly improve general reasoning performance. However, the impact of this paradigm on legal reasoning remains insufficiently explored.…

[13]

S*: Test Time Scaling for Code Generation

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409874

Abstract: Increasing test-time compute for LLMs shows promise across domains but remains underexplored in code generation, despite extensive study in math. In this paper, we propose S*, the first hybrid test-time scaling framework that substantially improves the coverage and selection accuracy of generated code. S* extends the…

[12]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev…

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409804

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[11]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev…

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409686

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[10]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev…

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409196

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[9]

LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20408526

Abstract: Extractive reading comprehension question answering (QA) datasets are typically evaluated using Exact Match (EM) and F1-score, but these metrics often fail to fully capture model performance. With the success of large language models (LLMs), they have been employed in various tasks, including serving as judges…

[8]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev…

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20408396

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…