Assignee Research: Index of Papers

[26]

GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned

27 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411590

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[25]

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning

27 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims.

Abstract: Mainstream parameter-efficient fine-tuning (PEFT) methods, such as LoRA or Adapter, project a model's hidden states to a lower dimension, allowing pre-trained models to adapt to new data through this low-rank bottleneck. However, PEFT tasks involving multiple modalities, like vision-language (VL) tasks, require not…

[24]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411378

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[23]

ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20411364

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[22]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[21]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[20]

ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching

27 May 2026. Score: 2.67/10. Verification: L1, Literature synthesis.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[19]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[18]

BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20410568

Abstract: Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without…

[17]

MMBERT: Scaled Mixture-of-Experts Multimodal BERT for Robust Chinese Hate Speech

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Hate speech detection on Chinese social networks presents distinct challenges, particularly due to the widespread use of cloaking techniques designed to evade conventional text-based detection systems. Although large language models (LLMs) have recently improved hate speech detection capabilities, the majority of…

[16]

Scaling Laws for Native Multimodal Models

27 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20410359

Abstract: Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained components, such as connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit…

[15]

An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[14]

Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, a

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409932

Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly improve general reasoning performance. However, the impact of this paradigm on legal reasoning remains insufficiently explored.…

[13]

S*: Test Time Scaling for Code Generation

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409874

Abstract: Increasing test-time compute for LLMs shows promise across domains but remains underexplored in code generation, despite extensive study in math. In this paper, we propose S*, the first hybrid test-time scaling framework that substantially improves the coverage and selection accuracy of generated code. S* extends the…

[12]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409804

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[11]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409686

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[10]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20409196

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[9]

LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20408526

Abstract: Extractive reading comprehension question answering (QA) datasets are typically evaluated using Exact Match (EM) and F1-score, but these metrics often fail to fully capture model performance. With the success of large language models (LLMs), they have been employed in various tasks, including serving as judges…

[8]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20408396

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems, particularly the retriever component, remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[7]

Learning Sparse Mixture of Experts for Visual Question Answering

27 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims.

Abstract: There has been rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question…

[6]

Cofca: A Step-Wise Counterfactual Multi-hop QA benchmark

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20408050

Abstract: While Large Language Models (LLMs) excel in question-answering (QA) tasks, their real reasoning abilities on multiple evidence retrieval and integration on Multi-hop QA tasks remain less explored. Firstly, LLMs sometimes generate answers that rely on internal memory rather than retrieving evidence and reasoning in…

[5]

AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixt

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20407901

Abstract: Multimodal Mixture-of-Experts (MoE) models offer a promising path toward scalable and efficient large vision-language systems. However, existing approaches rely on rigid routing strategies (typically activating a fixed number of experts per token), ignoring the inherent heterogeneity in semantic importance across…

[4]

Adapting Foundation Vision-Language Models to Medical Diagnosis via Query-Driven

27 May 2026. Score: 7.23/10. Verification: L2, Source-grounded claims.

Abstract: Vision-language foundation models achieve promising performance in natural image classification, yet their direct application to medical imaging is limited by severe domain shifts, resolution mismatches, and the multi-label nature of clinical diagnosis. Training dedicated medical foundation models from scratch,…

[3]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.40/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[2]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20406928

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…