Assignee Research: Index of Papers

[212]

How does SMoES's inference throughput (tokens/sec) on Winoground compare to modality-agnostic MoE-VLMs when sc

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[211]

To what extent does SMoES improve expert specialization for cross-modal alignment tasks (e.g., visual question

28 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: There has been rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question…

[210]

What is the robustness of SMoES to distribution shifts in multimodal inputs (e.g., adversarial image perturbat

28 May 2026. Score: 6.90/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Pre-trained vision-language (VL) models are highly vulnerable to adversarial attacks. However, existing defense methods primarily focus on image classification, overlooking two key aspects of VL tasks: multimodal attacks, where both image and text can be perturbed, and the one-to-many relationship of images and…

[209]

How does the scaling efficiency of soft modality-guided routing in SMoES compare to dense and hard MoE baselin

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20433690

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[208]

To what extent does modality imbalance affect the accuracy and routing stability of multimodal language models

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20433683

Abstract: The rise of Multimodal Large Language Models (MLLMs) has significantly advanced the capabilities of AI systems to understand and generate content across diverse modalities such as text, images, audio, video, and sensory data. By leveraging the reasoning prowess of Large Language Models (LLMs), MLLMs unify multiple…

[207]

How does SMoES robustness to cross-domain generalization compare to hard-routing MoE baselines on MMBench and

28 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The increasing availability of biomedical data from large biobanks, electronic health records, medical imaging, wearable and ambient biosensors, and the lower cost of genome and microbiome sequencing have set the stage for the development of multimodal artificial intelligence solutions that capture the complexity of…

[206]

What is the robustness of SMoES-based MoE-VLMs to distribution shift and adversarial inputs compared to dense

28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate…

[205]

What is the inference efficiency tradeoff between SMoES and hard-routing MoE approaches when evaluated on lang

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20433629

Abstract: Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a…

[204]

How does the scaling behavior of SMoES-based MoE-VLMs compare to dense VLMs in terms of accuracy vs. total mod

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[203]

What is the inference throughput trade-off (tokens per second) and MMMU accuracy of SMoES-based MoE-VLMs as ex

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20433593

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[202]

Does SMoES's soft modality-guided routing improve MMMU cross-domain robustness (e.g., STEM vs. humanities) and

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[201]

How does varying the number of experts in SMoES-based MoE-VLMs affect MMMU accuracy variance and expert routin

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[200]

How robust are SMoES-based 7B VLMs to distribution shifts in visual inputs on SEED-Bench compared to dense and

28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Vision-Language Models (VLMs) are increasingly used as perceptual modules for visual content reasoning, including through captioning and DeepFake detection. In this work, we expose a critical vulnerability of VLMs when exposed to subtle, structured perturbations in the frequency domain. Specifically, we highlight how…

[199]

What is the scaling behavior of SMoES-based VLMs from 1B to 13B parameters in terms of accuracy-latency Pareto

28 May 2026. Score: 4.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) architectures enable conditional computation by routing inputs to multiple expert subnetworks and are often motivated as a mechanism for scaling large language models. In this project, we instead study MoE behavior in an image classification setting, focusing on predictive performance, expert…

[198]

How does the inference throughput and memory efficiency of SMoES-based 7B VLMs compare against dense and hard-

28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[197]

Does SMoES routing with soft modality guidance generalize robustness to unseen adversarial perturbations on mu

28 May 2026. Score: 3.73/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Machine learning (ML) systems have introduced significant advances in various fields, due to the introduction of highly complex models. Despite their success, it has been shown multiple times that machine learning models are prone to imperceptible perturbations that can severely degrade their accuracy. So far,…

[196]

How does scaling top-k selection in SMoES affect inference throughput and MMMU accuracy trade-off under cross-

28 May 2026. Score: 4.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[195]

How does the expert specialization in SMoES-based MoE-VLMs affect cross-modal reasoning robustness on multimod

28 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[194]

Does increasing the number of experts in SMoES improve robustness to cloaking-style adversarial perturbations

28 May 2026. Score: 3.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[193]

What is the scaling efficiency of SMoES-based MoE-VLMs in terms of downstream task accuracy per additional exp

28 May 2026. Score: 2.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[192]

Does predictive expert caching in SMoE multimodal models improve robustness to cross-modal distribution shifts

28 May 2026. Score: 1.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[191]

What is the FLOPs efficiency (tokens per FLOP) of SMoE models versus dense models when evaluated on MMMU subse

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: There has been rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question…

[190]

How does the inference throughput (tokens/sec) of SMoE-based multimodal models compare to dense baselines on t

28 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[189]

Does MoE-LLaVA's routing strategy improve cross-modal robustness to textual adversarial perturbations (e.g., s

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Following the success in advancing natural language processing and understanding, transformers are expected to bring revolutionary changes to computer vision. This work provides a comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations. Tested on various white-box and…

[188]

What is the inference throughput (tokens per second) and memory footprint trade-off for MoE-LLaVA versus dense

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…