Assignee Research: Index of Papers

[201]

How does varying the number of experts in SMoES-based MoE-VLMs affect MMMU accuracy variance and expert routin

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[200]

How robust are SMoES-based 7B VLMs to distribution shifts in visual inputs on SEED-Bench compared to dense and

28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Vision-Language Models (VLMs) are increasingly used as perceptual modules for visual content reasoning, including through captioning and DeepFake detection. In this work, we expose a critical vulnerability of VLMs when exposed to subtle, structured perturbations in the frequency domain. Specifically, we highlight how…

[199]

What is the scaling behavior of SMoES-based VLMs from 1B to 13B parameters in terms of accuracy-latency Pareto

28 May 2026. Score: 4.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) architectures enable conditional computation by routing inputs to multiple expert subnetworks and are often motivated as a mechanism for scaling large language models. In this project, we instead study MoE behavior in an image classification setting, focusing on predictive performance, expert…

[198]

How does the inference throughput and memory efficiency of SMoES-based 7B VLMs compare against dense and hard-

28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[197]

Does SMoES routing with soft modality guidance generalize robustness to unseen adversarial perturbations on mu

28 May 2026. Score: 3.73/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Machine learning (ML) systems have introduced significant advances in various fields, due to the introduction of highly complex models. Despite their success, it has been shown multiple times that machine learning models are prone to imperceptible perturbations that can severely degrade their accuracy. So far,…

[196]

How does scaling top-k selection in SMoES affect inference throughput and MMMU accuracy trade-off under cross-

28 May 2026. Score: 4.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[195]

How does the expert specialization in SMoES-based MoE-VLMs affect cross-modal reasoning robustness on multimod

28 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[194]

Does increasing the number of experts in SMoES improve robustness to cloaking-style adversarial perturbations

28 May 2026. Score: 3.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[193]

What is the scaling efficiency of SMoES-based MoE-VLMs in terms of downstream task accuracy per additional exp

28 May 2026. Score: 2.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[192]

Does predictive expert caching in SMoE multimodal models improve robustness to cross-modal distribution shifts

28 May 2026. Score: 1.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[191]

What is the FLOPs efficiency (tokens per FLOP) of SMoE models versus dense models when evaluated on MMMU subse

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: There has been rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question…

[190]

How does the inference throughput (tokens/sec) of SMoE-based multimodal models compare to dense baselines on t

28 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[189]

Does MoE-LLaVA's routing strategy improve cross-modal robustness to textual adversarial perturbations (e.g., s

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Following the success in advancing natural language processing and understanding, transformers are expected to bring revolutionary changes to computer vision. This work provides a comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations. Tested on various white-box and…

[188]

What is the inference throughput (tokens per second) and memory footprint trade-off for MoE-LLaVA versus dense

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[187]

How does the robustness of SMoES-based MoE-VLMs with soft modality-guided routing compare to dense models of e

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[186]

What is the inference throughput and memory efficiency trade-off of SMoES MoE-VLMs relative to dense VLMs of e

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432521

Abstract: The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs). These models aim to bridge the gap between text and visual information, enabling a more comprehensive understanding of multimedia data. However, as…

[185]

How does the SMoES soft modality-guided routing mechanism compare to dense baselines and hard routing variants

28 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[184]

What is the performance gap trend in visual reasoning accuracy between SMoES-based MoE-VLMs and dense models a

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432411

Abstract: With the increasing data volume, there is a trend of using large-scale pre-trained models to store the knowledge into an enormous number of model parameters. The training of these models is composed of lots of dense algebras, requiring a huge amount of hardware resources. Recently, sparsely-gated Mixture-of-Experts…

[183]

How does the scaling behavior of inference throughput and reasoning accuracy differ between SMoES MoE-VLMs and

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432231

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family…

[182]

How does the inference throughput and accuracy of SMoES-based MoE-VLMs with soft modality-guided routing compa

28 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The transformer architecture has become a cornerstone of modern AI, fueling remarkable progress across applications in natural language processing, computer vision, and multi-modal learning. As these models continue to scale explosively for performance, implementation efficiency remains a critical challenge.…

[181]

What is the accuracy-throughput Pareto frontier of SMoES MoE-VLMs versus dense models on cross-modal reasoning

28 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Migrating computationally intensive tasks from mobile devices to more resourceful cloud servers is a promising technique to increase the computational capacity of mobile devices while saving their battery energy. In this paper, we consider an MIMO multicell system where multiple mobile users (MUs) ask for computation…

[180]

What is the impact of data augmentation techniques on LLM generalization across domains, as measured by F1 sco

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper critically examines model compression techniques within the machine learning (ML) domain, emphasizing their role in enhancing model efficiency for deployment in resource-constrained environments, such as mobile devices, edge computing, and Internet of Things (IoT) systems. By systematically…

[179]

How does the inference throughput (tokens per second) of SMoES-based MoE-VLMs compare to dense models of equal

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432042

Abstract: In performing a Bayesian analysis of astronomical data, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multimodal or exhibit pronounced (curving) degeneracies, which can cause problems for traditional MCMC sampling…

[178]

How do different sampling strategies affect the efficiency and accuracy of LLMs on domain-agnostic question an

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional…

[177]

How does the effectiveness of negative sampling for unanswerable questions in the MRQA dataset compare to SQuA

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432015

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…