Assignee Research: Index of Papers

[259]

How does dynamic token pruning in audio Transformers affect FLOPs efficiency and word error rate on the LibriS

28 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While effective in vision tasks by discarding non-object…

[258]

What is the throughput and inference efficiency trade-off of GraphMETRO's node-based alignment mechanism versu

28 May 2026. Score: 0.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The discovery of deep, steerable taxonomies in large text corpora is currently restricted by a trade-off between the surface-level efficiency of topic models and the prohibitive, non-scalable assignment costs of LLM-integrated frameworks. We introduce textbf\LogiPart\, a scalable, hypothesis-first framework for…

[257]

How does node-based Bayesian neural network alignment compare to weight-based BNNs in reducing accuracy degrad

28 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Bayesian neural networks (BNNs) promise improved generalization under covariate shift by providing principled probabilistic representations of epistemic uncertainty. However, weight-based BNNs often struggle with high computational complexity of large-scale architectures and datasets. Node-based BNNs have recently…

[256]

To what extent does token scheduling in sparse MoE inference improve robustness to distribution shift in docum

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[255]

Does GraphMETRO's expert diversity improve cross-domain generalization to out-of-distribution GQA splits compa

28 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[254]

How does dynamic batch-aware expert selection in MoE inference affect token-per-second throughput compared to

28 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Selective parameter activation provided by Mixture-of-Expert (MoE) models have made them a popular choice in modern foundational models. However, MoEs face a fundamental tension when employed for serving. Batching, critical for performance in serving, forces the activation of all experts, thereby negating MoEs'…

[253]

Does the Lynx token scheduling approach generalize to other sparse MoE architectures (e.g., Mixtral 8x7B, Deep

28 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for…

[252]

What is the inference throughput cost of GraphMETRO's Mixture-of-Experts gating mechanism on VQAv2 relative to

28 May 2026. Score: 2.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[251]

To what extent does the expandable side-MoE architecture improve robustness to distribution shift in user-item

28 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Streaming recommender systems (SRSs) are widely deployed in real-world applications, where user interests shift and new items arrive over time. As a result, effectively capturing users' latest preferences is challenging, as interactions reflecting recent interests are limited and new items often lack sufficient…

[250]

How does the number of expert modules in GraphMETRO affect VQAv2 and GQA accuracy under natural distribution s

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[249]

How does the inference throughput of MoE-based multimodal streaming recommenders compare to dense baselines (e

28 May 2026. Score: 6.30/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[248]

What is the accuracy impact (Recall@k, NDCG@k) of replacing pretrained multimodal encoders (BERT/ViT) with lig

28 May 2026. Score: 1.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Streaming recommender systems (SRSs) are widely deployed in real-world applications, where user interests shift and new items arrive over time. As a result, effectively capturing users' latest preferences is challenging, as interactions reflecting recent interests are limited and new items often lack sufficient…

[247]

Can mixture-of-experts routing strategies trained on imbalanced multimodal data improve inference efficiency a

28 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[246]

What is the impact of imbalanced domain generalization techniques (e.g., SMoES routing) on the robustness of m

28 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly pushed the frontier of egocentric video question answering (EgocentricQA). However, existing benchmarks and studies are mainly limited to common daily activities such as cooking and cleaning. In contrast, real-world deployment inevitably…

[245]

How does SMoES-trained modality routing for multimodal LLMs generalize to out-of-distribution benchmarks like

28 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large-scale pre-trained models (PTMs) show great zero-shot capabilities. In this paper, we study how to leverage them for zero-shot visual question answering (VQA). Our approach is motivated by a few observations. First, VQA questions often require multiple steps of reasoning, which is still a capability that most…

[244]

To what extent does expert-level quantization (e.g., INT8 vs FP16) combined with CPU-GPU expert caching reduce

28 May 2026. Score: 4.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Among parallel decoding paradigms, diffusion large language models (dLLMs) have emerged as a promising candidate that balances generation quality and throughput. However, their integration with Mixture-of-Experts (MoE) architectures is constrained by an expert explosion: as the number of tokens generated in parallel…

[243]

How does the expert explosion phenomenon in MoE diffusion LLMs scale with the number of parallel tokens genera

28 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[242]

Can dynamic expert caching strategies in MoE-based diffusion LLMs achieve comparable throughput to static expe

28 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[241]

Does the performance advantage of SMoES over hand-crafted or modality-agnostic routing on unseen chart types g

28 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Charts are high-density visualization carriers for complex data, serving as a crucial medium for information extraction and analysis. Automated chart understanding poses significant challenges to existing multimodal large language models (MLLMs) due to the need for precise and complex visual reasoning. Current…

[240]

How does SMoES soft modality-guided routing scale in terms of accuracy-efficiency trade-offs (FLOPs per sample

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Recent advances in vision-language pre-training (VLP) have demonstrated impressive performance in a range of vision-language (VL) tasks. However, there exist several challenges for measuring the community's progress in building general multi-modal intelligence. First, most of the downstream VL datasets are annotated…

[239]

What is the inference efficiency trade-off (throughput vs accuracy) of SMoES compared to modality-agnostic MoE

28 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: A pivotal advancement in the progress of large language models (LLMs) is the emergence of the Mixture-of-Experts (MoE) LLMs. Compared to traditional LLMs, MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. Different from previous…

[238]

How does the optimal number of modality-specific experts in SMoES scale with VLM backbone size (e.g., 7B vs 13

28 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: There has been a rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question…

[237]

What is the effect of SMoES soft modality-guided routing on out-of-distribution chart type generalization (e.g

28 May 2026. Score: 1.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[236]

How does the cross-modal alignment quality (measured by zero-shot accuracy on VQA and TextVQA) of Uni-MoE-2.0-

28 May 2026. Score: 0.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: We present Uni-MoE 2.0 from the Lychee family. As a fully open-source omnimodal large model (OLM), it substantially advances Lychee's Uni-MoE series in language-centric multimodal understanding, reasoning, and generating. Based on the dense LLM, we build Uni-MoE-2.0-Omni from scratch through three core contributions:…

[235]

How does SMoES soft modality-guided routing compare to dense VLMs and standard MoE routing on multi-step reaso

28 May 2026. Score: 1.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…