Assignee Research: Index of Papers

[266]

What is the impact of domain shift in image complexity (e.g., from synthetic to natural scenes) on the accurac

28 May 2026. Score: 6.17/10. Verification: L1, Literature synthesis.

Abstract: In Transformer architectures, tokens (discrete units derived from raw data) are formed by segmenting inputs into fixed-length chunks. Each token is then mapped to an embedding, enabling parallel attention computations while preserving the input's essential information. Due to the quadratic…
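
As a rough illustration of fixed-length chunking and the quadratic attention cost mentioned above, the sketch below splits a 1-D input into non-overlapping chunks and counts the query-key scores full self-attention would compute. The function names, the hypothetical `chunk_len` parameter, and the zero-padding of the final chunk are assumptions for illustration, not the method of the indexed paper.

```python
from typing import List, Sequence


def chunk_tokens(data: Sequence[float], chunk_len: int) -> List[List[float]]:
    """Split `data` into fixed-length chunks, zero-padding the last one (assumed policy)."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    chunks = []
    for start in range(0, len(data), chunk_len):
        chunk = list(data[start:start + chunk_len])
        chunk += [0.0] * (chunk_len - len(chunk))  # pad a partial final chunk
        chunks.append(chunk)
    return chunks


def attention_pairs(num_tokens: int) -> int:
    """Number of query-key scores in full self-attention: n * n."""
    return num_tokens * num_tokens


signal = [float(i) for i in range(10)]
tokens = chunk_tokens(signal, chunk_len=4)
print(len(tokens), tokens[-1])       # 3 [8.0, 9.0, 0.0, 0.0]
print(attention_pairs(len(tokens)))  # 9
```

Doubling the number of tokens quadruples the number of attention scores, which is why shorter token sequences (larger chunks, or pruning) matter for cost.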

[265]

How does the inference throughput and token count scaling of adaptive token pruning (CAT-like) methods compare

28 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: In this paper, we introduce PruneVid, a visual token pruning method designed to enhance the efficiency of multi-modal video understanding. Large Language Models (LLMs) have shown promising performance in video tasks due to their extended capabilities in comprehending visual modalities. However, the substantial…

[264]

How does content-adaptive tokenization affect the accuracy-efficiency trade-off on the MMMU and MathVista data

28 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims.

Abstract: Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage,…
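
The abstract attributes cross-language disparity to the tokenization stage. As a simplified illustration, a naive byte-level tokenizer that emits one token per UTF-8 byte already charges different scripts different token budgets. This is a stand-in only: real subword tokenizers (e.g., BPE) merge frequent byte sequences, so their counts will differ from these.

```python
def byte_token_count(text: str) -> int:
    """Token count under a naive byte-level tokenizer: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))


samples = {
    "English": "hello",   # ASCII: 1 byte per character
    "Russian": "привет",  # Cyrillic: 2 bytes per character
    "Chinese": "你好",     # CJK: 3 bytes per character
}
for lang, text in samples.items():
    # language, characters, byte-level tokens
    print(lang, len(text), byte_token_count(text))
```

Under this tokenizer, the same context window holds roughly three times fewer Chinese characters than English ones, which is one concrete way a tokenizer can treat languages unequally.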

[263]

Does content-adaptive tokenization improve robustness and accuracy on the MMBench and MME benchmarks under var

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims.

Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop…

[262]

Does applying cross-modal token pruning from vision to audio Transformers degrade robustness to noise perturba

28 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims.

Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While effective in vision tasks by discarding non-object…
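
A minimal sketch of score-based token pruning, assuming each token already has an importance score (for example, attention received from a class token; how scores are computed is not specified in the abstract). It keeps the top fraction of tokens and preserves their original sequence order. `prune_tokens` and `keep_ratio` are hypothetical names.

```python
from typing import List, Sequence


def prune_tokens(tokens: Sequence[List[float]], scores: Sequence[float],
                 keep_ratio: float) -> List[List[float]]:
    """Keep the highest-scoring tokens, preserving their original order."""
    n = len(tokens)
    if n != len(scores):
        raise ValueError("tokens and scores must have the same length")
    k = max(1, round(n * keep_ratio))  # always keep at least one token
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]  # restore sequence order


tokens = [[float(2 * i), float(2 * i + 1)] for i in range(6)]  # 6 tokens, dim 2
scores = [0.05, 0.40, 0.10, 0.30, 0.02, 0.13]
kept = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept)  # [[2.0, 3.0], [6.0, 7.0], [10.0, 11.0]]
```

Keeping the original order matters when positional information is carried by token position rather than by a separate embedding.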

[261]

How does content-adaptive tokenization compare to fixed-patch tokenization in terms of inference throughput (t

28 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of…

[260]

To what extent does GraphMETRO's alignment mechanism improve robustness to distribution shift in multimodal mo

28 May 2026. Score: 2.33/10. Verification: L1, Literature synthesis.

Abstract: We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios. To mitigate the impact of document differences, COCO-DR continues pretraining the language model on the…

[259]

How does dynamic token pruning in audio Transformers affect FLOPs efficiency and word error rate on the LibriS

28 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While effective in vision tasks by discarding non-object…

[258]

What is the throughput and inference efficiency trade-off of GraphMETRO's node-based alignment mechanism versu

28 May 2026. Score: 0.33/10. Verification: L2, Source-grounded claims.

Abstract: The discovery of deep, steerable taxonomies in large text corpora is currently restricted by a trade-off between the surface-level efficiency of topic models and the prohibitive, non-scalable assignment costs of LLM-integrated frameworks. We introduce textbf\LogiPart\, a scalable, hypothesis-first framework for…

[257]

How does node-based Bayesian neural network alignment compare to weight-based BNNs in reducing accuracy degrad

28 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: Bayesian neural networks (BNNs) promise improved generalization under covariate shift by providing principled probabilistic representations of epistemic uncertainty. However, weight-based BNNs often struggle with high computational complexity of large-scale architectures and datasets. Node-based BNNs have recently…
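
To make the complexity contrast concrete, the sketch below counts stochastic variables in a fully connected network. It assumes a weight-based BNN places one random variable on every weight and bias, while a node-based BNN places one multiplicative latent variable on each node feeding a layer (input and hidden units) and keeps point-estimate weights; this placement and the layer sizes are assumptions for illustration.

```python
from typing import Sequence


def weight_based_rvs(layer_sizes: Sequence[int]) -> int:
    """One random variable per weight and bias in a fully connected net."""
    return sum((a + 1) * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))


def node_based_rvs(layer_sizes: Sequence[int]) -> int:
    """One multiplicative latent variable per node feeding each layer (assumed placement)."""
    return sum(layer_sizes[:-1])


sizes = [784, 512, 512, 10]  # hypothetical MLP
print(weight_based_rvs(sizes))  # 669706
print(node_based_rvs(sizes))    # 1808
```

The weight-based count grows with the product of adjacent layer widths, while the node-based count grows only with their sum, which is the source of the scalability gap the abstract describes.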

[256]

To what extent does token scheduling in sparse MoE inference improve robustness to distribution shift in docum

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…
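
A minimal sketch of top-k expert routing, assuming a softmax gate whose weights are renormalized over the selected experts (one common variant; the indexed paper's router is not described in the abstract). It also shows why parameter memory scales with the total number of experts while per-token compute scales only with k. Expert sizes are hypothetical and biases are ignored.

```python
import math
from typing import List, Sequence, Tuple


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Select the k experts with the largest router logits and return
    (expert_id, weight) pairs whose softmax weights sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


def expert_params(num_experts: int, d_model: int, d_ff: int) -> int:
    """Parameters in num_experts two-layer FFN experts (biases ignored)."""
    return num_experts * 2 * d_model * d_ff


routes = top_k_route([0.1, 2.0, -1.0, 1.0], k=2)
print(routes)  # experts 1 and 3, weights about 0.731 and 0.269

stored = expert_params(8, 1024, 4096)  # all experts must sit in memory
active = expert_params(2, 1024, 4096)  # only k=2 experts run per token
print(stored // active)                # 4
```

The ratio of stored to active expert parameters is num_experts / k, so adding experts raises memory without raising per-token compute.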

[255]

Does GraphMETRO's expert diversity improve cross-domain generalization to out-of-distribution GQA splits compa

28 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[254]

How does dynamic batch-aware expert selection in MoE inference affect token-per-second throughput compared to

28 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims.

Abstract: Selective parameter activation provided by Mixture-of-Experts (MoE) models has made them a popular choice in modern foundational models. However, MoEs face a fundamental tension when employed for serving. Batching, critical for performance in serving, forces the activation of all experts, thereby negating MoEs'…
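
Why batching tends to activate all experts can be estimated with a simple model. Assuming each token independently picks k of N experts uniformly at random (an idealization; real routers are skewed and correlated), a given expert is missed by one token with probability 1 - k/N, so the expected number of distinct active experts for a batch of B tokens is N(1 - (1 - k/N)^B). The sketch computes this and checks it with a seeded simulation; all names and sizes are hypothetical.

```python
import random


def expected_active_experts(num_experts: int, k: int, batch_tokens: int) -> float:
    """E[# distinct experts] when each token picks k of num_experts uniformly
    and independently (idealized routing assumption)."""
    p_missed_by_one_token = 1.0 - k / num_experts
    return num_experts * (1.0 - p_missed_by_one_token ** batch_tokens)


def simulated_active_experts(num_experts: int, k: int, batch_tokens: int,
                             trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set()
        for _ in range(batch_tokens):
            active.update(rng.sample(range(num_experts), k))
        total += len(active)
    return total / trials


for b in (1, 8, 32, 128):
    print(b, round(expected_active_experts(64, 2, b), 1))
```

With 64 experts and top-2 routing, a batch of 128 tokens is expected to touch nearly every expert, so the whole expert set must be resident and loaded even though each token uses only two.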

[253]

Does the Lynx token scheduling approach generalize to other sparse MoE architectures (e.g., Mixtral 8x7B, Deep

28 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for…

[252]

What is the inference throughput cost of GraphMETRO's Mixture-of-Experts gating mechanism on VQAv2 relative to

28 May 2026. Score: 2.00/10. Verification: L2, Source-grounded claims.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[251]

To what extent does the expandable side-MoE architecture improve robustness to distribution shift in user-item

28 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: Streaming recommender systems (SRSs) are widely deployed in real-world applications, where user interests shift and new items arrive over time. As a result, effectively capturing users' latest preferences is challenging, as interactions reflecting recent interests are limited and new items often lack sufficient…

[250]

How does the number of expert modules in GraphMETRO affect VQAv2 and GQA accuracy under natural distribution s

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[249]

How does the inference throughput of MoE-based multimodal streaming recommenders compare to dense baselines (e

28 May 2026. Score: 6.30/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[248]

What is the accuracy impact (Recall@k, NDCG@k) of replacing pretrained multimodal encoders (BERT/ViT) with lig

28 May 2026. Score: 1.33/10. Verification: L2, Source-grounded claims.

Abstract: Streaming recommender systems (SRSs) are widely deployed in real-world applications, where user interests shift and new items arrive over time. As a result, effectively capturing users' latest preferences is challenging, as interactions reflecting recent interests are limited and new items often lack sufficient…

[247]

Can mixture-of-experts routing strategies trained on imbalanced multimodal data improve inference efficiency a

28 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…
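
One plausible way to build a vector summarizing expert activation is sketched below, under the assumption (not confirmed by the truncated abstract) that a signature is the per-layer frequency with which each expert is selected, concatenated across layers, with signatures compared by cosine similarity. `routing_signature` and `cosine` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def routing_signature(choices: Sequence[Sequence[Sequence[int]]],
                      num_experts: int) -> List[float]:
    """choices[layer][token] = expert ids chosen for that token at that layer.
    Returns per-layer selection frequencies, concatenated across layers."""
    sig: List[float] = []
    for layer in choices:
        counts = [0] * num_experts
        for token_experts in layer:
            for e in token_experts:
                counts[e] += 1
        total = sum(counts) or 1  # avoid division by zero for empty layers
        sig.extend(c / total for c in counts)
    return sig


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Two layers, four experts, top-1 routing, four tokens per prompt.
prompt_a = [[[0], [0], [1], [0]], [[2], [2], [3], [2]]]
prompt_b = [[[0], [1], [0], [0]], [[2], [3], [2], [2]]]  # same usage, different order
prompt_c = [[[3], [3], [2], [3]], [[1], [0], [1], [1]]]  # disjoint experts
sig_a, sig_b, sig_c = (routing_signature(p, 4) for p in (prompt_a, prompt_b, prompt_c))
print(cosine(sig_a, sig_b), cosine(sig_a, sig_c))
```

Because the signature discards token order, prompts that use the same experts equally often map to the same vector even if individual tokens are routed differently.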

[246]

What is the impact of imbalanced domain generalization techniques (e.g., SMoES routing) on the robustness of m

28 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have significantly pushed the frontier of egocentric video question answering (EgocentricQA). However, existing benchmarks and studies are mainly limited to common daily activities such as cooking and cleaning. In contrast, real-world deployment inevitably…

[245]

How does SMoES-trained modality routing for multimodal LLMs generalize to out-of-distribution benchmarks like

28 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: Large-scale pre-trained models (PTMs) show great zero-shot capabilities. In this paper, we study how to leverage them for zero-shot visual question answering (VQA). Our approach is motivated by a few observations. First, VQA questions often require multiple steps of reasoning, which is still a capability that most…

[244]

To what extent does expert-level quantization (e.g., INT8 vs FP16) combined with CPU-GPU expert caching reduce

28 May 2026. Score: 4.83/10. Verification: L1, Literature synthesis.

Abstract: Among parallel decoding paradigms, diffusion large language models (dLLMs) have emerged as a promising candidate that balances generation quality and throughput. However, their integration with Mixture-of-Experts (MoE) architectures is constrained by an expert explosion: as the number of tokens generated in parallel…

[243]

How does the expert explosion phenomenon in MoE diffusion LLMs scale with the number of parallel tokens genera

28 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[242]

Can dynamic expert caching strategies in MoE-based diffusion LLMs achieve comparable throughput to static expe

28 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…