Assignee Research: Index of Papers

[241]

Does the performance advantage of SMoES over hand-crafted or modality-agnostic routing on unseen chart types g

28 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims.

Abstract: Charts are high-density visualization carriers for complex data, serving as a crucial medium for information extraction and analysis. Automated chart understanding poses significant challenges to existing multimodal large language models (MLLMs) due to the need for precise and complex visual reasoning. Current…

[240]

How does SMoES soft modality-guided routing scale in terms of accuracy-efficiency trade-offs (FLOPs per sample

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: Recent advances in vision-language pre-training (VLP) have demonstrated impressive performance in a range of vision-language (VL) tasks. However, there exist several challenges for measuring the community's progress in building general multi-modal intelligence. First, most of the downstream VL datasets are annotated…

[239]

What is the inference efficiency trade-off (throughput vs accuracy) of SMoES compared to modality-agnostic MoE

28 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: A pivotal advancement in the progress of large language models (LLMs) is the emergence of the Mixture-of-Experts (MoE) LLMs. Compared to traditional LLMs, MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. Different from previous…

[238]

How does the optimal number of modality-specific experts in SMoES scale with VLM backbone size (e.g., 7B vs 13

28 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims.

Abstract: There has been rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size, which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question…

[237]

What is the effect of SMoES soft modality-guided routing on out-of-distribution chart type generalization (e.g

28 May 2026. Score: 1.50/10. Verification: L1, Literature synthesis.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[236]

How does the cross-modal alignment quality (measured by zero-shot accuracy on VQA and TextVQA) of Uni-MoE-2.0-

28 May 2026. Score: 0.00/10. Verification: L1, Literature synthesis.

Abstract: We present Uni-MoE 2.0 from the Lychee family. As a fully open-source omnimodal large model (OLM), it substantially advances Lychee's Uni-MoE series in language-centric multimodal understanding, reasoning, and generating. Based on the dense LLM, we build Uni-MoE-2.0-Omni from scratch through three core contributions:…

[235]

How does SMoES soft modality-guided routing compare to dense VLMs and standard MoE routing on multi-step reaso

28 May 2026. Score: 1.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[234]

How does the inference throughput (tokens per second) of Uni-MoE-2.0-Omni's dynamic-capacity MoE compare to de

28 May 2026. Score: 4.33/10. Verification: L1, Literature synthesis.

Abstract: We present Uni-MoE 2.0 from the Lychee family. As a fully open-source omnimodal large model (OLM), it substantially advances Lychee's Uni-MoE series in language-centric multimodal understanding, reasoning, and generating. Based on the dense LLM, we build Uni-MoE-2.0-Omni from scratch through three core contributions:…

[233]

To what extent does Uni-MoE-2.0-Omni's iterative reinforcement strategy improve reasoning accuracy on mathemat

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: We present Uni-MoE 2.0 from the Lychee family. As a fully open-source omnimodal large model (OLM), it substantially advances Lychee's Uni-MoE series in language-centric multimodal understanding, reasoning, and generating. Based on the dense LLM, we build Uni-MoE-2.0-Omni from scratch through three core contributions:…

[232]

How does the inference throughput (tokens/sec) of SMoES-routed multimodal models compare to hard-routed MoE-VL

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[231]

Does the predictive expert caching strategy in ExpertFlow maintain its throughput gains and object-level hallu

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[230]

What is the impact of SMoES routing on cross-dataset generalization robustness (ANLS accuracy) in zero-shot se

28 May 2026. Score: 2.17/10. Verification: L2, Source-grounded claims.

Abstract: Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks; however, such models have issues like (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts; and (b) Redundancy in Experts, as common learning-based routing policies…

[229]

Does soft MoE routing with token-level gating improve ANLS on InfographicsVQA and ChartQA compared to hard top

28 May 2026. Score: 1.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[228]

How does ExpertFlow's offloading and caching mechanism compare to static cache baselines in terms of inference

28 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[227]

What is the impact of ExpertFlow's token scheduling policy on downstream task accuracy (e.g., VQA v2, MMBench)

28 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[226]

How does the latency-accuracy trade-off of ExpertFlow's predictive caching compare to dense baselines and othe

28 May 2026. Score: 5.33/10. Verification: L1, Literature synthesis.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[225]

Does the predictive expert caching strategy in ExpertFlow reduce object existence hallucination (POPE accuracy

28 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims.

Abstract: Large vision language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images. While current benchmarks for object hallucination primarily concentrate on the presence of a single object class rather than individual entities, this work systematically investigates…

[224]

How does the predictive expert caching strategy in ExpertFlow affect multi-object hallucination rates (measure

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: Since the introduction of ChatGPT, large language models (LLMs) have demonstrated significant utility in various tasks, such as answering questions through retrieval-augmented generation. Context can be retrieved using a vectorized database, serving as a foundation for LLMs to generate responses. However,…

[223]

How robust is ExpertFlow's token scheduling in MoE vision-language models to distribution shifts in attribute

28 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for…

[222]

What is the throughput vs. accuracy trade-off of ExpertFlow's token scheduling in MoE vision-language models o

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[221]

How does ExpertFlow's token scheduling strategy in MoE vision-language models affect attribute binding accurac

28 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[220]

Can routing signatures learned from few-shot prompts on NLVR2 and SNLI-VE generalize to out-of-distribution co

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: Integrating image and text data through multi-modal learning has emerged as a new approach in medical imaging research, following its successful deployment in computer vision. While considerable efforts have been dedicated to establishing medical foundation models and their zero-shot transfer to downstream tasks, the…

[219]

How does the inference throughput and FLOPs efficiency of SMoE with dynamic routing signatures compare against

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[218]

Do task-conditioned routing signatures in SMoE transformers improve compositional reasoning accuracy on NLVR

28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[217]

Can Dynamic Clue Bottlenecks generalize to out-of-distribution robustness on VCR adversarial splits when scale

28 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims.

Abstract: Recent advances in multimodal large language models (LLMs) have shown extreme effectiveness in visual question answering (VQA). However, the design nature of these end-to-end models prevents them from being interpretable to humans, undermining trust and applicability in critical domains. While post-hoc rationales…