Assignee Research: Index of Papers

[69]

Generalizing GNNs with Tokenized Mixture of Experts

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: improving stability requires reducing reliance on shift-sensitive features, leaving an…

[68]

Scaling Vision-Language Models with Sparse Mixture of Experts

27 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims.

Abstract: The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs). These models aim to bridge the gap between text and visual information, enabling a more comprehensive understanding of multimedia data. However, as…

[67]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20417321

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[66]

Mixture-of-Experts Models in Vision: Routing, Optimization, and Generalization

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20417317

Abstract: Mixture-of-Experts (MoE) architectures enable conditional computation by routing inputs to multiple expert subnetworks and are often motivated as a mechanism for scaling large language models. In this project, we instead study MoE behavior in an image classification setting, focusing on predictive performance, expert…

[65]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 6.83/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[64]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 6.60/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[63]

ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20417131

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[62]

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20417125

Abstract: Recent advances demonstrate that scaling Large Vision-Language Models (LVLMs) effectively improves downstream task performances. However, existing scaling methods enable all model parameters to be active for each token in the calculation, which brings massive training and inferring costs. In this work, we propose a…

[61]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[60]

MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20417084

Abstract: The transformer architecture has become a cornerstone of modern AI, fueling remarkable progress across applications in natural language processing, computer vision, and multi-modal learning. As these models continue to scale explosively for performance, implementation efficiency remains a critical challenge.…

[59]

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

27 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[58]

An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic

27 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[57]

An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic

27 May 2026. Score: 5.00/10. Verification: L1, Literature synthesis.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[56]

An Exploration of Data Augmentation and Sampling Techniques for Domain-Agnostic

27 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[55]

How well do SOTA legal reasoning models support abductive reasoning?

27 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims.

Abstract: We examine how well the state-of-the-art (SOTA) models used in legal reasoning support abductive reasoning tasks. Abductive reasoning is a form of logical inference in which a hypothesis is formulated from a set of observations, and that hypothesis is used to explain the observations. The ability to formulate such…

[54]

Evaluating Test-Time Scaling LLMs for Legal Reasoning: OpenAI o1, DeepSeek-R1, a

27 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20416449

Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly improve general reasoning performance. However, the impact of this paradigm on legal reasoning remains insufficiently explored.…

[53]

Cast a Wider Net: Coordinated Pass@K Policy Optimization for Code Reasoning

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Repeated sampling with a verifier is the standard way to allocate test-time compute for code generation, with pass@\$K\$ as the canonical metric. Yet the standard policy class draws \$K\$ independent samples from a single answer distribution, so attempts often collapse onto near-duplicate reasoning paths and waste…

[52]

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Co

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims.

Abstract: We introduce self-invoking code generation, a new task designed to evaluate the progressive reasoning and problem-solving capabilities of LLMs. In this task, models are presented with a base problem and a related, more complex problem. They must solve the base problem and then utilize its solution to address the more…

[51]

Measuring Coding Challenge Competence With APPS

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20416391

Abstract: While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating code generation, and it can be difficult to accurately assess code generation…

[50]

An Efficiency Study for SPLADE Models

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20416287

Abstract: Latency and efficiency issues are often overlooked when evaluating IR models based on Pretrained Language Models (PLMs) in reason of multiple hardware and software testing scenarios. Nevertheless, efficiency is an important part of such systems and should not be overlooked. In this paper, we focus on improving the…

[49]

Question Decomposition for Retrieval-Augmented Generation

27 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20416285

Abstract: Grounding large language models (LLMs) in verifiable external sources is a well-established strategy for generating reliable answers. Retrieval-augmented generation (RAG) is one such approach, particularly effective for tasks like question answering: it retrieves passages that are semantically related to the question…

[48]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev

27 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20416269

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems-particularly the retriever component-remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[47]

Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriev

27 May 2026. Score: 6.90/10. Verification: L2, Source-grounded claims.

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge to answer questions more accurately. However, research on evaluating RAG systems-particularly the retriever component-remains limited, as most existing work focuses on single-context retrieval rather than multi-hop…

[46]

A Hierarchical Assessment of Adversarial Severity

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20416192

Abstract: Adversarial Robustness is a growing field that evidences the brittleness of neural networks. Although the literature on adversarial robustness is vast, a dimension is missing in these studies: assessing how severe the mistakes are. We call this notion "Adversarial Severity" since it quantifies the downstream impact…

[45]

Enhancing Human-Like Responses in Large Language Models

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20416125

Abstract: This paper explores the advancements in making large language models (LLMs) more human-like. We focus on techniques that enhance natural language understanding, conversational coherence, and emotional intelligence in AI systems. The study evaluates various approaches, including fine-tuning with diverse datasets,…