Assignee Research: Index of Papers

[193]

What is the scaling efficiency of SMoES-based MoE-VLMs in terms of downstream task accuracy per additional exp

28 May 2026. Score: 2.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[192]

Does predictive expert caching in SMoE multimodal models improve robustness to cross-modal distribution shifts

28 May 2026. Score: 1.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Graph data are inherently complex and heterogeneous, leading to a high natural diversity of distributional shifts. However, it remains unclear how to build machine learning architectures that generalize to the complex distributional shifts naturally occurring in the real world. Here, we develop GraphMETRO, a Graph…

[191]

What is the FLOPs efficiency (tokens per FLOP) of SMoE models versus dense models when evaluated on MMMU subse

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: There has been a rapid progress in the task of Visual Question Answering with improved model architectures. Unfortunately, these models are usually computationally intensive due to their sheer size which poses a serious challenge for deployment. We aim to tackle this issue for the specific task of Visual Question…

[190]

How does the inference throughput (tokens/sec) of SMoE-based multimodal models compare to dense baselines on t

28 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[189]

Does MoE-LLaVA's routing strategy improve cross-modal robustness to textual adversarial perturbations (e.g., s

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Following the success in advancing natural language processing and understanding, transformers are expected to bring revolutionary changes to computer vision. This work provides a comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations. Tested on various white-box and…

[188]

What is the inference throughput (tokens per second) and memory footprint trade-off for MoE-LLaVA versus dense

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[187]

How does the robustness of SMoES-based MoE-VLMs with soft modality-guided routing compare to dense models of e

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[186]

What is the inference throughput and memory efficiency trade-off of SMoES MoE-VLMs relative to dense VLMs of e

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432521

Abstract: The field of natural language processing (NLP) has made significant strides in recent years, particularly in the development of large-scale vision-language models (VLMs). These models aim to bridge the gap between text and visual information, enabling a more comprehensive understanding of multimedia data. However, as…

[185]

How does the SMoES soft modality-guided routing mechanism compare to dense baselines and hard routing variants

28 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[184]

What is the performance gap trend in visual reasoning accuracy between SMoES-based MoE-VLMs and dense models a

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432411

Abstract: With the increasing data volume, there is a trend of using large-scale pre-trained models to store the knowledge into an enormous number of model parameters. The training of these models is composed of lots of dense algebras, requiring a huge amount of hardware resources. Recently, sparsely-gated Mixture-of-Experts…

[183]

How does the scaling behavior of inference throughput and reasoning accuracy differ between SMoES MoE-VLMs and

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432231

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family…

[182]

How does the inference throughput and accuracy of SMoES-based MoE-VLMs with soft modality-guided routing compa

28 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The transformer architecture has become a cornerstone of modern AI, fueling remarkable progress across applications in natural language processing, computer vision, and multi-modal learning. As these models continue to scale explosively for performance, implementation efficiency remains a critical challenge.…

[181]

What is the accuracy-throughput Pareto frontier of SMoES MoE-VLMs versus dense models on cross-modal reasoning

28 May 2026. Score: 3.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Migrating computational intensive tasks from mobile devices to more resourceful cloud servers is a promising technique to increase the computational capacity of mobile devices while saving their battery energy. In this paper, we consider an MIMO multicell system where multiple mobile users (MUs) ask for computation…

[180]

What is the impact of data augmentation techniques on LLM generalization across domains, as measured by F1 sco

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper critically examines model compression techniques within the machine learning (ML) domain, emphasizing their role in enhancing model efficiency for deployment in resource-constrained environments, such as mobile devices, edge computing, and Internet of Things (IoT) systems. By systematically…

[179]

How does the inference throughput (tokens per second) of SMoES-based MoE-VLMs compare to dense models of equal

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432042

Abstract: In performing a Bayesian analysis of astronomical data, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multimodal or exhibit pronounced (curving) degeneracies, which can cause problems for traditional MCMC sampling…

[178]

How do different sampling strategies affect the efficiency and accuracy of LLMs on domain-agnostic question an

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional…

[177]

How does the effectiveness of negative sampling for unanswerable questions in the MRQA dataset compare to SQuA

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20432015

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[176]

How does back-translation paraphrasing affect the robustness of LLM question answering performance across diff

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20431979

Abstract: NLP practitioners often want to take existing trained models and apply them to data from new domains. While fine-tuning or few-shot learning can be used to adapt a base model, there is no single recipe for making these techniques work; moreover, one may not have access to the original model weights if it is deployed…

[175]

How do domain-agnostic question answering models trained on mixed-domain datasets (SQuAD 2.0, NewsQA, and Triv

28 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20431969

Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on…

[174]

What is the quantifiable difference in inference latency and accuracy degradation when applying suboptimal dat

28 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Introduction * Information and Likelihood Theory: A Basis for Model Selection and Inference * Basic Use of the Information-Theoretic Approach * Formal Inference From More Than One Model: Multi-Model Inference (MMI) * Monte Carlo Insights and Extended Examples * Statistical Theory and Numerical Results * Summary

[173]

How does negative sampling affect inference efficiency and accuracy tradeoffs across different model scales in

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper presents a focused investigation into real-time segmentation in unstructured environments, a crucial aspect for enabling autonomous navigation in off-road robots. To address this challenge, an improved variant of the DDRNet23-slim model is proposed, which includes a lightweight network architecture and…

[172]

What is the comparative evaluation of negative sampling versus domain-specific fine-tuning on MRQA 2019 benchm

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[171]

How does negative sampling performance scale across different LLM architectures (7B vs 70B) when evaluated on

28 May 2026. Score: 1.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[170]

How does the inference throughput-accuracy trade-off differ between o1-preview and DeepSeek-R1 under constrain

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly improve general reasoning performance. However, the impact of this paradigm on legal reasoning remains insufficiently explored.…

[169]

How does the adversarial robustness of o1-preview and DeepSeek-R1 to synonym substitution perturbations scale

28 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large Language Models (LLMs) exhibit impressive capabilities, but remain susceptible to a growing spectrum of safety risks, including jailbreaks, toxic content, hallucinations, and bias. Existing defenses often address only a single threat type or resort to rigid outright rejection, sacrificing user experience and…