Assignee Research: Index of Papers

[309]

How does the end-to-end latency of RAG systems with 128K context windows compare to iterative retrieval with B

28 May 2026. Score: 5.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Retrieval augmented generation (RAG) with large language models (LLMs) for Question Answering (QA) entails furnishing relevant context within the prompt to facilitate the LLM in answer generation. During the generation, inaccuracies or hallucinations frequently occur due to two primary factors: inadequate or…

[308]

Can the expert specialization patterns learned during pretraining of MoE LLMs be transferred to multimodal mod

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Since the 1950s, when the Turing Test was introduced, there has been notable progress in machine language intelligence. Language modeling, crucial for AI development, has evolved from statistical to neural models over the las…

[307]

How does the expert utilization distribution of SMoES on VQA-CP v2 compare to top-k routing when evaluated und

28 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20435660

Abstract: Purpose: The purpose of this paper is to provide a comprehensive, yet concise, overview of the considerations and metrics required for partial least squares structural equation modeling (PLS-SEM) analysis and result reporting. Preliminary considerations are summarized first, including reasons for choosing PLS-SEM,…

[306]

How does ExpertFlow's inference efficiency (measured in tokens per second and GPU memory usage) on multimodal

28 May 2026. Score: 4.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[305]

How do different expert routing strategies in MambaFormer affect throughput and FLOPs per token efficiency on

28 May 2026. Score: 9.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20435649

Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including…

[304]

How does cross-modal routing consistency in MoE vision-language models influence robustness to distribution sh

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[303]

How does MambaFormer's inference latency compare to Transformer MoE baselines on HumanEval and MBPP benchmarks

28 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We introduce self-invoking code generation, a new task designed to evaluate the progressive reasoning and problem-solving capabilities of LLMs. In this task, models are presented with a base problem and a related, more complex problem. They must solve the base problem and then utilize its solution to address the more…

[302]

What is the impact of sparsity ratio in token-level routing on the trade-off between inference latency and mul

28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) architectures enable conditional computation by routing inputs to multiple expert subnetworks and are often motivated as a mechanism for scaling large language models. In this project, we instead study MoE behavior in an image classification setting, focusing on predictive performance, expert…

[301]

Can Vendi-RAG's iterative retrieval process maintain robustness to irrelevant context in multi-hop QA scenario

28 May 2026. Score: 4.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) for domain-specific question-answering (QA) tasks by leveraging external knowledge sources. However, traditional RAG systems primarily focus on relevance-based retrieval and often struggle with redundancy, especially when reasoning requires…

[300]

How does the Tree of Reviews framework scale with context length on the MuSiQue benchmark when using Llama-3 w

28 May 2026. Score: 1.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraining by gradually increasing its RoPE base frequency with…

[299]

What is the impact of varying reflection memory length on the success rate and average reward of LLM agents us

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper develops LongNav-R1, an end-to-end multi-turn reinforcement learning (RL) framework designed to optimize Visual-Language-Action (VLA) models for long-horizon navigation. Unlike the existing single-turn paradigm, LongNav-R1 reformulates the navigation decision process as a continuous multi-turn conversation…

[298]

What is the impact of retrieval diversity optimization on inference efficiency and latency when scaling Vendi-

28 May 2026. Score: 2.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) for domain-specific question-answering (QA) tasks by leveraging external knowledge sources. However, traditional RAG systems primarily focus on relevance-based retrieval and often struggle with redundancy, especially when reasoning requires…

[297]

Does Reflexion's verbal reinforcement learning generalize to multimodal agents on the ALFRED benchmark for emb

28 May 2026. Score: 1.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require…

[296]

How does the number of iterative self-reflection steps affect the accuracy and inference efficiency of languag

28 May 2026. Score: 2.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Small Language Models (SLMs) offer computational efficiency and accessibility, yet a systematic evaluation of their performance and environmental impact remains lacking. We introduce SLM-Bench, the first benchmark specifically designed to assess SLMs across multiple dimensions, including accuracy, computational…

[295]

To what extent does AnyExperts' dynamic expert allocation mitigate representational collapse and maintain per-

28 May 2026. Score: 4.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Multimodal Mixture-of-Experts (MoE) models offer a promising path toward scalable and efficient large vision-language systems. However, existing approaches rely on rigid routing strategies (typically activating a fixed number of experts per token), ignoring the inherent heterogeneity in semantic importance across…

[294]

What is the scaling efficiency in terms of memory usage and QA accuracy when applying value-based embedder tra

28 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We propose ReKV, a novel training-free approach that enables efficient streaming video question-answering (StreamingVQA), by seamlessly integrating with existing Video Large Language Models (Video-LLMs). Traditional VideoQA systems struggle with long videos, as they must process entire videos before responding to…

[293]

Does AnyExperts' dynamic expert allocation maintain consistent accuracy improvements over dense baselines when

28 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[292]

How does the inference throughput of AnyExperts' sparse MoE architecture compare to dense transformers of equi

28 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We describe MUSCLE, a new computer program for creating multiple alignments of protein sequences. Elements of the algorithm include fast distance estimation using kmer counting, progressive alignment using a new profile function we call the log-expectation score, and refinement using tree-dependent restricted…

[291]

What is the throughput trade-off (inference latency vs. accuracy) when scaling expert count in vision-language

28 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: As a fundamental and challenging task in bridging language and vision domains, Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality, and its key challenge is to measure the semantic similarity across different modalities.…

[290]

To what extent does scaling the backbone size (e.g., ViT-B vs. ViT-L) in multimodal models improve robustness

28 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Anomaly Detection (AD) and Anomaly Localization (AL) are crucial in fields that demand high reliability, such as medical imaging and industrial monitoring. However, current AD and AL approaches are often susceptible to adversarial attacks due to limitations in training data, which typically include only normal,…

[289]

How does the number of expert modules in a mixture-of-experts architecture affect cross-domain accuracy degrad

28 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20435362

Abstract: The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen…

[288]

To what extent does expert capacity imbalance in multimodal LLMs correlate with per-class F1 score variance on

28 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. DOI: 10.5281/zenodo.20435352

Abstract: Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is based on the social value of Generation Z that online and offline selves are not different. With the technological development of deep learning-based high-precision recognition models and natural generation models, Metaverse is…

[287]

What is the trade-off between inference throughput and anomaly localization precision when applying test-time

28 May 2026. Score: 6.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data.…

[286]

What is the impact of routing signature diversity on expert load balancing and downstream task accuracy in spa

28 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[285]

Can task-conditioned routing signatures improve the robustness of sparse MoE transformers to distribution shif

28 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…