Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 5361 papers; mean review score 5.66/10; 1472 Zenodo DOIs.

Results 5251–5275 of 5361 entries

Papers

[111]

How does varying the number of active experts (k) in sparse MoE vision-language models affect VQA accuracy and

28 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20426299

Abstract: The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a…

[110]

How does the MambaFormer hybrid MoE architecture's efficiency (FLOPs per token and throughput) scale with mode

28 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims.

Abstract: Abstract The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[109]

To what extent does the accuracy of multi-step retrieval pipelines for multi-hop QA degrade under noisy or adv

28 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20426236

Abstract: This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal…

[108]

How does the MambaFormer hybrid MoE architecture's efficiency (FLOPs per token and throughput) scale with mode

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims.

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional…

[107]

Does GPT-4's multi-hop reasoning accuracy on HotpotQA degrade monotonically with increasing retrieval steps (2

28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20424631

Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address…

[106]

What is the impact of token-level guided routing on inference latency and cross-modal reasoning accuracy in Mo

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Abstract In the past years, multimodal large language models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering and visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in…

[105]

What is the accuracy drop on the HotpotQA multi-hop dataset when using a 128K-context Llama-3 model without re

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20424168

Abstract: Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While…

[104]

How does the Tree of Reviews framework compare to standard chain-based retrieval on the MuSiQue multi-hop QA b

28 May 2026. Score: 6.67/10. Verification: L2, Source-grounded claims.

Abstract: Multi-hop question answering is a knowledge-intensive complex problem. Large Language Models (LLMs) use their Chain of Thoughts (CoT) capability to reason complex problems step by step, and retrieval-augmentation can effectively alleviate factual errors caused by outdated and unknown knowledge in LLMs. Recent works…

[103]

Does the Tree of Reviews iterative retrieval method improve robustness to irrelevant context in multi-hop QA c

28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20424153

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) for domain-specific question-answering (QA) tasks by leveraging external knowledge sources. However, traditional RAG systems primarily focus on relevance-based retrieval and often struggle with redundancy, especially when reasoning requires…

[102]

What is the impact of fine-tuning on negative interaction trajectories versus positive-only trajectories for L

28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20423642

Abstract: Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require…

[101]

How does the inference efficiency (tokens/sec and memory usage) of a 70B-parameter LLM agent compare when usin

28 May 2026. Score: 1.17/10. Verification: L2, Source-grounded claims.

Abstract: Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context for LLMs, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step…

[100]

Can AnyExperts' dynamic expert allocation maintain consistent accuracy improvements over dense baselines when

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20423344

Abstract: Despite their remarkable achievement, gigantic transformers encounter significant drawbacks, including exorbitant computational and memory footprints during training, as well as severe collapse evidenced by a high degree of parameter redundancy. Sparsely-activated Mixture-of-Experts (SMoEs) have shown promise to…

[99]

Can AnyExperts' dynamic expert allocation maintain consistent accuracy improvements over dense baselines when

28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[98]

What is the impact of expert capacity imbalance on AnyExperts' performance degradation when evaluated on domai

28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20421444

Abstract: Vision-language foundation models achieve promising performance in natural image classification, yet their direct application to medical imaging is limited by severe domain shifts, resolution mismatches, and the multi-label nature of clinical diagnosis. Training dedicated medical foundation models from scratch,…

[97]

How does AnyExperts' on-demand routing strategy compare to fixed routing baselines in terms of inference laten

28 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20421249

[96]

To what extent does NOVA's anomaly localization accuracy degrade when tested on out-of-distribution brain MRI

28 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20421241

Abstract: In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Out-of-distribution detection identifies whether an input stems from an unseen distribution, while open-world recognition flags such inputs to ensure the system remains robust as ever-emerging, previously…

[95]

What is the computational overhead of implementing expert bridging versus full fine-tuning in terms of inferen

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20420842

Abstract: Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Expert…

[94]

How does the performance of NOVA's open-world recognition capability compare to existing OOD detection methods

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims.

Abstract: Mahalanobis distance (MD) is a simple and popular post-processing method for detecting out-of-distribution (OOD) inputs in neural networks. We analyze its failure modes for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to…

[93]

What is the impact of expert utilization patterns on model generalization for multi-step reasoning tasks when

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20420504

Abstract: While Transformer architectures have demonstrated impressive scalability across domains, they continue to face challenges in long-context reasoning, computational efficiency, and structural generalization - largely due to rigid layer stacking, dense attention, and reliance on positional encodings. We present…

[92]

How does dynamic expert specialization in AnyExperts models affect inference efficiency on multi-step reasonin

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20420480

Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter…

[91]

How do mixture-of-experts routing strategies generalize across different computer vision tasks when using abla

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20420441

Abstract: Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent…

[90]

How does the proposed CAT method compare to fixed tokenization approaches in terms of end-to-end inference tim

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20419644

Abstract: Transformer-based video diffusion models rely on 3D attention over spatial and temporal tokens, which incurs quadratic time and memory complexity and makes end-to-end training for ultra-high-resolution videos prohibitively expensive. To overcome this bottleneck, we propose a pure image adaptation framework that…

[89]

How does content-adaptive tokenization affect the inference latency and accuracy of multimodal vision-language

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20419639

Abstract: Modern multimodal large language models (MLLMs) typically keep the language model fixed and train a visual projector that maps the pixels into a sequence of tokens in its embedding space, so that images can be presented in essentially the same form as text. However, the language model has been optimized to operate on…

[88]

What is the impact of dynamic token count on FLOPs efficiency and reasoning accuracy when processing variable-

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20419612

Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While effective in vision tasks by discarding non-object…

[87]

How does GraphMETRO's alignment mechanism influence performance on out-of-distribution graph data

27 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20419607

Abstract: Bayesian neural networks (BNNs) promise improved generalization under covariate shift by providing principled probabilistic representations of epistemic uncertainty. However, weight-based BNNs often struggle with high computational complexity of large-scale architectures and datasets. Node-based BNNs have recently…

« Prev 1 … 209 210 211 212 213 … 215 Next »