Index  |  Benchmarks  |  Mathematics  |  Graph  |  About
Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 4343 papers; mean review score 5.87/10; 1389 Zenodo DOIs.
Results 4226–4250 of 4343 entries

Papers

[118]
28 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20426978

Abstract: Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently no resources exist to train and test this…

[117]
28 May 2026. Score: 3.83/10. Verification: L2, Source-grounded claims.

Abstract: Extractive reading comprehension question answering (QA) datasets are typically evaluated using Exact Match (EM) and F1-score, but these metrics often fail to fully capture model performance. With the success of large language models (LLMs), they have been employed in various tasks, including serving as judges…

[116]
28 May 2026. Score: 0.50/10. Verification: L2, Source-grounded claims.

Abstract: Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert parallelism to distribute experts across devices. However, the absence of…

[115]
28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: A pivotal advancement in the progress of large language models (LLMs) is the emergence of the Mixture-of-Experts (MoE) LLMs. Compared to traditional LLMs, MoE LLMs can achieve higher performance with fewer parameters, but it is still hard to deploy them due to their immense parameter sizes. Different from previous…

[114]
28 May 2026. Score: 1.50/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments…

[113]
28 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims.

Abstract: Rapid advancements in large language models (LLMs) have increased interest in deploying them on mobile devices for on-device AI applications. Mobile users interact differently with LLMs compared to desktop users, creating unique expectations and data biases. Current benchmark datasets primarily target at server and…

[112]
28 May 2026. Score: 4.50/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[111]
28 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20426299

Abstract: The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a…

[110]
28 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims.

Abstract: Abstract The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[109]
28 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20426236

Abstract: This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal…

[108]
28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims.

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional…

[107]
28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20424631

Abstract: Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address…

[106]
28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims.

Abstract: Abstract In the past years, multimodal large language models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering and visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in…

[105]
28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20424168

Abstract: Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thoughts (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While…

[104]
28 May 2026. Score: 6.67/10. Verification: L2, Source-grounded claims.

Abstract: Multi-hop question answering is a knowledge-intensive complex problem. Large Language Models (LLMs) use their Chain of Thoughts (CoT) capability to reason complex problems step by step, and retrieval-augmentation can effectively alleviate factual errors caused by outdated and unknown knowledge in LLMs. Recent works…

[103]
28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20424153

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) for domain-specific question-answering (QA) tasks by leveraging external knowledge sources. However, traditional RAG systems primarily focus on relevance-based retrieval and often struggle with redundancy, especially when reasoning requires…

[102]
28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20423642

Abstract: Large language models (LLMs) have been increasingly used to interact with external environments (e.g., games, compilers, APIs) as goal-driven agents. However, it remains challenging for these language agents to quickly and efficiently learn from trial-and-error as traditional reinforcement learning methods require…

[101]
28 May 2026. Score: 1.17/10. Verification: L2, Source-grounded claims.

Abstract: Retrieval-Augmented Generation (RAG) methods enhance LLM performance by efficiently filtering relevant context for LLMs, reducing hallucinations and inference cost. However, most existing RAG methods focus on single-step retrieval, which is often insufficient for answering complex questions that require multi-step…

[100]
28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20423344

Abstract: Despite their remarkable achievement, gigantic transformers encounter significant drawbacks, including exorbitant computational and memory footprints during training, as well as severe collapse evidenced by a high degree of parameter redundancy. Sparsely-activated Mixture-of-Experts (SMoEs) have shown promise to…

[99]
28 May 2026. Score: 3.00/10. Verification: L2, Source-grounded claims.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[98]
28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20421444

Abstract: Vision-language foundation models achieve promising performance in natural image classification, yet their direct application to medical imaging is limited by severe domain shifts, resolution mismatches, and the multi-label nature of clinical diagnosis. Training dedicated medical foundation models from scratch,…

[97]
28 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20421249

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[96]
28 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20421241

Abstract: In many real-world applications, deployed models encounter inputs that differ from the data seen during training. Out-of-distribution detection identifies whether an input stems from an unseen distribution, while open-world recognition flags such inputs to ensure the system remains robust as ever-emerging, previously…

[95]
28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20420842

Abstract: Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Expert…

[94]
28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims.

Abstract: Mahalanobis distance (MD) is a simple and popular post-processing method for detecting out-of-distribution (OOD) inputs in neural networks. We analyze its failure modes for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to…

« Prev 1 168 169 170 171 172 174 Next »