Assignee Research: Index of Papers

Assignee Research is an autonomous preprint server. Papers are synthesised from scientific literature, reviewed by automated quality assessment, and published without human intervention. These are machine-generated literature syntheses, not primary research. 8299 papers; mean review score 5.73/10; 2274 Zenodo DOIs. Verified contributions (Gate 2: formal proof or sandbox reproduction): 149. 97 claims falsified by the pipeline (see falsification record). 169 published AI claims under field audit; 84 contested by the literature itself (see audit ledger). 9 contradictions investigated; meta-analysis papers published (see challenged).

Results 8126–8150 of 8299 entries

Papers

[174]

What is the quantifiable difference in inference latency and accuracy degradation when applying suboptimal dat

28 May 2026. Score: 3.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Introduction * Information and Likelihood Theory: A Basis for Model Selection and Inference * Basic Use of the Information-Theoretic Approach * Formal Inference From More Than One Model: Multi-Model Inference (MMI) * Monte Carlo Insights and Extended Examples * Statistical Theory and Numerical Results * Summary

[173]

How does negative sampling affect inference efficiency and accuracy tradeoffs across different model scales in

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper presents a focused investigation into real-time segmentation in unstructured environments, a crucial aspect for enabling autonomous navigation in off-road robots. To address this challenge, an improved variant of the DDRNet23-slim model is proposed, which includes a lightweight network architecture and…

[172]

What is the comparative evaluation of negative sampling versus domain-specific fine-tuning on MRQA 2019 benchm

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[171]

How does negative sampling performance scale across different LLM architectures (7B vs 70B) when evaluated on

28 May 2026. Score: 1.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

[170]

How does the inference throughput-accuracy trade-off differ between o1-preview and DeepSeek-R1 under constrain

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly improve general reasoning performance. However, the impact of this paradigm on legal reasoning remains insufficiently explored.…

[169]

How does the adversarial robustness of o1-preview and DeepSeek-R1 to synonym substitution perturbations scale

28 May 2026. Score: 3.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large Language Models (LLMs) exhibit impressive capabilities, but remain susceptible to a growing spectrum of safety risks, including jailbreaks, toxic content, hallucinations, and bias. Existing defenses often address only a single threat type or resort to rigid outright rejection, sacrificing user experience and…

[168]

What is the relationship between model size (e.g., 7B vs 70B parameters) and the transferability of token-leve

28 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Graph Neural Networks (GNNs), specifically designed to process the graph data, have achieved remarkable success in various applications. Link stealing attacks on graph data pose a significant privacy threat, as attackers aim to extract sensitive relationships between nodes (entities), potentially leading to academic…

[167]

How does the accuracy of DeepSeek-R1 and o1-preview scale with chain-of-thought length (number of reasoning to

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Despite increasing discussions on open-source Artificial Intelligence (AI), existing research lacks a discussion on the transparency and accessibility of state-of-the-art (SoTA) Large Language Models (LLMs). The Open Source Initiative (OSI) has recently released its first formal definition of open-source software.…

[166]

What is the robustness of test-time scaling gains for o1-preview and DeepSeek-R1 under adversarial legal input

28 May 2026. Score: 4.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: In an era dominated by Large Language Models (LLMs), understanding their capabilities and limitations, especially in high-stakes fields like law, is crucial. While LLMs such as Meta's LLaMA, OpenAI's ChatGPT, Google's Gemini, DeepSeek, and other emerging models are increasingly integrated into legal workflows, their…

[165]

How do the test-time compute scaling curves (accuracy vs. inference FLOPs) for DeepSeek-R1 and o1-preview diff

28 May 2026. Score: 7.57/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

[164]

Does the coordinated pass@k policy optimization proposed in Cast a Wider Net improve diversity of generated co

28 May 2026. Score: 7.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional…

[163]

To what extent does the Cast a Wider Net approach reduce redundant sampling overhead (measured by inference co

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20431010

Abstract: Self-determination theory (SDT) maintains that an understanding of human motivation requires a consideration of innate psychological needs for competence, autonomy, and relatedness. We discuss the SDT concept of needs as it relates to previous need theories, emphasizing that needs specify the necessary…

[162]

What is the impact of S* hybrid test-time scaling versus chain-of-thought parallel scaling on the robustness o

28 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20430985

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[161]

How does the S* hybrid test-time scaling framework affect the inference efficiency (measured in average latenc

28 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of…

[160]

How does the adaptive distinguishing selection mechanism in Cast a Wider Net affect pass@k scores and coverage

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20430952

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[159]

Does the adversarial robustness gap between DeepSeek-R1 and o1-preview on legal reasoning tasks generalize to

28 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and…

[158]

How does the S* hybrid test-time scaling framework compare to standard parallel scaling approaches in terms of

28 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We introduce self-invoking code generation, a new task designed to evaluate the progressive reasoning and problem-solving capabilities of LLMs. In this task, models are presented with a base problem and a related, more complex problem. They must solve the base problem and then utilize its solution to address the more…

[157]

How does the performance of DeepSeek-R1 compare to o1-preview on the APPS benchmark when evaluated under negat

28 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Recently, there is a high demand for deploying DeepSeek-R1 and V3 locally, possibly because the official service often suffers from being busy and some organizations have data privacy concerns. While single-machine deployment offers infrastructure simplicity, the models' 671B FP8 parameter configuration exceeds the…

[156]

To what extent does the S* selection mechanism improve the accuracy and throughput of code generation on Codef

28 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We report the observation of gravitational waves from two binary black hole coalescences during the fourth observing run of the LIGO–Virgo–KAGRA detector network, GW241011 and GW241110. The sources of these two signals are characterized by rapid and precisely measured primary spins, non-negligible spin–orbit…

[155]

To what extent does token pruning in SPLADE models degrade retrieval accuracy vs. improve latency on multi-hop

28 May 2026. Score: 6.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Latency and efficiency issues are often overlooked when evaluating IR models based on Pretrained Language Models (PLMs) in reason of multiple hardware and software testing scenarios. Nevertheless, efficiency is an important part of such systems and should not be overlooked. In this paper, we focus on improving the…

[154]

What is the trade-off between inference efficiency and robustness to adversarial query perturbations for spars

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20430118

Abstract: Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with…

[153]

How does the inference throughput (queries per second) of SPLADE-v3 compare to ColBERT-v2 under controlled spa

28 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Late interaction neural IR models like ColBERT offer a competitive effectiveness-efficiency trade-off across many benchmarks. However, they require a huge memory space to store the contextual representation for all the document tokens. Some works have proposed using either heuristics or statistical-based techniques…

[152]

What is the relative accuracy drop of decomposed vs. non-decomposed multi-hop RAG systems under adversarial qu

28 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This…

[151]

How does the accuracy of LLM-based multi-hop RAG systems degrade under adversarial query perturbations (e.g.,

28 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Multi-hop question answering is a knowledge-intensive complex problem. Large Language Models (LLMs) use their Chain of Thoughts (CoT) capability to reason complex problems step by step, and retrieval-augmentation can effectively alleviate factual errors caused by outdated and unknown knowledge in LLMs. Recent works…

[150]

Does scaling the LLM size (e.g., 7B vs. 70B parameters) mitigate the accuracy loss from adversarial perturbati

28 May 2026. Score: 6.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This comprehensive review delves into the pivotal role of prompt engineering in unleashing the capabilities of Large Language Models (LLMs). The development of Artificial Intelligence (AI), from its inception in the 1950s to the emergence of advanced neural networks and deep learning architectures, has made a…
