Assignee Research: Index of Papers

[439]

What is the impact of model size reduction techniques on the effectiveness of music information retrieval syst

29 May 2026. Score: 7.23/10. Verification: L2, Source-grounded claims.

Abstract: A large amount of information exists in reviews written by users. This source of information has been ignored by most of the current recommender systems while it can potentially alleviate the sparsity problem and improve the quality of recommendations. In this paper, we present a deep model to learn item properties…

[438]

What is the impact of cross-view structural consistency modeling on the performance of UNAGI for multi-view cl

29 May 2026. Score: 9.23/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20439491

Abstract: In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma…

[437]

What is the impact of quantized inference (4-bit vs 8-bit) on the throughput and task-specific F1 scores of co

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20439445

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[436]

How does the auxiliary-loss-free load balancing strategy in DeepSeek-V3 affect expert utilization diversity an

29 May 2026. Score: 7.20/10. Verification: L2, Source-grounded claims.

Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in…

[435]

How does the accuracy of Gemini 1.5 Pro on the MMMU benchmark compare to MoE-LLaVA and dense LLaVA-1.5 when ev

29 May 2026. Score: 7.87/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20439435

Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting…

[434]

What is the inference throughput (tokens per second) of Gemini 1.5 Flash versus LLaVA-NeXT on the Video-MME be

29 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20439420

Abstract: Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advancements in versatility and performance these models offer come…

[433]

Can a mixture-of-experts (MoE) routing strategy with adaptive sparsity improve both throughput (FPS) and per-c

29 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis.

Abstract: This review comprehensively investigates the current state and emerging trends of autonomous vehicle terrain detection and segmentation. By systematically reviewing literature from various databases, this study outlines the evolution of detection and segmentation techniques from traditional computer vision methods to…

[432]

What is the comparative performance drop in Video-MME accuracy for MoE models (e.g., Mixtral 8x22B) versus den

29 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis.

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being…

[431]

What is the trade-off between mIoU and latency for DDRNet23-slim versus DeepLabV3+ on the RUGD dataset under v

29 May 2026. Score: 6.67/10. Verification: L2, Source-grounded claims.

Abstract: Offroad autonomous vehicles (OAVs) are becoming increasingly popular for navigating challenging environments in agriculture, military, and exploration applications. These vehicles face unique challenges, such as unpredictable terrain, dynamic obstacles, and varying environmental conditions. Therefore, it is…

[430]

How does the choice between nucleus sampling and temperature scaling affect pass@k scores and character-level

29 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[429]

What is the trade-off between token sparsity levels (e.g., 10%, 30%, 50%) in masked autoencoders and segmentat

29 May 2026. Score: 6.67/10. Verification: L2, Source-grounded claims.

Abstract: In recent years unmanned aerial vehicles (UAVs) have emerged as a popular and cost-effective technology to capture high spatial and temporal resolution remote sensing (RS) images for a wide range of precision agriculture applications, which can help reduce costs and environmental impacts by providing detailed…

[428]

What is the impact of domain-specific data augmentation on cross-modal alignment scores in multimodal LLMs eva

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20439332

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This…

[427]

Can data augmentation techniques improve few-shot learning performance of vision-language models on standard b

29 May 2026. Score: 5.67/10. Verification: L2, Source-grounded claims.

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[426]

How does the choice of attention mechanism (e.g., sparse vs. dense) in vision transformers affect mean Interse

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20439318

Abstract: Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), which had been overwhelmingly dominated by CNNs, has recently been significantly revolutionized. However, the computational cost and memory requirement render these methods unsuitable on the mobile…

[425]

What is the optimal negative sampling ratio for domain-agnostic QA performance across different model scales

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20439310

Abstract: Systematic reviews and meta-analyses are essential to summarize evidence relating to efficacy and safety of health care interventions accurately and reliably. The clarity and transparency of these reports, however, is not optimal. Poor reporting of systematic reviews diminishes their value to clinicians, policy…

[424]

To what extent does incorporating unanswerable questions through negative sampling techniques improve performa

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims.

Abstract: We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems. In this task, we adapted and unified 18 distinct question answering datasets into the same format. Among them, six datasets were made available…

[423]

To what extent does the integration of secure multi-party computation protocols affect the zero-shot text clas

29 May 2026. Score: 4.00/10. Verification: L2, Source-grounded claims.

Abstract: Transformer models (e.g., BERT and GPT) have shown their dominance in machine learning tasks. Many cloud companies have begun to provide services based on Transformer models; examples include translation and text-speech conversion. However, such a service inevitably requires access to the client's data, which might…

[422]

What is the impact of privacy-preserving representations on inference latency when scaling model sizes from 7B

29 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[421]

How does negative sampling ratio affect zero-shot question answering performance across different LLM architec

29 May 2026. Score: 2.33/10. Verification: L2, Source-grounded claims.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[420]

How does the cross-domain generalization performance of DeepSeek-R1 and o1-preview models vary when evaluated

29 May 2026. Score: 3.50/10. Verification: L1, Literature synthesis.

Abstract: In an era dominated by Large Language Models (LLMs), understanding their capabilities and limitations, especially in high-stakes fields like law, is crucial. While LLMs such as Meta's LLaMA, OpenAI's ChatGPT, Google's Gemini, DeepSeek, and other emerging models are increasingly integrated into legal workflows, their…

[419]

How do evidence gap identification mechanisms in FAIR-RAG systems affect F1 score performance on HotpotQA when

29 May 2026. Score: 4.67/10. Verification: L2, Source-grounded claims.

Abstract: While Retrieval-Augmented Generation (RAG) mitigates hallucination and knowledge staleness in Large Language Models (LLMs), existing frameworks often falter on complex, multi-hop queries that require synthesizing information from disparate sources. Current advanced RAG methods, employing iterative or adaptive…

[418]

What is the impact of test-time compute allocation on the inference efficiency and task completion accuracy of

29 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly improve general reasoning performance. However, the impact of this paradigm on legal reasoning remains insufficiently explored.…

[417]

What is the impact of faithfulness constraints on LLM generation throughput measured in tokens per second acro

29 May 2026. Score: 6.07/10. Verification: L2, Source-grounded claims.

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) for domain-specific question-answering (QA) tasks by leveraging external knowledge sources. However, traditional RAG systems primarily focus on relevance-based retrieval and often struggle with redundancy, especially when reasoning requires…

[416]

What is the impact of domain-specific fine-tuning on BEIR-NL datasets on downstream task performance measured

29 May 2026. Score: 2.00/10. Verification: L1, Literature synthesis.

Abstract: Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field,…

[415]

To what extent does fine-tuning on BEIR-NL improve R@100 and MRR scores compared to zero-shot baselines for Du

29 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20438945

Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data…