Assignee Research: Index of Papers

[351]

How does the accuracy of few-shot adapted medical VLMs correlate with the number of adaptation examples provid

29 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets. Recent work aims at building multilingual versions of such models, and a range of multilingual multimodal datasets have been introduced for this purpose. However, current PVLMs typically perform poorly…

[350]

How does ExpertFlow's predictive expert caching mechanism affect inference latency and memory usage compared t

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma…

[349]

What is the inference throughput and memory cost trade-off for MoE-LLaVA under adversarial textual perturbatio

29 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436797

Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the…

[348]

How does the predictive expert caching latency and token scheduling overhead affect end-to-end tokens-per-seco

29 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436792

Abstract: We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processing high-resolution…

[347]

What is the relationship between routing signature diversity and code generation accuracy on HumanEval benchma

29 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436788

Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in…

[346]

How does expert specialization guided by soft modality signals influence task-specific performance gaps in VLM

29 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436758

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[345]

What is the accuracy trade-off on the MMMU benchmark for MoE-LLaVA versus dense LLaVA models when expert cachi

29 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436750

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family…

[344]

What is the impact of varying expert count and routing granularity in SMoES-based MoE-VLMs on throughput and V

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[343]

Does the accuracy gap on long-context multimodal benchmarks (e.g., Video-MME, Needle-in-a-Haystack) between Mo

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: We study the continual pretraining recipe for scaling language models' context lengths to 128K, with a focus on data engineering. We hypothesize that long context modeling, in particular the ability to utilize information at arbitrary input locations, is a capability that is mostly already acquired through…

[342]

How do different gradient-based sampling methods affect the performance tradeoffs in code generation benchmark

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436651

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[341]

How does the inference latency of the proposed DDRNet23-slim variant compare to baseline segmentation models o

28 May 2026. Score: 7.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, however, the recent overwhelming…

[340]

What is the impact of attention mechanism sparsity levels on segmentation accuracy for [drivable surface, obst

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436621

Abstract: Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of state-of-the-art is improved further. This paper discusses unsolved problems…

[339]

Can domain-generalized semantic segmentation models maintain >90% mIoU performance when deployed on edge devic

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20436602

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data.…

[338]

What is the impact of negative sampling versus domain-specific fine-tuning on exact match and F1 scores across

28 May 2026. Score: 7.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead…

[337]

How does negative sampling performance compare to domain-specific fine-tuning when evaluated on out-of-domain

28 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Detecting toxic content using language models is crucial yet challenging. While substantial progress has been made in English, toxicity detection in French remains underdeveloped, primarily due to the lack of culturally relevant, human-annotated, large-scale datasets. In this work, we release ToxiFrench, a dataset of…

[336]

How do simple negative sampling techniques compare to advanced data augmentation methods for improving out-of-

28 May 2026. Score: 3.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[335]

What is the impact of back-translation paraphrasing techniques on QA model generalization across different mod

28 May 2026. Score: 5.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: To produce a domain-agnostic question answering model for the Machine Reading Question Answering (MRQA) 2019 Shared Task, we investigate the relative benefits of large pre-trained language models, various data sampling strategies, as well as query and context paraphrases generated by back-translation. We find a…

[334]

Can ASE framework maintain consistent accuracy scores while scaling inference budget across diverse legal doma

28 May 2026. Score: 5.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Large language models (LLMs) have shown remarkable skills across various activities, including text generation and code synthesis. Their widespread applicability, however, raises substantial concerns about security, privacy, and possibly misuse. Of recent legislative efforts, the most notable is the proposed EU AI…

[333]

How does negative sampling performance vary across different LLM architectures (7B vs 70B) when evaluated on o

28 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The complexity of multimedia applications in terms of intensity of computation and heterogeneity of treated data led the designers to embark them on multiprocessor systems on chip. The complexity of these systems on one hand and the expectations of the consumers on the other hand complicate the designers' job to…

[332]

How does the token-efficiency trade-off (accuracy per inference cost) vary between DeepSeek-R1 and o1-preview

28 May 2026. Score: 6.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Despite increasing discussions on open-source Artificial Intelligence (AI), existing research lacks a discussion on the transparency and accessibility of state-of-the-art (SoTA) Large Language Models (LLMs). The Open Source Initiative (OSI) has recently released its first formal definition of open-source software.…

[331]

What is the throughput (queries per second) trade-off for dense retrievers (e.g., Contriever) on MuSiQue 2-hop

28 May 2026. Score: 6.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous…

[330]

How does retrieval-augmented code generation latency compare to end-to-end generation on HumanEval benchmark u

28 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[329]

How does the cross-lingual performance of DeepSeek-R1 and o1-preview vary across different legal sub-domains w

28 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20435959

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[328]

What is the impact of evidence gap identification mechanisms in FAIR-RAG on downstream task performance measur

28 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The advent of Large Language Models (LLMs) has revolutionized Natural Language Processing, yet their application in high-stakes, specialized domains like religious question answering is hindered by challenges like hallucination and unfaithfulness to authoritative sources. This issue is particularly critical for the…

[327]

To what extent does fine-tuning on BEIR-NL improve downstream task performance in Dutch legal and news domains

28 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Zero-shot evaluation of information retrieval (IR) models is often performed using BEIR; a large and heterogeneous benchmark composed of multiple datasets, covering different retrieval tasks across various domains. Although BEIR has become a standard benchmark for the zero-shot setup, its exclusively English content…