Assignee Research: Index of Papers

[361]

How does varying LoRA rank in cross-attention layers affect LPIPS and FVD on UHD video benchmarks when fine-tu

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436986

Abstract: We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on…

[360]

How does COCO-DR's continuous contrastive pretraining on target corpora affect zero-shot accuracy on BEIR benc

29 May 2026. Score: 3.17/10. Verification: L2, Source-grounded claims.

Abstract: In recent years, foundation models have become very popular due to their exceptional performance, mainly in natural language (NLP) tasks where they were first introduced. These models usually consist of hundreds of millions, or even billions, of parameters, making them resource-intensive during training and in…

[359]

What is the accuracy-efficiency trade-off of LogiPart's embedding-aware local LLM assignment versus GraphMETRO

29 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims.

Abstract: Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on…

[358]

How does LogiPart's hypothesis-first hierarchical partitioning compare to full-corpus LLM conditioning on per-

29 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436908

Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high…

[357]

Do node-based BNNs with latent node variables maintain higher accuracy on CIFAR-10-C than weight-based BNNs wh

29 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436894

Abstract: Neural Architecture Search (NAS) has shown great success in automating the design of neural networks, but the prohibitive amount of computations behind current NAS methods requires further investigations in improving the sample efficiency and the network evaluation cost to get better results in a shorter time. In…

[356]

Does GraphMETRO's mixture-of-experts design improve cross-domain robustness on node classification tasks under

29 May 2026. Score: 5.00/10. Verification: L1, Literature synthesis.

Abstract: Multi-view graph refining-based clustering (MGRC) methods aim to facilitate the clustering of data via Graph Neural Networks (GNNs) by learning optimal graphs that reflect the underlying topology of the data. However, current MGRC approaches are limited by their disjoint two-stage process, where the graph structure…

[355]

How does expert caching efficiency and token scheduling in MoE diffusion models scale with increasing batch si

29 May 2026. Score: 5.00/10. Verification: L1, Literature synthesis.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[354]

What is the impact of Lynx scheduling on expert load balancing and downstream QA accuracy for Mixtral 8x7B whe

29 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims.

Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in…

[353]

What is the effect of expert diversity in GraphMETRO on out-of-distribution generalization accuracy on GQA spl

29 May 2026. Score: 5.00/10. Verification: L2, Source-grounded claims.

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data.…

[352]

What is the impact of adversarial training on zero-shot learning model accuracy when using different modality

29 May 2026. Score: 6.67/10. Verification: L2, Source-grounded claims.

Abstract: The present discussion examines the transformative impact of Artificial Intelligence (AI) in educational settings, focusing on the necessity for AI literacy, prompt engineering proficiency, and enhanced critical thinking skills. The introduction of AI into education marks a significant departure from…

[351]

How does the accuracy of few-shot adapted medical VLMs correlate with the number of adaptation examples provid

29 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims.

Abstract: Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets. Recent work aims at building multilingual versions of such models, and a range of multilingual multimodal datasets have been introduced for this purpose. However, current PVLMs typically perform poorly…

[350]

How does ExpertFlow's predictive expert caching mechanism affect inference latency and memory usage compared t

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims.

Abstract: In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma…

[349]

What is the inference throughput and memory cost trade-off for MoE-LLaVA under adversarial textual perturbatio

29 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436797

Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the…

[348]

How does the predictive expert caching latency and token scheduling overhead affect end-to-end tokens-per-seco

29 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436792

Abstract: We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-Language Models that significantly improves upon its predecessor, DeepSeek-VL, through two key major upgrades. For the vision component, we incorporate a dynamic tiling vision encoding strategy designed for processing high-resolution…

[347]

What is the relationship between routing signature diversity and code generation accuracy on HumanEval benchma

29 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436788

Abstract: We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in…

[346]

How does expert specialization guided by soft modality signals influence task-specific performance gaps in VLM

29 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436758

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[345]

What is the accuracy trade-off on the MMMU benchmark for MoE-LLaVA versus dense LLaVA models when expert cachi

29 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436750

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family…

[344]

What is the impact of varying expert count and routing granularity in SMoES-based MoE-VLMs on throughput and V

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[343]

Does the accuracy gap on long-context multimodal benchmarks (e.g., Video-MME, Needle-in-a-Haystack) between Mo

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis.

Abstract: We study the continual pretraining recipe for scaling language models' context lengths to 128K, with a focus on data engineering. We hypothesize that long context modeling, in particular the ability to utilize information at arbitrary input locations, is a capability that is mostly already acquired through…

[342]

How do different gradient-based sampling methods affect the performance tradeoffs in code generation benchmark

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436651

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[341]

How does the inference latency of the proposed DDRNet23-slim variant compare to baseline segmentation models o

28 May 2026. Score: 7.00/10. Verification: L1, Literature synthesis.

Abstract: Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, however, the recent overwhelming…

[340]

What is the impact of attention mechanism sparsity levels on segmentation accuracy for [drivable surface, obst

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436621

Abstract: Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of state-of-the-art is improved further. This paper discusses unsolved problems…

[339]

Can domain-generalized semantic segmentation models maintain >90% mIoU performance when deployed on edge devic

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20436602

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data.…

[338]

What is the impact of negative sampling versus domain-specific fine-tuning on exact match and F1 scores across

28 May 2026. Score: 7.00/10. Verification: L2, Source-grounded claims.

Abstract: Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead…

[337]

How does negative sampling performance compare to domain-specific fine-tuning when evaluated on out-of-domain

28 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis.

Abstract: Detecting toxic content using language models is crucial yet challenging. While substantial progress has been made in English, toxicity detection in French remains underdeveloped, primarily due to the lack of culturally relevant, human-annotated, large-scale datasets. In this work, we release ToxiFrench, a dataset of…