Assignee Research: Index of Papers

[284]

How do vision-language models like CLIP and MedSAM compare in anomaly localization accuracy on out-of-distribu

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20435299

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data.…

[283]

How does the inference latency and throughput of AnyExperts' on-demand routing strategy compare to fixed routi

28 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Modern datacenters increasingly rely on low-power, single-slot inference accelerators to balance performance, energy efficiency, and rack density constraints. The NVIDIA T4 GPU has become widely deployed due to strong performance per watt and mature software support. Its successor, the NVIDIA L4 GPU, introduces…

[282]

How does the throughput (tokens per second) of MixLoRA-based MoE fine-tuning scale with batch size and sequenc

28 May 2026. Score: 7.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[281]

What is the accuracy trade-off of expert bridging in MixLoRA versus standard LoRA on code generation tasks (Hu

28 May 2026. Score: 5.00/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[280]

What is the computational overhead (inference latency and throughput) of applying RMD versus Mahalanobis dista

28 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20435243

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[279]

How does the inference latency and memory footprint of MixLoRA compare to full fine-tuning on the MMLU benchma

28 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20435241

Abstract: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which…

[278]

To what extent does ReSSFormer's sparse attention pattern improve long-context generalization on NLVR2 over de

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models…

[277]

Can RMD's performance gains on near-OOD detection for vision models be replicated for multimodal models (e.g.,

28 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20435222

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional…

[276]

How does the relative Mahalanobis distance (RMD) method compare to ODIN and other post-hoc OOD detectors on LL

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Out-of-distribution (OOD) detection is a rapidly growing field due to new robustness and security requirements driven by an increased number of AI-based systems. Existing OOD textual detectors often rely on an anomaly score (e.g., Mahalanobis distance) computed on the embedding output of the last layer of the…

[275]

Can ReSSFormer's architectural innovations (R2MU + adaptive sparsity) be transferred to a code generation benc

28 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20435208

Abstract: We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In…

[274]

How does the R2MU's bounded-depth recurrent reasoning compare to chain-of-thought prompting in terms of accura

28 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20435197

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[273]

What is the throughput improvement of Qwen3's thinking-mode routing over dense baselines on NLVR2 when control

28 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models…

[272]

How does Qwen3's dynamic expert allocation affect inference efficiency on multi-hop reasoning compared to a fi

28 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[271]

How does the latency-accuracy trade-off of Qwen3's dynamic expert specialization compare to fixed routing in M

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) networks promise favorable accuracy-compute trade-offs, yet practical vision deployments are hindered by expert collapse and limited end-to-end efficiency gains. We study when sparse top-k routing with hard capacity constraints helps in vision classification, evaluated under multi-seed…

[270]

What is the inference throughput and memory efficiency trade-off of SMoES relative to dense baselines and hard

28 May 2026. Score: 1.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation…

[269]

Does the layer-dependent expert specialization learned by SMoES generalize to out-of-distribution multimodal r

28 May 2026. Score: 1.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) Multimodal large language models (MLLMs) excel at vision-language tasks, but they suffer from high computational inefficiency. To reduce inference overhead, expert skipping methods have been proposed to deactivate redundant experts based on the current input tokens. However, we find that…

[268]

How does the routing accuracy and task performance of SMoES compare against top-k and token-choice routing str

28 May 2026. Score: 2.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Mixture-of-Experts (MoE) networks promise favorable accuracy-compute trade-offs, yet practical vision deployments are hindered by expert collapse and limited end-to-end efficiency gains. We study when sparse top-k routing with hard capacity constraints helps in vision classification, evaluated under multi-seed…

[267]

For video diffusion transformers fine-tuned on high-resolution images, how does CAT's token allocation strateg

28 May 2026. Score: 6.13/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We present a practical pipeline for fine-tuning open-source video diffusion transformers to synthesize cinematic scenes for television and film production from small datasets. The proposed two-stage process decouples visual style learning from motion generation. In the first stage, Low-Rank Adaptation (LoRA) modules…

[266]

What is the impact of domain shift in image complexity (e.g., from synthetic to natural scenes) on the accurac

28 May 2026. Score: 6.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: In Transformer architectures, tokens, discrete units derived from raw data, are formed by segmenting inputs into fixed-length chunks. Each token is then mapped to an embedding, enabling parallel attention computations while preserving the input's essential information. Due to the quadratic…

[265]

How does the inference throughput and token count scaling of adaptive token pruning (CAT-like) methods compare

28 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: In this paper, we introduce PruneVid, a visual token pruning method designed to enhance the efficiency of multi-modal video understanding. Large Language Models (LLMs) have shown promising performance in video tasks due to their extended capabilities in comprehending visual modalities. However, the substantial…

[264]

How does content-adaptive tokenization affect the accuracy-efficiency trade-off on the MMMU and MathVista data

28 May 2026. Score: 2.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage,…

[263]

Does content-adaptive tokenization improve robustness and accuracy on the MMBench and MME benchmarks under var

28 May 2026. Score: 1.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop…

[262]

Does applying cross-modal token pruning from vision to audio Transformers degrade robustness to noise perturba

28 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Vision Transformers (ViTs) have achieved state-of-the-art performance across various computer vision tasks, but their high computational cost remains a challenge. Token pruning has been proposed to reduce this cost by selectively removing less important tokens. While effective in vision tasks by discarding non-object…

[261]

How does content-adaptive tokenization compare to fixed-patch tokenization in terms of inference throughput (t

28 May 2026. Score: 3.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of…

[260]

To what extent does GraphMETRO's alignment mechanism improve robustness to distribution shift in multimodal mo

28 May 2026. Score: 2.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the generalization ability of dense retrieval by combating the distribution shifts between source training tasks and target scenarios. To mitigate the impact of document differences, COCO-DR continues pretraining the language model on the…