Assignee Research: Index of Papers

[513]

To what extent does fine-tuning on multimodal instruction data reduce the performance gap between in-domain an

29 May 2026. Score: 9.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441140

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[512]

How does the zero-shot cross-domain retrieval accuracy of MMICL on TextCaps compare to its in-domain performan

29 May 2026. Score: 7.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images. Deep learning techniques have emerged as a powerful strategy for learning feature representations directly from data and…

[511]

How does the PowerInfer approach scale in terms of memory efficiency and throughput when applied to multi-moda

29 May 2026. Score: 6.30/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Although the field has expanded and is vibrant, there hasn't been a concise framework that analyzes the various methods of LLM Inference to provide a clear understanding of this…

[510]

What is the comparative performance gap in F1-scores between Llama3, Codestral, and Deepseek R1 for multi-lang

29 May 2026. Score: 8.07/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441126

Abstract: This systematic literature review comprehensively examines the application of Large Language Models (LLMs) in forecasting and anomaly detection, highlighting the current state of research, inherent challenges, and prospective future directions. LLMs have demonstrated significant potential in parsing and analyzing…

[509]

How does the inference latency per token and throughput of PowerInfer compare to other state-of-the-art sparse

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441111

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[508]

What is the impact of varying the hot neuron selection threshold in PowerInfer on the trade-off between accura

29 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441105

Abstract: This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key principle underlying the design of PowerInfer is exploiting the high locality inherent in LLM inference, characterized by a power-law distribution…

[507]

Can the TAE method be extended to multimodal LLMs (e.g., Qwen-VL) without significant accuracy degradation, an

29 May 2026. Score: 6.60/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mohammad Mahdi Abootorabi, Amirhosein Zobeiri, Mahdi Dehghani, Mohammadali Mohammadkhani, Bardia Mohammadi, Omid Ghahroodi, Mahdieh Soleymani Baghshah, Ehsaneddin Asgari. Findings of the Association for Computational Linguistics: ACL 2025. 2025.

[506]

To what extent does targeted code preprocessing improve the generalization capability of Deepseek R1 for vulne

29 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441095

Abstract: The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in a variety of application domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can…

[505]

How does cross-domain fine-tuning on security-specific code corpora affect the F1-score of Llama3 and Codestra

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441091

Abstract: Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently shown to transfer well to Programming Languages (PL) and largely benefit a broad set of code-related tasks. Despite their success, most current methods either rely on an encoder-only (or decoder-only) pre-training that is suboptimal…

[504]

What is the impact of varying token misalignment thresholds in TAE on downstream task performance (e.g., MMLU,

29 May 2026. Score: 8.73/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441089

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data…

[503]

How does domain-specific code preprocessing affect the vulnerability detection accuracy of Llama3, Codestral,

29 May 2026. Score: 4.73/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic…

[502]

How does the Python specialization (Code Llama - Python) compare to the general Code Llama foundation models i

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441069

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[501]

How does the inference efficiency of Qwen2.5 and Gemini 1.5 Pro scale with batch size (e.g., 1, 4, 16) on the

29 May 2026. Score: 6.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: The high computational and memory requirements of large language model (LLM) inference make it feasible only with multiple high-end accelerators. Motivated by the emerging demand for latency-insensitive tasks with batched processing, this paper initiates the study of high-throughput LLM inference using limited…

[500]

How does the instruction-following capability (Code Llama - Instruct) of different model sizes (7B vs 34B vs 7

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441034

Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of…

[499]

To what extent do code-specific transformations improve the consistency scores of fine-tuned Llama3, Codestral

29 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441008

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family…

[498]

What is the comparative impact of syntax-aware text preprocessing on the false positive rates of Llama3, Codes

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441000

Abstract: The rapid development of large language models (LLMs) has opened new avenues across various fields, including cybersecurity, which faces an evolving threat landscape and demand for innovative technologies. Despite initial explorations into the application of LLMs in cybersecurity, there is a lack of a…

[497]

How does MMICL's zero-shot image-text retrieval accuracy on MSCOCO and Flickr30K compare to Flamingo, PaLI, an

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20440984

Abstract: Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context learning, most VLMs still struggle with understanding complex…

[496]

To what extent does MMICL improve cross-domain generalization in zero-shot retrieval tasks when evaluated on o

29 May 2026. Score: 7.63/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20440982

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[495]

What is the inference latency and throughput trade-off for LLaMA models of varying sizes (7B to 70B) when eval

29 May 2026. Score: 6.50/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key principle underlying the design of PowerInfer is exploiting the high locality inherent in LLM inference, characterized by a power-law distribution…

[494]

Does cross-domain fine-tuning (e.g., pre-training on general code vs. security-focused code) combined with tar

29 May 2026. Score: 7.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large Language Models (LLMs) have significantly impacted numerous domains, including Software Engineering (SE). Many recent publications have explored LLMs applied to various SE tasks. Nevertheless, a comprehensive understanding of the application, effects, and possible limitations of LLMs on SE is still in its early…

[493]

How does the Baichuan 2 model's performance in low-resource inference settings compare to Meta AI's LLaMA-3 on

29 May 2026. Score: 7.30/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Intervening the internal activations of large language models (LLMs) provides an effective inference-time alignment approach to mitigate undesirable behaviors, such as generating erroneous or harmful content, thereby ensuring safe and reliable applications of LLMs. However, previous methods neglect the misalignment…

[492]

What is the impact of model size scaling (7B vs 34B vs 70B) on Code Llama's performance in code infilling task

29 May 2026. Score: 7.17/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of…

[491]

How does the inclusion of domain-specific text preprocessing (e.g., code-specific transformations) impact the

29 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20440910

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[490]

How does the performance of Flamingo compare to PaLI and BLIVA in zero-shot cross-modal retrieval tasks, parti

29 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20440902

Abstract: Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context learning, most VLMs still struggle with understanding complex…

[489]

What is the quantitative trade-off between DPO alignment and tokens-per-second inference speed on code generat

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20440893

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…