Assignee Research: Index of Papers

[538]

What is the inference latency trade-off between Llama3-70B, Codestral-34B, and Deepseek R1-7B when deployed in

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441341

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[537]

How does cross-domain security taxonomy alignment affect the vulnerability detection performance of Llama3, Co

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441339

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…

[536]

How does the zero-shot cross-domain retrieval performance of MMICL compare to specialized multimodal models on

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441333

Abstract: Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive…

[535]

Does applying the PowerInfer optimization strategy to LLaVA result in measurable degradation in VQA accuracy s

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441325

Abstract: In the past years, multimodal large language models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering and visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in…

[534]

What is the difference in token generation throughput (tokens/sec) between PowerInfer and standard vLLM infere

29 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441311

Abstract: This article surveys Cognitive Edge Computing as a practical and methodical pathway for deploying reasoning-capable Large Language Models (LLMs) and autonomous AI agents on resource-constrained devices at the network edge. We present a unified, cognition-preserving framework spanning: (1) model optimization…

[533]

How does the memory footprint of PowerInfer compare to dense inference methods when running LLaVA-1.5 on NVIDI

29 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441308

Abstract: Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention from society. As a result, many individuals have become interested in related resources and are seeking to uncover the background and secrets behind its impressive performance. In fact, ChatGPT and other Generative AI (GAI)…

[532]

How does dynamic adjustment of the hot neuron threshold in PowerInfer affect Pass@1 scores on the HumanEval be

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key principle underlying the design of PowerInfer is exploiting the high locality inherent in LLM inference, characterized by a power-law distribution…

[531]

To what extent does zero-shot reasoning accuracy degrade in Llama3, Codestral, and Deepseek R1 when forecastin

29 May 2026. Score: 5.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The field of artificial intelligence has undergone a revolution from foundational Transformer architectures to reasoning-capable systems approaching human-level performance. We present LLMOrbit, a comprehensive circular taxonomy navigating the landscape of large language models spanning 2019-2025. This survey…

[530]

What is the difference in F1-scores between Llama3, Codestral, and Deepseek R1 for multi-language vulnerabilit

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441299

Abstract: Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this gap, we introduce CodeMMLU, a comprehensive multiple-choice benchmark designed to evaluate the depth of software and…

[529]

What is the correlation between varying hot neuron selection thresholds in PowerInfer and token generation thr

29 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441293

Abstract: Small language models (SLMs), despite their widespread adoption in modern smart devices, have received significantly less academic attention compared to their large language model (LLM) counterparts, which are predominantly deployed in data centers and cloud environments. While researchers continue to improve the…

[528]

How does integrating TAE into Qwen-VL impact retrieval latency and Recall@K scores on the COCO-Captioning subs

29 May 2026. Score: 7.60/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441277

Abstract: Android applications are developing rapidly across the mobile ecosystem, but Android malware is also emerging in an endless stream. Many researchers have studied the problem of Android malware detection and have put forward theories and methods from different perspectives. Existing research suggests that machine…

[527]

How does the F1-score of Llama3-70B compare to Codestral-7B on the CodeT5 benchmark for code understanding tas

29 May 2026. Score: 7.63/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441271

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…

[526]

What is the impact of cross-domain fine-tuning on security-specific code corpora (e.g., CWE datasets) on the i

29 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441265

Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention.…

[525]

How does incorporating identifier-aware tokenization (e.g., CodeT5's approach) affect the zero-shot performanc

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441253

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[524]

To what extent does fine-tuning Deepseek R1 on JavaScript-specific vulnerability patterns improve its detectio

29 May 2026. Score: 7.30/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large language models (LLMs) have demonstrated significant potential in various tasks, including those requiring human-level intelligence, such as vulnerability detection. However, recent efforts to use LLMs for vulnerability detection remain preliminary, as they lack a deep understanding of whether a subject LLM's…

[523]

What is the correlation between TAE token misalignment thresholds and inference latency throughput on Baichuan

29 May 2026. Score: 7.40/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: The prevalence of depression may be affected by changes in psychiatric practices and the availability of online mental health information in the past two decades. This study aimed to evaluate the aggregate prevalence of depression in communities from different countries between 1994 and 2014 and to explore the…

[522]

Does the optimal token misalignment threshold for maximizing alignment safety differ between Baichuan 2 and Vi

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441244

Abstract: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive…

[521]

How does the performance gap between Code Llama Python and the general foundation model vary across specific D

29 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have shown that LLMs can solve tasks using programs like human developers, the…

[520]

To what extent does the inference throughput of Gemini 1.5 Flash degrade when processing million-token context

29 May 2026. Score: 7.30/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter…

[519]

To what extent does specializing Code Llama for Python impact its zero-shot functional correctness on non-Pyth

29 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441210

Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of…

[518]

What is the impact of increasing shot count from one to five on the edit distance scores of Code Llama Python

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441201

Abstract: Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image…

[517]

How does the vulnerability classification accuracy of code-specific fine-tuned Llama3 compare to Gemini 1.5 Fl

29 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441189

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…

[516]

How does the performance of MMICL's zero-shot image-text retrieval compare to Flamingo, PaLI, and BLIVA on the

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441157

Abstract: Since the resurgence of deep learning, vision-language models (VLMs) enhanced by large language models (LLMs) have grown exponentially in popularity. However, while LLMs can utilize extensive background knowledge and task information with in-context learning, most VLMs still struggle with understanding complex…

[515]

How does the alignment of Llama3, Codestral, and Deepseek R1 with security-specific fine-tuning (e.g., SecLM)

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441155

Abstract: As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic…

[514]

How does the performance of syntax-aware text preprocessing vary across Llama3, Codestral, and Deepseek R1 whe

29 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20441149

Abstract: Mobile-edge computing (MEC) is an emerging paradigm to meet the ever-increasing computation demands from mobile applications. By offloading the computationally intensive workloads to the MEC server, the quality of computation experience, e.g., the execution latency, could be greatly improved. Nevertheless, as the…