Assignee Research: Index of Papers

[563]

How does the accuracy-throughput trade-off of Llama3-70B and Codestral-34B compare when deployed on heterogene

29 May 2026. Score: 4.17/10. Verification: L2, Source-grounded claims.

Abstract: This paper proposes a neural architecture search (NAS) method for split computing. Split computing is an emerging machine-learning inference technique that addresses the privacy and latency challenges of deploying deep learning in IoT systems. In split computing, neural network models are separated and cooperatively…

[562]

How does the precision-recall tradeoff in Gemini 1.5 Pro with an 8M context window compare to Llama3-70B with

29 May 2026. Score: 3.50/10. Verification: L2, Source-grounded claims.

Abstract: Considerable delays often exist between the discovery of a vulnerability and the issue of a patch. One way to mitigate this window of vulnerability is to use a configuration workaround, which prevents the vulnerable code from being executed at the cost of some lost functionality – but only if one is available. Since…

[561]

When fine-tuned on domain-specific security corpora, how do Llama3 and Code Llama 7B compare in few-shot (5-15

29 May 2026. Score: 5.57/10. Verification: L2, Source-grounded claims.

Abstract: We propose a meta learning framework for detecting anomalies in human language across diverse domains with limited labeled data. Anomalies in language ranging from spam and fake news to hate speech pose a major challenge due to their sparsity and variability. We treat anomaly detection as a few shot binary…

[560]

What is the impact of incorporating multimodal context (e.g., UML diagrams or execution traces) on the CWE cla

29 May 2026. Score: 6.50/10. Verification: L2, Source-grounded claims.

Abstract: Large Language Models (LLMs) have demonstrated significant capabilities in understanding and analyzing code for security vulnerabilities, such as Common Weakness Enumerations (CWEs). However, their reliance on cloud infrastructure and substantial computational requirements pose challenges for analyzing sensitive or…

[559]

How does the inference efficiency (throughput, latency) of SecLM-fine-tuned Llama3, Codestral, and Deepseek R1

29 May 2026. Score: 5.07/10. Verification: L2, Source-grounded claims.

Abstract: Large language models (LLMs) such as GPT-4o and Claude Sonnet 4.5 have demonstrated strong capabilities in open-ended reasoning and generative language tasks, leading to their widespread adoption across a broad range of NLP applications. However, for structured text classification problems with fixed label spaces,…

[558]

How does the anomaly detection F1-score of Deepseek R1 compare to Codestral on time-series datasets with distr

29 May 2026. Score: 7.00/10. Verification: L2, Source-grounded claims.

Abstract: The field of artificial intelligence has undergone a revolution from foundational Transformer architectures to reasoning-capable systems approaching human-level performance. We present LLMOrbit, a comprehensive circular taxonomy navigating the landscape of large language models spanning 2019-2025. This survey…

[557]

What is the impact of quantization techniques on the accuracy of LLaVA-1.5 in multimodal tasks when using Powe

29 May 2026. Score: 9.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441554

Abstract: The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Although the field has expanded and is vibrant, there hasn't been a concise framework that analyzes the various methods of LLM Inference to provide a clear understanding of this…

[556]

What is the percentage drop in zero-shot forecasting accuracy for Llama3 when evaluated on cross-domain time-s

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441548

Abstract: Rapid developments in large language models (LLMs) have created new opportunities for their use in the energy sector, from forecasting renewable energy to power system operation and energy market analysis. These models help improve decision-making, anomaly detection, and optimization procedures in intricate energy…

[555]

How does varying the hot neuron activation threshold in PowerInfer impact Pass@1 scores on the HumanEval bench

29 May 2026. Score: 8.33/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441546

Abstract: This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key principle underlying the design of PowerInfer is exploiting the high locality inherent in LLM inference, characterized by a power-law distribution…

[554]

How does the inference latency and memory footprint of lightweight BERT models compare to random forest classi

29 May 2026. Score: 7.80/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441507

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[553]

How do Llama3, Codestral, and Deepseek R1 compare in F1-scores for cross-language vulnerability classification

29 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441501

Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this…

[552]

What is the impact of dynamic hot neuron threshold adjustment in PowerInfer on the pass@1 accuracy of LLaMA-70

29 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441499

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[551]

What is the impact of using graph neural networks versus traditional machine learning classifiers on detection

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims.

Abstract: The growing importance of data security in modern information systems extends beyond preventing malicious software and includes the critical topic of data privacy. Centralized data processing in traditional machine learning methods presents significant challenges, including greater risk of data breaches…

[550]

What is the correlation between TAE token misalignment thresholds and code generation accuracy in Vicuna-13B v

29 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441475

Abstract: This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital. We define the Factuality Issue as the probability of LLMs to produce content inconsistent with established facts. We…

[549]

Do multimodal model benchmarks show different sensitivity to TAE token misalignment thresholds in Baichuan 2 c

29 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441473

Abstract: The propensity score is the probability of treatment assignment conditional on observed baseline characteristics. The propensity score allows one to design and analyze an observational (nonrandomized) study so that it mimics some of the particular characteristics of a randomized controlled trial. In particular, the…

[548]

How does the pass@k metric of Llama3-70B compare to Codestral-7B on the HumanEval benchmark when fine-tuned on

29 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441461

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…

[547]

What is the variation in execution success rates between specialized code LLMs and general-purpose LLMs when g

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441455

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data.…

[546]

To what extent does instruction complexity in BigCodeBench correlate with the performance degradation of code

29 May 2026. Score: 7.40/10. Verification: L2, Source-grounded claims.

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…

[545]

How does the inference throughput of Qwen3-MoE models compare to dense Qwen3 models when processing multilingu

29 May 2026. Score: 9.17/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441398

Abstract: The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[544]

How does the pass@1 score of Code Llama Python compare to general foundation models on BigCodeBench tasks requ

29 May 2026. Score: 8.07/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441396

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…

[543]

Does the Python specialization in Code Llama lead to improved instruction-following performance in non-Python

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441384

Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of…

[542]

How does the edit distance performance of Code Llama Python compare to Code Llama 7B when increasing shot coun

29 May 2026. Score: 8.23/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441373

Abstract: As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic…

[541]

How does the in-domain performance of MMICL on MSCOCO compare to its performance on other standard object dete

29 May 2026. Score: 8.67/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441364

Abstract: This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in multimodal tasks. Despite these promising developments, MLLMs…

[540]

How do context window expansions in Gemini 1.5 Pro affect precision and recall rates for multi-file vulnerabil

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441362

Abstract: Large Language Models (LLMs) have garnered remarkable advancements across diverse code-related tasks, known as Code LLMs, particularly in code generation that generates source code with LLM from natural language descriptions. This burgeoning field has captured significant interest from both academic researchers and…

[539]

To what extent does multimodal input (code + AST graphs) improve the vulnerability reasoning capabilities of S

29 May 2026. Score: 8.50/10. Verification: L2, Source-grounded claims. 10.5281/zenodo.20441350

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate…