Papers
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does W4A4 quantization affect the HumanEval pass@1 score of Llama-2-7B compared to INT8 quantization while maintaining real-time inference latency on NVIDIA H100 GPUs? Reducing the latency and model size has…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do CodeT5 models perform in cross-domain code completion tasks (e.g., Python to Java) compared to domain-specialized models, and what metrics (e.g., BLEU, accuracy) best capture these differences? Benchmark…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the trade-off between real-time vulnerability classification accuracy and throughput compare between CodeT5 models and other state-of-the-art code language models (e.g., CodeGen, CodeGPT)? Many ML-based…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the robustness of DeepSeek R1 to interference in LoRA-based fine-tuning compare to full fine-tuning when evaluated on the MBPP benchmark in terms of accuracy and latency? Recently, the instruction-tuning…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the scalability of CodeT5-based vulnerability detection in IDE environments when processing incremental code changes versus full-file analysis, as measured by latency per code edit and. In the rapidly…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the integration of CodeT5-based vulnerability detection into IDE environments compare to standalone processing in terms of token-level latency and GPU memory utilization when evaluated on. Texture…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the impact of imperfect orthogonality in LoRA-based fine-tuning on the inference throughput of Codestral when evaluated on the HumanEval benchmark? Pre-training Large Language Models (LLMs) on web-scale…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the choice of spreading factor (SF) in LoRa modulation affect the F1-score performance of Llama3 on QuixBugs when using QLoRA fine-tuning compared to full fine-tuning? Pre-training Large Language Models…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the trade-off between inference efficiency (latency, throughput) and F1-score performance for Llama3, Codestral, and DeepSeek R1 when deployed for vulnerability detection across multiple. This study…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the choice of fine-tuning hyperparameters impact the cross-language generalization performance of Llama3, Codestral, and DeepSeek R1 on Big-Vul, as measured by F1-score gaps between seen and. Anomaly…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the diversity-weight parameter in Vendi-RAG influence the alignment of FLAN-T5-xl outputs with human preferences on the TruthfulQA benchmark compared to BM25 retrieval? Retrieval-augmented generation…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of varying the diversity-weight parameter in Vendi-RAG on the accuracy of FLAN-T5-xl for code generation tasks in the HumanEval benchmark compared to BM25 retrieval? Retrieval-augmented…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does the choice between full-graph and mini-batch training pipelines affect the robustness of Graph Neural Networks against adversarial perturbations in control flow graphs used for security analysis? Malware…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: To what extent do system-level optimizations for mini-batch GNN training improve inference throughput and memory efficiency when deploying multimodal vulnerability detectors on resource-constrained. Graph Neural…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the memory consumption and latency of DeepSeek R1 and Codestral compare when using IceCache's KV-cache management against traditional on-GPU KV-cache in autoregressive generation tasks with. Key-Value…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does mini-batch training impact the convergence speed and final accuracy of Graph Neural Networks for code vulnerability detection compared to full-graph training on large-scale software. Full-graph and…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the trade-off between throughput and accuracy when deploying DeepSeek R1 and Codestral with IceCache's external vector database-based KV-cache on resource-constrained hardware for code. Key-Value (KV)…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the robustness of vulnerability detection models trained on code property graphs with integrated commit messages vary against adversarial code perturbations compared to models using only. Deep Neural…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the effect of using convolutional neural networks versus graph neural networks on the inference efficiency and detection accuracy when processing code property graphs augmented with commit. The increasing…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does incorporating natural language commit messages into code property graph representations impact the F1-score of vulnerability detection models on the Big-Vul dataset compared to graph-only. A commit…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does training on synthetic data from Claude 2 improve the cross-domain robustness of small language models on adversarial NLI datasets compared to ChatGPT-3.5-Turbo? Natural Language Inference…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the inference latency of large pre-trained video encoders change when fine-tuned on synthetic gesture data versus human-annotated datasets across varying batch sizes? In this work, we explore the…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: To what extent does training on synthetic video data impact the zero-shot cross-domain generalization accuracy of multimodal video-language models compared to models trained on real-world annotations? In this…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of synthetic data source quality on the inference efficiency and throughput of small language models trained for natural language inference tasks? The evolution of Generative Pre-trained…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the reasoning accuracy of small language models on NLI benchmarks change when fine-tuned on synthetic data from ChatGPT-3.5-Turbo compared to data from ChatGPT-4? Large Language Models (LLMs) have…