Papers
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the few-shot learning accuracy of Llama 3.1 compare to Mistral 7B on time-series forecasting benchmarks when restricted to low-rank adaptation fine-tuning. 9 claims were extracted from source literature;…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: Evaluating the inference efficiency of Llama 3.1 versus Mistral 7B with RAG on anomaly detection tasks in power grid systems: What is the trade-off between latency and F1-score when processing. 7 claims were…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: Comparison of Llama 3.1 and Mistral 7B in power grid anomaly detection: How does fine-tuning on battery datasets affect their robustness (F1-score) on downstream tasks when integrated with. 0 claims were extracted…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do Codestral-7B and Codestral-70B compare in terms of false positive rates and tokens-per-second efficiency when evaluating smart contract vulnerabilities under high-concurrency inference. 9 claims were…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the correlation between model scale and false positive rates in Llama3 variants when performing vulnerability detection on OWASP benchmark tasks under adversarial perturbations. 11 claims were extracted…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: What is the correlation between batch size scaling and latency degradation for Codestral models when performing static analysis code classification. 7 claims were extracted from source literature; 7 were…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does instruction fine-tuning on domain-specific security datasets impact the inference efficiency and detection accuracy trade-off for Llama3-7B and Llama3-70B on obfuscated code samples. 12 claims were…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the inference latency of DeepSeek R1 scale relative to code structural complexity during vulnerability scanning on the Big-Vul benchmark. 0 claims were extracted from source literature; 0 were…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the robustness of Llama3-7B versus Llama3-70B to synthetic code obfuscation vary across different vulnerability classes in the SARD dataset when measured by F1-score degradation. 14 claims were extracted…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does DeepSeek R1's vulnerability detection accuracy on Big-Vul correlate with cyclomatic complexity metrics compared to Llama3 and Codestral. 0 claims were extracted from source literature; 0 were…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of variable renaming and control flow flattening on the F1 scores of Llama3 versus Codestral when evaluated on the Big-Vul dataset. 15 claims were extracted from source literature; 3 were…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the quantization of DeepSeek R1 impact its throughput and false positive rate when classifying CVEs in the Big-Vul benchmark. 16 claims were extracted from source literature; 5 were independently…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the accuracy of DeepSeek R1 in vulnerability classification vary across different programming languages when evaluated on a standardized dataset like Big-Vul. 11 claims were extracted from source…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the effect of code-specific data augmentation on the pass@1 scores of code generation models across diverse programming language datasets. 9 claims were extracted from source literature; 2 were…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does curriculum-based multi-task learning affect the cross-domain generalization accuracy of large multimodal models on the RadNet benchmark compared to standard joint training. 12 claims were extracted from…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of curriculum-based multi-task learning on the inference latency and throughput of large multimodal models evaluated on the RadNet medical image-text dataset. 13 claims were extracted from…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the impact of replacing fully connected CRF post-processing with attention-based refinement modules on Dice coefficient scores for brain tumor segmentation. 0 claims were extracted from source literature;…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do transformer-based architectures scale in terms of GPU memory efficiency versus accuracy compared to hybrid CNN-CRF models when processing multi-modal MRI volumes. 0 claims were extracted from source…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the inference latency of KANs compare to traditional MLPs when evaluated on the HellaSwag reasoning benchmark for language models. 0 claims were extracted from source literature; 0 were independently…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the robustness of KANs against adversarial attacks compare to MLPs when measured using the FGSM attack success rate on the CIFAR-10 dataset. 16 claims were extracted from source literature; 2 were…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the accuracy gap between KANs and transformers on the ImageNet-1K benchmark when trained with identical computational budgets. 9 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the inference latency of 3D CNN-CRF hybrids compare to Vision Transformer variants on high-resolution 3D medical imaging benchmarks. 5 claims were extracted from source literature; 5 were independently…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does varying batch size during adversarial training impact the F1 score of Codestral on syntax-perturbed MBPP benchmarks compared to standard training methods. 0 claims were extracted from source literature;…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does adversarial training with different batch sizes improve the cross-domain generalization of Codestral as measured by accuracy on unseen code generation benchmarks like HumanEval. 8 claims were extracted from…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the correlation between parameter-efficient fine-tuning methods and the retention of multi-language code synthesis capabilities measured by pass@1 on MultiPL-E. 10 claims were extracted from source…