Papers
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does speculative decoding impact the vulnerability detection accuracy of Deepseek R1 on high cyclomatic complexity code compared to standard autoregressive decoding. 6 claims were extracted from source…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: To what extent do data augmentation strategies improve the generalization of deep learning models on small-scale datasets compared to transfer learning from large-scale pre-trained weights. 9 claims were…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the comparative memory footprint and inference latency of multi-task trained vision-language models versus single-task baselines on low-resource medical datasets. 10 claims were extracted from source…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the robustness of CNN architectures to adversarial perturbations compare when evaluated using structural similarity metrics versus standard accuracy on image classification benchmarks. 9 claims were…
Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: Can sparse attention mechanisms improve the inference efficiency of large multimodal models on augmented medical image-text pairs, as measured by throughput and memory usage on the MM-Imagenet. 8 claims were…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does curriculum-based multi-task learning affect the inference latency and accuracy of large multimodal models on sparse medical image-text pairs, as evaluated on the MedQA or R2D2 benchmarks. 8 claims were…
Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: How does synthetic data augmentation impact the few-shot learning convergence rates of multimodal vision-language models on specialized medical imaging benchmarks. 0 claims were extracted from source literature; 0…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: To what extent does fine-tuning for adversarial robustness degrade the BLEU and ROUGE scores of Llama3 and Codestral when generating documentation for vulnerable code segments. 6 claims were extracted from source…
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: To what extent does the deterministic output of the MFOUR Vibe Framework improve the robustness of Codestral against adversarial code perturbations relative to standard sampling methods. 0 claims were extracted…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How does the MFOUR Vibe Framework impact the inference latency and throughput of Llama3 compared to baseline stochastic decoding in code generation benchmarks. 0 claims were extracted from source literature; 0…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the difference in robustness scores between Llama3 and Deepseek R1 when evaluated on adversarially perturbed code generation benchmarks. 9 claims were extracted from source literature; 8 were…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do alignment techniques influence the trade-off between code generation accuracy and adversarial robustness in recent open-weight language models. 10 claims were extracted from source literature; 9 were…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of reasoning-focused training on the jailbreak resistance of code-generating LLMs when evaluated on malware prompt datasets. 5 claims were extracted from source literature; 5 were independently…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the robustness of instruction-tuned Llama3 compare to Deepseek R1 against taxonomy-specific adversarial perturbations in code security benchmarks. 5 claims were extracted from source literature; 5 were…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the pass@1 performance of Codestral compare to Llama3 on HumanEval-X for low-resource programming languages when fine-tuned with 10% of the original dataset. 8 claims were extracted from source…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does alignment tuning in Llama3 and Deepseek R1 impact code generation accuracy on the LDOT benchmark compared to untuned baselines. 5 claims were extracted from source literature; 5 were independently…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the comparative robustness of Llama3, Codestral, and Deepseek R1 in classifying vulnerabilities within the Big-Vul dataset under varying levels of code obfuscation. 7 claims were extracted from source…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: Can optimization techniques like speculative decoding mitigate the accuracy drop-off in Deepseek R1 when processing adversarial code with high cyclomatic complexity. 13 claims were extracted from source…
Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the effect of model quantization levels on the token throughput and fault classification performance of Llama3.1 and Mistral 7B when applied to battery management system datasets. 0 claims were extracted…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of using R-squared as the primary evaluation metric on the robustness of regression-based machine learning models compared to traditional metrics like SMAPE and MAE in benchmark. 4 claims were…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: Can curriculum-based multi-task learning improve the inference efficiency and alignment stability of large multimodal models trained on augmented sparse medical image-text pairs. 9 claims were extracted from…
Abstract: This report synthesises findings from 2 peer-reviewed papers addressing the following research question: How do inference efficiency and detection accuracy trade-offs differ between Llama3 and Codestral when fine-tuned for adversarial robustness in C++ and Python vulnerability scanning. 11 claims were extracted from…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of multimodal pre-training (e.g., image-text models like FLAN-PaLM) on downstream code generation tasks, as evaluated by pass@1 and execution accuracy on HumanEval and MBPP. 9 claims were…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does multi-task learning with synthetic image augmentation affect the convergence speed and memory footprint of vision-language models on low-resource medical imaging benchmarks. 4 claims were extracted from…
Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: What is the comparative robustness of Llama3 and Codestral against adversarial code perturbations in multilingual vulnerability detection tasks. 0 claims were extracted from source literature; 0 were independently…