Papers
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the performance of Mul-GAD compare to other semi-supervised graph anomaly detection models on the Reddit and Twitter datasets in terms of precision, recall, and F1-score. Anomaly detection is defined as…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does the inference throughput of LLaVA-UHD compare to LLaVA-1.5-7B and LLaVA-1.5-13B when processing 4K images on MMBench, and how does this scalability impact latency per token in. Visual encoding constitutes…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do different LLaVA model versions compare in terms of quantization-aware training effectiveness on standard multimodal reasoning benchmarks like VQA and GQA. Recent advances in multimodal vision-language…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: To what extent does few-shot prompting enable Llama3 to match the RMSE of domain-specific transformers like Temporal Fusion Transformers on unseen renewable energy datasets. Short-term load forecasting (STLF) is…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the robustness of Llama3 to missing data points in high-frequency solar power sequences compare to GRU-based imputation methods. The energy output of a photovoltaic (PV) panel is a function of solar…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How do different model architectures handle the trade-off between accuracy and inference speed on the PiSAR benchmark when using identical hardware and power consumption limits. Evidential deep learning, built…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the pass@1 accuracy of fine-tuned LLaMA-70B on MBPP Python function synthesis compare to CodeGen and CodeLlama under identical dynamic hot neuron threshold configurations in PowerInfer. Large Language…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the inference latency of Llama3 compare to optimized LSTM architectures when performing minute-level time-series forecasting on edge devices. The deployment of transformer-based models on…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does fine-tuning on domain-specific adversarial examples improve the generalization accuracy of SLMs on out-of-distribution code samples from different programming paradigms. We introduce…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the robustness of code-trained SLMs degrade under adversarial perturbations specifically designed to evade CWE detection, and how does this compare to domain-adapted models fine-tuned on. Large Language…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How robust is the Multi-Objective Reinforcement Learning approach for preference alignment in maintaining consistent performance scores across different code generation tasks in the. This paper addresses the…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does incorporating explicit rationales in preference datasets affect the code generation accuracy of LLaMA-70B on the MBPP benchmark compared to standard comparison-based alignment. Aligning language models…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: Does preference divergence in human evaluation scores decrease when aligning multimodal models using data-centric rationales versus traditional reinforcement learning from human feedback. Aligning language models…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of rationale-augmented direct preference alignment on the inference latency and throughput of large language models during dynamic threshold adjustment. Large language models (LLMs) based on…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How do various threshold policies (dynamic vs. fixed) for LLaMA-70B under the PowerInfer framework compare in terms of memory efficiency and end-to-end latency on the HumanEval code generation. Understanding and…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the relative inference latency improvement of PowerInfer's dynamic hot neuron threshold adjustment compared to static baselines for LLaMA-70B on the same MBPP Python function synthesis. This investigation…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the inference efficiency impact of multi-objective reward optimization on PowerInfer's throughput when scaling to diverse programming languages beyond Python. Q-shaping is an extension of Q-value…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the Directional Preference Alignment framework compare to traditional RLHF in terms of code generation accuracy and preference alignment effectiveness when evaluated on the HumanEval. Fine-grained control…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the Performance-Efficiency Ratio (PER) metric correlate with actual deployment costs when comparing LLaMA-70B inference with PowerInfer's dynamic threshold adjustment versus fixed threshold. Large…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How do unsupervised federated models compare to supervised approaches in terms of detection accuracy and false positive rates when deployed on resource-constrained IoT devices with limited training. This work…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the robustness of federated malware detection models against adversarial perturbations compare to centralized models when evaluated on the N-BaIoT dataset with simulated poisoning attacks. This work…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How do different aggregation algorithms (FedAvg, FedProx, FedNova) affect the robustness of federated malware detection systems against Byzantine attacks on IoT networks when measured by model. This work…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: To what extent do unsupervised federated learning approaches maintain detection precision-recall performance when adapting to new malware families not present in the original N-BaIoT training set. This work…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the communication efficiency of federated learning-based malware detection models compare to centralized training on N-BaIoT dataset when measured by convergence speed and bandwidth. This work…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the cross-domain generalization performance of supervised federated models trained on N-BaIoT when evaluated on unseen IoT device traffic from different manufacturers using F1-score and AUC. This work…