Papers
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the inference efficiency (throughput, latency) of SLMs trained for CWE detection scale with model size when benchmarked on a private codebase, and how does this compare to larger models. Abstract Data…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the accuracy difference between SLMs and domain-adapted models on a multimodal benchmark (e.g., combining code and natural language descriptions) for CWE detection, and how does this vary. Building models…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the choice of activation functions for non-negative evidence constraints affect throughput and prediction reliability trade-offs in multimodal evidential networks. Brains, it has recently been argued,…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the performance of Llama3 and GRU-based imputation methods scale with increasing sequence length and noise levels in solar irradiation forecasting, measured by MAE and RMSE metrics on. The rapid…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the performance of Multi-Objective Reinforcement Learning (MORL) for preference alignment compare to single-objective methods in terms of HumanEval-JavaScript and HumanEval-Java pass@k. The…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the dynamic hot neuron threshold adjustment in PowerInfer impact the accuracy and inference latency of LLaMA-70B on the MBPP benchmark compared to static inference methods when deployed on. The…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does PowerInfer's dynamic hot neuron threshold adjustment compare to static inference methods in terms of throughput and memory efficiency when applied to LLaMA-70B on the HumanEval code. This paper…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the relative performance improvement of PowerInfer's adaptive inference strategy over static baselines for LLaMA-70B when evaluated on the MBPP benchmark with varying input sequence lengths. We introduce…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does Q-shaping maintain robustness in multimodal environments (e.g., VLMBench) when scaling to diverse tasks, and how does it compare to reward shaping in terms of accuracy-score trade-offs. Artificial…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of incorporating LLM-generated heuristics in Q-shaping on the inference throughput of PowerInfer when benchmarked on the HumanEval code generation task with multiple programming. The…
Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: How robust is the Directional Preference Alignment framework to adversarial or edge-case inputs in code generation tasks compared to RLHF, as measured by accuracy on a curated subset of HumanEval. The remarkable…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the scalability of the Directional Preference Alignment framework compare to RLHF when applied to larger code generation benchmarks beyond HumanEval, such as MBPP or DS-1000, in terms of. The…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the Directional Preference Alignment framework perform in terms of inference efficiency and latency compared to traditional RLHF when generating code across multiple programming languages on. The…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of dynamic threshold adjustment (PowerInfer) on the PER metric for small language models (0.5-7B) compared to static thresholds in code generation and mathematical reasoning tasks. This article…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the PER metric correlate with memory-constrained deployment costs when comparing LLaMA-70B with smaller models (e.g., CodeGen-16B) on multi-task code generation benchmarks like HumanEval and. This…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the Performance-Efficiency Ratio (PER) compare across different model architectures (e.g., LLaMA vs. GPT vs. BLOOM) when evaluated on code generation tasks with varying input lengths. Multilayer neural…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of varying the number of federated clients on the inference efficiency (throughput and latency) of the proposed malware detection model, as measured on edge devices using the. In this paper, we…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the federated malware detection model's robustness against adversarial poisoning attacks compare to other federated learning approaches (e.g., FedAvg, FedProx) when evaluated on the N-BaIoT. In this…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of varying client participation rates and data heterogeneity on the effectiveness of Byzantine attack mitigation strategies in federated learning-based malware detection frameworks. To deal with…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of differential privacy techniques on the trade-off between malware detection accuracy and bandwidth utilization in federated learning models trained on the N-BaIoT dataset. In this article, we…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of varying the number of federated learning rounds on the model performance (accuracy, F1-score) and communication efficiency (throughput, bandwidth usage) when training on N-BaIoT. In this…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the model accuracy of federated learning-based malware detection compare to centralized training on the N-BaIoT dataset when evaluated using precision, recall, and F1-score metrics. This work…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do different client sampling strategies (e.g., random, stratified, adaptive) affect the trade-off between communication efficiency and model accuracy in federated malware detection systems with. Personalized…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of heterogeneous client data distributions on the generalization performance of federated deep neural networks for malware classification, and how can model personalization. In federated…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the F1-score of federated malware detection models trained on N-BaIoT transfer to unseen IoT network traffic datasets (e.g., BoT-IoT, CIC-IoT-2021) under varying differential privacy noise. This work…