Papers
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does dynamic reward scaling perform relative to human-crafted unit tests in terms of code correctness and inference latency when evaluated on the HumanEval and SQuTR benchmarks using a fixed. Current large…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of using dynamically scaled unit tests on the inference efficiency (e.g., FLOPs, latency) of LLMs during code generation, and how does it correlate with solution correctness in. Current large…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the multi-stage validation framework with progressively complex unit tests compare to human-written test suites (e.g., HumanEval, MBXP) in terms of reward signal accuracy and training. Current large…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the impact of varying the unit test complexity (e.g., simple vs. multi-step assertions) on the trade-off between inference efficiency and solution accuracy in dynamic reward scaling. Current large…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the scalability of DPO compare to RLHF in terms of training throughput when applied to large multimodal reasoning benchmarks like MMBench or SEED-Bench? This paper studies the alignment process of…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of varying KL-constraint hyperparameters on the alignment performance of RLHF versus DPO when evaluated on corrupted image-text pairs from multimodal benchmarks like LLaVA-Bench. Aligning…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What is the impact of scaling the difficulty-based preference dataset size (e.g., 1K to 100K samples) on model performance on the GSM8K or MATH benchmarks, and how does this compare to scaling data. Aligning…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: Does integrating explicit rationales into preference data improve the consistency of alignment across different question difficulty tiers in the GSM8K mathematical reasoning benchmark compared to. Aligning…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: Does incorporating difficulty-based preference data selection with DPO lead to better alignment on code generation tasks (HumanEval, MBPP) compared to standard DPO, as measured by pass@1 score and. We introduce…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of rationale-augmented preference optimization on the robustness of few-shot learners against syntactic adversarial attacks across different model scales? State-of-the-art few-shot learning…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does sliding window attention in code generation models affect token-level accuracy on long-context programming benchmarks compared to full attention? Transformers are quickly becoming one of the most heavily…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the scaling of node feature perturbations versus edge structure modifications affect the inference efficiency and error rates of graph neural networks in network intrusion detection tasks? Cybersecurity…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the computational efficiency of bypassing obfuscated gradients in GNN-based NIDS models scale with increasing network size, and what is the trade-off between robustness and inference time on. Machine…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do obfuscated gradients in GNN-based NIDS models compare to other gradient masking techniques in terms of their robustness against structural adversarial attacks on the KDD Cup 99 dataset, as. The integration…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the comparative degradation in classification accuracy and robustness scores of GNN-based NIDS models under universal adversarial perturbations when trained on CIC-IDS2017 and evaluated on. Intrusion…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the cross-domain transferability of attack techniques that circumvent obfuscated gradients in GNN-based NIDS models when applied to other graph-based tasks, such as node classification in. Intrusion…
Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the impact of graph size and heterogeneity on the classification accuracy and convergence speed of multimodal versus unimodal GNNs, as measured on benchmarks such as the Open Graph Benchmark. It is a long…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How do multimodal graph neural networks compare to unimodal GNNs in terms of inference latency and memory efficiency when evaluated on large-scale heterogeneous graph benchmarks like PDNS-Net under. In this paper…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does multi-view aggregation in graph anomaly detection frameworks affect the F1 score and inference latency when scaling from single-edge devices to distributed edge computing environments? Machine learning…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of quantization techniques on inference latency and F1 score for GNN-based anomaly detection models deployed on resource-constrained devices, and how does it compare to model. Unlike previous…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: Does incorporating reverse-KL regularization during RLHF training reduce performance degradation on multimodal reasoning tasks when evaluated on adversarially perturbed VQA datasets? Recently, ChatGPT, along with…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the throughput impact of integrating memory replay mechanisms into GNN-based anomaly detection systems, measured by inference latency and F1 score trade-offs on the UNSW-NB15 dataset? Given the…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How do multimodal models (e.g., CLIP or LXMERT) perform in continual learning scenarios compared to unimodal models, as measured by accuracy retention on sequential datasets like Visual Genome or. Multimodal…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does reverse-KL regularization in RLHF impact the robustness of vision-language models against adversarial perturbations on the VQA-Adv benchmark? In the last few years, the deep learning (DL) computing…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: To what extent does the reverse-KL regularized contextual bandit approach improve OCR accuracy under noisy input conditions compared to standard KL penalties on OCR-VQA? Concept drift primarily refers to an online…