Papers
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How does the trade-off between model size and inference efficiency vary when distilling code generation capabilities from large language models to smaller models, as measured by latency and pass@k. Abstract The…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of different GNN architectures (e.g., GCN, GAT, GraphSAGE) on the cross-domain generalization capability of GADT3 in graph anomaly detection tasks, as measured by accuracy and. In order to use…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How robust is the Mul-GAD framework to adversarial attacks on graph structures, and how does its robustness compare to other test-time training frameworks in terms of anomaly detection accuracy and. Machine…
Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: How does INT4 quantization of LLaVA-UHD affect its performance on SEED-Bench compared to FP16 precision across different visual reasoning subtasks. Abstract In the past years, multimodal large language models…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: What is the impact of quantization-aware training on the inference latency and memory requirements of LLaVA-UHD when deployed on edge devices. Large foundation models, including large language models (LLMs),…
Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: How robust is Mul-GAD's performance against adversarial attacks on graph structures compared to models like GAS and GCN-AE, as measured by anomaly detection accuracy on perturbed versions of the. Anomaly detection…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the impact of feature dimensionality reduction on GADT3's cross-domain anomaly detection performance on the ACM and DBLP graph benchmarks. Deep convolutional neural networks have performed remarkably well…
Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: To what extent does domain adaptation in CLIP-TD improve cross-domain robustness compared to standard CLIP, as measured by accuracy on ImageNet-to-Sketchy and ImageNet-to-ClipArt domain adaptation. Multi-Task…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does GADT3's homophily-guided self-supervision approach scale to billion-parameter LLMs on the Reddit and Twitter perturbed graph datasets. In the last few years, the deep learning (DL) computing paradigm has…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does GADT3's test-time training framework compare to supervised GAD baselines in detecting anomalies on the Amazon and Yelp datasets when 20% of node features are randomly masked. Cyber-attacks are becoming…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of model distillation techniques on inference efficiency in CLIP-based vision-language models, measured by throughput and accuracy trade-offs on Flickr30k and MSCOCO benchmarks. Abstract The…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the performance of CLIP-TD compare to ALIGN in low-shot settings when evaluated on VQA and COCO text-to-image retrieval benchmarks. Natural Language Processing (NLP) is one of the most captivating…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the performance of Mul-GAD scale with increasing graph size and sparsity compared to other GNN-based semi-supervised anomaly detection models like GANomaly and DeepSVM when evaluated on. With a long…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the inference efficiency (throughput, latency) of SLMs trained for CWE detection scale with model size when benchmarked on a private codebase, and how does this compare to larger models. Abstract Data…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the accuracy difference between SLMs and domain-adapted models on a multimodal benchmark (e.g., combining code and natural language descriptions) for CWE detection, and how does this vary. Building models…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does the choice of activation functions for non-negative evidence constraints affect throughput and prediction reliability trade-offs in multimodal evidential networks. Brains, it has recently been argued,…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the performance of Llama3 and GRU-based imputation methods scale with increasing sequence length and noise levels in solar irradiation forecasting, measured by MAE and RMSE metrics on. The rapid…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the performance of Multi-Objective Reinforcement Learning (MORL) for preference alignment compare to single-objective methods in terms of HumanEval-JavaScript and HumanEval-Java pass@k. Abstract The…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the dynamic hot neuron threshold adjustment in PowerInfer impact the accuracy and inference latency of LLaMA-70B on the MBPP benchmark compared to static inference methods when deployed on. Abstract The…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does PowerInfer's dynamic hot neuron threshold adjustment compare to static inference methods in terms of throughput and memory efficiency when applied to LLaMA-70B on the HumanEval code. This paper…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the relative performance improvement of PowerInfer's adaptive inference strategy over static baselines for LLaMA-70B when evaluated on the MBPP benchmark with varying input sequence lengths. We introduce…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does Q-shaping maintain robustness in multimodal environments (e.g., VLMBench) when scaling to diverse tasks, and how does it compare to reward shaping in terms of accuracy-score trade-offs. Artificial…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of incorporating LLM-generated heuristics in Q-shaping on the inference throughput of PowerInfer when benchmarked on the HumanEval code generation task with multiple programming. Abstract The…
Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: How robust is the Directional Preference Alignment framework to adversarial or edge-case inputs in code generation tasks compared to RLHF, as measured by accuracy on a curated subset of HumanEval. The remarkable…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does the scalability of the Directional Preference Alignment framework compare to RLHF when applied to larger code generation benchmarks beyond HumanEval, such as MBPP or DS-1000, in terms of. Abstract The…