Papers
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: Do self-supervised contrastive learning approaches for graph anomaly detection maintain higher AUC-ROC than supervised methods when trained on graphs with significant heterophily and missing features? Anomaly…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do spectral-based graph anomaly detection methods compare to spatial GNN baselines in robustness when 20% of node features are masked on heterophilic graphs? Anomaly detection is defined as discovering…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do supervised GNN models compare to traditional methods in terms of robustness to adversarial attacks on graph-structured data in standardized graph anomaly detection (GAD) benchmarks? Anomaly detection is defined as discovering…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does knowledge distillation impact the zero-shot image-text retrieval accuracy of CLIP variants on the Flickr30k and MSCOCO datasets? We present Distill CLIP (DCLIP), a fine-tuned variant of the CLIP model that…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the impact of varying graph densities on the detection accuracy of both supervised GNN models and traditional methods in standardized GAD benchmarks? Detecting anomalies in data is a vital task, with…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: To what extent do tabular foundation models maintain prediction accuracy compared to gradient boosting methods when evaluated on few-shot learning benchmarks with limited labeled rows? This study compared the…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the effect of counterfactual training on the inference efficiency and throughput of transformer-based VQA architectures under adversarial perturbations? Videos often capture objects, their visible…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the inference throughput of tabular foundation models compare to tree ensemble baselines on large-scale synthetic datasets with varying sparsity levels? Sentiment analysis of product reviews on…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of synonym-based text augmentation on the calibration error of zero-shot multimodal models under out-of-distribution shifts? Since the establishment of vision-language foundation models as the…
Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: How does counterfactual text augmentation impact the adversarial robustness accuracy of multimodal VQA models on the VQA-CP benchmark? In the task of Visual Question Answering (VQA), most state-of-the-art models…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does CatBoost's performance on large-scale regression tasks compare to XGBoost and LightGBM in terms of accuracy and training time when evaluated on standard benchmark datasets such as BigMart Sales? Decision…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does CatBoost's inference efficiency scale with dataset size compared to gradient boosting frameworks like TensorFlow Decision Forests and PyTorch Geometric when benchmarked on GPU accelerators? In this paper…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: How do zero-shot Visual Language Models like Flamingo compare to fine-tuned code-specific multimodal models in terms of accuracy on unseen CWE categories in benchmarks like CWESec and SARD? Data scarcity…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of multi-objective optimization on code generation accuracy in HumanEval-JavaScript relative to standard PPO when training with diverse reward signals? The evolution of Large Language Models…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does multi-objective reinforcement learning affect pass@k scores on HumanEval-Java compared to single-objective PPO under varying user preference distributions? In the last 5 years there have been a large…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the comparative robustness of standard RLHF versus learned Q-shaping in maintaining pass@1 accuracy for LLMs when evaluating out-of-distribution Python code generation tasks from HumanEval? As Large…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the sensitivity of self-invoking code generation accuracy to variations in problem complexity when using multimodal models trained via supervised fine-tuning versus reinforcement learning? Deep learning…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: Does the reverse-KL regularized contextual bandit formulation improve robustness against reward hacking in multimodal alignment tasks compared to existing offline preference learning methods? Direct Preference…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the pass@k metric for Directional Preference Alignment compare to RLHF on the HumanEval benchmark when scaling model parameters from 13B to 175B? We introduce ChatGLM, an evolving family of large…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the inference throughput of Iterative Preference Learning under KL-constraints compare to standard DPO when generating code solutions on the HumanEval benchmark? The rapid evolution of large…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the inference efficiency trade-off between Directional Preference Alignment and RLHF for code generation tasks on the MBPP benchmark at 70B parameters? The rapid evolution of large language…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: Does Directional Preference Alignment reduce the variance in alignment scores compared to traditional reward modeling when evaluated on multimodal reasoning tasks involving code and natural language? The…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the impact of replacing explicit reward models with Directional Preference Alignment on the pass@k accuracy of code generation models across low-resource programming languages? Large Language Models…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the correlation between directional preference alignment training and code generation robustness against syntactic variations in multi-language HumanEval benchmarks? In recent years, deep learning (DL), a…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: Does directional preference alignment improve cross-lingual code generation consistency metrics between Java and JavaScript subsets in large language models? Pre-trained models for Natural Languages (NL) like…