Papers
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How do defense-free federated learning frameworks compare to Byzantine-robust aggregators like Krum or Median in terms of test accuracy on CIFAR-10 under 20% and 40% label-flipping poisoning rates. Federated…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What is the effect of model scaling on the inference efficiency and certification bounds of provably secure federated learning defenses against label-flipping attacks. Due to its distributed nature, federated…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: How does heterogeneity in edge device compute capabilities affect the robustness and false positive rates of federated deep learning intrusion detection systems using non-IID data distributions. Federated Learning…
Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: How do quantization-aware aggregation strategies in federated learning impact the inference latency and accuracy of transformer-based intrusion detection models on resource-constrained edge devices. This work…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do multimodal soft prompt attacks (e.g., combining text and image embeddings) affect the robustness of alignment in open-source multimodal models like LLaVA, compared to text-only attacks. Although multimodal…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does dynamic knowledge messenger capacity scaling in federated learning impact model convergence speed and inference efficiency in distributed code generation tasks, as measured by. Medical AI faces challenges…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: How do different federated learning aggregation strategies (e.g., FedAvg, FedProx, SCAFFOLD) perform in terms of robustness to non-IID data distributions and model alignment when integrated with. Over-the-air…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How robust are current LMRMs to adversarial perturbations in wireless signal-sensing alignment tasks, as quantified by accuracy degradation metrics under controlled adversarial conditions. Pre-trained…
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: What is the effect of varying the number of participating IoT clients in a federated learning setup on the convergence rate and communication efficiency when using adaptive aggregation rules for. This work…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does domain-specific fine-tuning on legal corpora affect the zero-shot performance of Baichuan-2 on the LegalBench benchmark compared to models fine-tuned on general domains. Realizing the recent advances in…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What is the impact of Gated Sparse Attention on perplexity and generation quality (measured by ROUGE-L and BLEU scores) in long-context code generation tasks when compared to dense attention and. The computational…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: How does incorporating dynamic facial affect representations impact the accuracy and robustness of multimodal code generation models when evaluated against benchmarks with evolving user preferences. Automated…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does MQuant's post-training quantization compare to other inference optimization techniques (e.g., pruning, distillation) in terms of throughput and accuracy on the LLaVA benchmark. We consider the problem of…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the impact of full static quantization on the reasoning capabilities of multimodal large language models, as measured by accuracy on the LaVIS benchmark suite. Multimodal large language models (MLLMs)…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the incorporation of dynamic problem difficulty adaptation in self-invoking code generation benchmarks (e.g., HumanEval Pro) affect model robustness when compared to static difficulty. We introduce…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the comparative efficiency gain in inference time and accuracy when using branch-aware preference alignment versus uniform alignment in self-invoking code generation tasks on MBPP Pro, as. We introduce…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How do multimodal LLMs perform on self-invoking code generation tasks (HumanEval Pro) compared to text-only models when evaluated on problems requiring multi-step reasoning, measured by exact match. We introduce…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: Does interleaving RoI features with language embeddings improve cross-domain generalization performance on unseen visual grounding datasets like RefCOCOg compared to global image feature fusion. In the era of…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the correlation between bit-width reduction in vision-language models and the degradation of grounding performance on RefCOCO+ versus the gain in inference throughput. Benchmark accuracy is often…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does activation-aware quantization affect zero-shot visual grounding accuracy on RefCOCO+ compared to standard post-training quantization for multimodal large language models. Quantization is one of the most…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How does multi-objective reward alignment compare to scalar-reward RLHF in terms of inference latency and throughput when evaluated on the DS-1000 code generation benchmark. Q-shaping is an extension of Q-value…
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: How does Reward-Guided Speculative Decoding (RSD) compare to standard speculative decoding in terms of inference throughput and output quality on the DS-1000 code generation benchmark. Speculative decoding (SD)…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What is the impact of incorporating non-verbal feedback signals (e.g., facial expression likelihood) alongside verbal follow-up likelihood as reward signals on the pass@k metrics for multimodal code. In natural…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How robust are Code Llama variants with expanded context windows to syntax perturbations in cross-library API generation, as measured by pass@1 on BigCodeBench. Statistical language modeling and translation with…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What is the impact of mixed precision inference (FP16/INT8) on tokens-per-second throughput for aligned code generation models compared to unaligned baselines on the HumanEval and MBPP benchmarks. We introduce…