Papers
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the efficiency tradeoff between 7B and 13B InternVL models in terms of inference latency and memory usage when deployed on edge devices with quantized weights. Quantized neural networks are well known for…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the sample efficiency and convergence speed of reinforcement learning-based VLN models trained on RxR compare to those trained on R2R when scaling instruction complexity and language. This report…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the correlation between prompt length and task completion accuracy for Embodied-R1 compared to smaller VLA models in embodied navigation tasks. This paper develops LongNav-R1, an end-to-end multi-turn…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does the task completion accuracy of small-scale 3B multimodal policies scale relative to 7B and 13B models when faced with increasing instruction complexity in embodied navigation environments. This paper…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What is the effect of Constitutional AI training on the pass@1 scores of sparse MoE models when evaluated against adversarial prompts in code synthesis tasks. Mixture-of-Experts (MoE) networks promise favorable…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: Do 13B VLA models with multimodal pretraining demonstrate better zero-shot reasoning capabilities on the MM-ReAct benchmark compared to 7B models when evaluated using Exact Match accuracy. Web-crawled pretraining…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: How does RLHF alignment impact the adversarial robustness of sparse MoE models on code generation benchmarks like HumanEval Pro compared to dense architectures. We introduce self-invoking code generation, a new…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: To what extent does adversarial robustness training (e.g., R-LPIPS) improve the cross-domain generalization of Wan2.1 I2V-14B when fine-tuned with LoRA on unseen video synthesis benchmarks beyond. We present a…
Abstract: This report synthesises findings from 7 peer-reviewed papers addressing the following research question: How does the LoRA rank selection (e.g., 4, 8, 16) in Wan2.1 I2V-14B affect its inference efficiency (latency, throughput) on human video synthesis tasks while maintaining comparable FVD and LPIPS. Deep…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: How does the choice of LoRA rank in Wan2.1 I2V-14B influence its ability to preserve temporal consistency (measured via FVD) versus perceptual quality (measured via R-LPIPS) in long-form human video. We present a…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: How does the scaling of LoRA rank in multimodal diffusion transformers affect memory footprint and generation speed relative to full parameter fine-tuning on downstream video tasks. Large models represent a…
Abstract: This report synthesises findings from 3 peer-reviewed papers addressing the following research question: What is the trade-off between inference latency and video quality metrics (e.g., FVD, CLIP score) when applying low-rank adaptation to the Wan2.1 14B model for edge deployment. The identification of genetically…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What is the comparative performance of joint latent space compression versus specialized video latent models on text-to-video generation accuracy measured by CLIP score and motion consistency metrics. …
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: How does the integration of W.A.L.T's causal encoder design with Flamingo's visual tokenizer impact inference latency and downstream video captioning performance on ActivityNet when compared to. Video description…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What is the quantitative trade-off between NDCG@10 recommendation accuracy and RLHF alignment scores when jointly modeling short-term and long-term user preferences using instruction-tuned LLMs. The…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: To what extent does scaling the number of Indonesian video-text training samples in MSVD-Indonesian affect the zero-shot cross-lingual transfer performance of Flamingo on non-Indonesian video. Multimodal learning…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What metrics (e.g., BLEU, CIDEr, METEOR) demonstrate the robustness of Indonesian video-text models like MSVD-Indonesian when fine-tuned with PaLI versus Flamingo on MSRVTT, and how does this compare. While…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the degradation in out-of-distribution robustness for video encoders pretrained on synthetic datasets when evaluated on diverse human motion benchmarks. Deep convolutional neural networks have performed…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What is the inference efficiency latency trade-off between Spatio-Temporal Graph Convolutional Networks and modern graph diffusion models for real-time traffic forecasting. Long-term traffic prediction is highly…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What is the trade-off between inference latency and code generation accuracy when applying 4-bit quantization to transformer-based models on the MBPP dataset. The rapid evolution of large language models…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: How do graph diffusion models scale in parameter count and prediction accuracy compared to STGCN when applied to large-scale multimodal traffic datasets. Timely accurate traffic forecast is crucial for urban…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: To what extent does Mul-GAD's semi-supervised approach improve anomaly detection accuracy over fully unsupervised GNN models like OCSVM-GNN on cross-domain datasets such as Amazon and DBLP, using. Machine learning…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: How does the inference latency of GADT3 compare to traditional GCN-based models under varying degrees of adversarial graph structure perturbations, measured using the OGB-LSC traffic prediction. Cyberattacks…
Abstract: This report synthesises findings from 6 peer-reviewed papers addressing the following research question: How does the adversarial robustness of graph diffusion models compare to STGCN under targeted node feature perturbations measured by AUC-ROC on traffic datasets. Traffic forecasting plays a critical role in…
Abstract: This report synthesises findings from 5 peer-reviewed papers addressing the following research question: To what extent can multimodal knowledge distillation from code-text pairs improve the robustness of small language models in code generation tasks, as measured by pass@k and latency metrics on. Recently, ChatGPT,…