Papers
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-3 on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Claude-Sonnet-3.5 on reasoning, mathematics, coding, and language understanding tasks? 9 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Grok-Vision-Beta on reasoning, mathematics, coding, and language understanding tasks? 7 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of VideoLLaMA-1.7B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Phi-3 on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against retrieved…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-3.1-8B on reasoning, mathematics, coding, and language understanding tasks? 16 claims were extracted from source literature; 2 were independently verified against…
Abstract: This report synthesises findings from 19 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Qwen2.5 on reasoning, mathematics, coding, and language understanding tasks? 13 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemma-2 on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 4 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-1.5-Pro on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-4o on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 3 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-4T on reasoning, mathematics, coding, and language understanding tasks? 15 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of DeepSeek-V3 on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-OSS-120B on reasoning, mathematics, coding, and language understanding tasks? 13 claims were extracted from source literature; 2 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-2.5-Pro on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-2.0 on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of WizardCoder on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-3.5 on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Qwen2 on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against retrieved…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-3.1-70B on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LongVU-7B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Claude-3.5-Haiku on reasoning, mathematics, coding, and language understanding tasks? 9 claims were extracted from source literature; 5 were independently verified…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of InternVL3-8B on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-1.5-Flash on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-2.5-Flash on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Video-XL-7B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…