Papers
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LongLLaVA-9B on reasoning, mathematics, coding, and language understanding tasks? 13 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LongVA-7B on reasoning, mathematics, coding, and language understanding tasks? 17 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Video-LLaVA-8B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-Guard-3-1B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Mantis-Idefics2-8B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 1 was independently verified…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Foundation-Sec-8B-Reasoning on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Foundation-Sec-8B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of gemma-2-9B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of gemma-2-2B on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 1 peer-reviewed paper addressing the following research question: What are the benchmark performance scores of Claude-Opus-4 on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Mistral-7B-Instruct-v0.3 on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Claude-3.7-Sonnet on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Codestral-22B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-5.5 on reasoning, mathematics, coding, and language understanding tasks? 15 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Phi-4 on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against retrieved…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of CodeGemma-7B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-0.72 on reasoning, mathematics, coding, and language understanding tasks? 8 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemma-7B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of DeepSeek-V2 on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 9 were independently verified against…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of DeepSeek-Coder on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 9 were independently verified against…
Abstract: This report synthesises findings from 9 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of CodeQwen1.5 on reasoning, mathematics, coding, and language understanding tasks? 9 claims were extracted from source literature; 8 were independently verified against…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of codegen-2b on reasoning, mathematics, coding, and language understanding tasks? 8 claims were extracted from source literature; 8 were independently verified against…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of StarCoder-2 on reasoning, mathematics, coding, and language understanding tasks? 9 claims were extracted from source literature; 9 were independently verified against…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-2 on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 10 were independently verified against…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of codellama-7b on reasoning, mathematics, coding, and language understanding tasks? 16 claims were extracted from source literature; 14 were independently verified against…