Papers
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of StarCoderBase-7B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of StarCoderBase-3B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Prompt-Guard-86M on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of T5-11B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemma-2-7B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 7 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of s1-32B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-1.5 on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini3-Pro-Preview on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 10 were independently verified…
Abstract: This report synthesises findings from 11 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-5.2-Thinking on reasoning, mathematics, coding, and language understanding tasks? 5 claims were extracted from source literature; 5 were independently verified…
Abstract: This report synthesises findings from 10 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GLM-4-9B on reasoning, mathematics, coding, and language understanding tasks? 13 claims were extracted from source literature; 4 were independently verified against…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Mixtral on reasoning, mathematics, coding, and language understanding tasks? 9 claims were extracted from source literature; 9 were independently verified against…
Abstract: This report synthesises findings from 8 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Grok-4.1 on reasoning, mathematics, coding, and language understanding tasks? 7 claims were extracted from source literature; 6 were independently verified against…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GLM-4-32B on reasoning, mathematics, coding, and language understanding tasks? 13 claims were extracted from source literature; 3 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-4 on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Gemini-3.1-Pro on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-70B on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of InternLM2.5-7B on reasoning, mathematics, coding, and language understanding tasks? 18 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Qwen-32B on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LlamaGen on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of BaseRL-3B on reasoning, mathematics, coding, and language understanding tasks? 14 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of DistilGPT2 on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 2 were independently verified against…
Abstract: This report synthesises findings from 4 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of CodeGen-2.7B on reasoning, mathematics, coding, and language understanding tasks? 18 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of QWen on reasoning, mathematics, coding, and language understanding tasks? 14 claims were extracted from source literature; 2 were independently verified against retrieved…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of SwS-3B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of SmolLM-3B on reasoning, mathematics, coding, and language understanding tasks? 16 claims were extracted from source literature; 2 were independently verified against…