Papers
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-2-120M on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-2-340M on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 1 was independently verified against…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LLaVA-7B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-7B on reasoning, mathematics, coding, and language understanding tasks? 13 claims were extracted from source literature; 2 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of TinyLlama-1.1B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 2 were independently verified against…
Abstract: This report synthesises findings from 13 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of CodeRM-8B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of PaliGemma-3B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of MedGemma on reasoning, mathematics, coding, and language understanding tasks? 10 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of R1-7B on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 1 was independently verified against retrieved…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of R1-1.5B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 6 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of R1-14B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Claude-3.5-Sonnet on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 2 were independently verified…
Abstract: This report synthesises findings from 14 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of R1-32B on reasoning, mathematics, coding, and language understanding tasks? 13 claims were extracted from source literature; 2 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Decoder-Only-1B-KD on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of ReflexiCoder-8B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 12 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of GPT-5.1 on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of LeDex-RL-13B on reasoning, mathematics, coding, and language understanding tasks? 14 claims were extracted from source literature; 2 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of AdaptToken-8B on reasoning, mathematics, coding, and language understanding tasks? 15 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of AdaptToken-Lite-8B on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of AdaptToken-Lite-7B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of InternVL3.5-8B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of Llama-4-17B-16E on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 15 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of InternVL3.5-38B on reasoning, mathematics, coding, and language understanding tasks? 0 claims were extracted from source literature; 0 were independently verified against…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of InternVL3.5-30B-A3B on reasoning, mathematics, coding, and language understanding tasks? 12 claims were extracted from source literature; 6 were independently verified…
Abstract: This report synthesises findings from 16 peer-reviewed papers addressing the following research question: What are the benchmark performance scores of InternVL3.5-241B-A28B on reasoning, mathematics, coding, and language understanding tasks? 11 claims were extracted from source literature; 0 were independently verified…