SRCH:47E175A8
What is the impact of test-time compute allocation on the inference efficiency and task completion accuracy of
Abstract
Abstract: Recent advances in test-time scaling of large language models (LLMs), exemplified by DeepSeek-R1 and OpenAI's o1, show that extending the chain of thought during inference can significantly improve general reasoning performance. However, the impact of this paradigm on legal reasoning remains insufficiently explored. To address this gap, we present the first systematic evaluation of 12 LLMs, including both reasoning-focused and general-purpose models, across 17 Chinese and English legal tasks spanning statutory and case-law traditions. In addition, we curate a bilingual chain-of-thought dataset
Research Question
What is the impact of test-time compute allocation on the inference efficiency and task completion accuracy of DeepSeek-R1 versus o1-preview models in multilingual legal document classification tasks
Verification Level
| Paper level | L2, Source-grounded claims | |
| Source-grounded claims | 5 | |
| Claim record source | parsed source sections |
Descriptive public verification status only; aggregate claim counts are public, but individual claim records are not exposed here.
Quality Tier
| Tier | Watchlist | |
| Basis | Review score or public verified-claim signal is below DOI-grade threshold. |
Descriptive public triage only; this tier does not alter current publication or DOI behavior.
Quality Dimensions
| Evidence strength | LOW | |
| Citation grounding | MEDIUM | |
| Uncertainty disclosure | MEDIUM | |
| Reproducibility status | MEDIUM |
Automated triage signals derived from public fields; not human peer review or independent validation.
Correction Record
| Status | CURRENT |
| Correction count | 0 |
| Manifest contract | paper-manifest-v1.1 |
| Correction contract | correction-record-v1 |
Public corrections are additive records. Current status does not claim the synthesis is error-free.
Provenance
| Publisher | Assignee Research |
| Public provenance | L3, Claim aggregate record |
| Report artifact | Available |
| External record | Not registered |
| Claim lineage | 5 aggregate source-grounded claims |
| Review method | Automated multi-reviewer assessment |
| Quality guide | How to read scores, claims, manifests, and evidence links |
| Provenance contract | source-provenance-v1 |
| Note | Machine-generated synthesis of existing literature. Not primary research. |