Impact of Intermediate-Task Training Language on Zero-Shot Cross-Lingual Transfer in XTREME-R
Abstract
Multilingual Pretrained Language Models (MPLMs) perform strongly in cross-lingual transfer. We propose Prompts Augmented by Retrieval Crosslingually (PARC) to improve zero-shot performance on low-resource languages (LRLs) by augmenting the context with prompts consisting of semantically similar sentences retrieved from a high-resource language (HRL). PARC improves zero-shot performance on three downstream tasks (sentiment classification, topic categorization, natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in unlabeled (+5.1%) and la
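The retrieval step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy three-dimensional vectors stand in for sentence embeddings from some multilingual encoder, the English pool sentences are invented, and the `Label:` prompt template and the function names `cosine`, `retrieve`, and `build_prompt` are hypothetical rather than taken from PARC.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query_vec, pool, k=1):
    """Return the k pool entries most similar to the query vector.

    pool is a list of (sentence, vector, label) tuples; label may be None
    when the high-resource pool is unlabeled.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return ranked[:k]

def build_prompt(query_text, retrieved):
    """Prepend retrieved sentences to the query (hypothetical template)."""
    lines = []
    for sentence, _vec, label in retrieved:
        lines.append(sentence if label is None else f"{sentence} Label: {label}")
    lines.append(f"{query_text} Label:")
    return "\n".join(lines)

# Toy high-resource-language pool; the vectors are hand-written placeholders.
hrl_pool = [
    ("The film was wonderful.", [0.9, 0.1, 0.0], "positive"),
    ("The service was terrible.", [0.1, 0.9, 0.0], "negative"),
]

# Placeholder embedding of a low-resource-language input sentence.
query_vec = [0.8, 0.2, 0.1]

top = retrieve(query_vec, hrl_pool, k=1)
print(build_prompt("<LRL input sentence>", top))
```

In practice the vectors would come from a multilingual sentence encoder, so that a low-resource query and its high-resource neighbours share one embedding space; the sketch only shows the ranking and prompt-assembly logic.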
Research Question
How does the choice of intermediate-task training language (other than English) in the XTREME-R benchmark affect zero-shot cross-lingual transfer performance on low-resource languages compared to direct fine-tuning?
Verification Level
| Paper level | L2, Source-grounded claims |
| Source-grounded claims | 5 |
| Claim record source | not publicly specified |
Descriptive public verification status only; aggregate claim counts are public, but individual claim records are not exposed here.
Truth-Engine Gate Verdict
| Status | Verified |
| Gate | Gate 2: Verification (formal proof or sandbox reproduction) |
| Reason | Sealed-sandbox formula repro: Computed 3.7 matches expected 3.7 (tolerance=5.0%). |
| Evaluated | 2026-06-20T07:22:44.096775+00:00 |
This record has passed Gate 2: either a Lean4 proof source type-checks, or a sealed-sandbox run reproduced the reported results within the stated tolerance. A reproducible artifact (Lean4 proof source, or repro script and results) is attached to this record and must be present before the VERIFIED status can be set. The status is not derived from review score or claim count.
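The kind of tolerance check reported in the Reason row can be sketched as follows. This is a minimal illustration, not the gate's actual implementation: the record does not say how the 5.0% tolerance is applied, so the sketch assumes it is relative to the expected value, and the function name `within_tolerance` is hypothetical.

```python
def within_tolerance(computed: float, expected: float, tolerance: float = 0.05) -> bool:
    """Check that computed lies within a relative tolerance of expected.

    Assumption: the tolerance is measured relative to the expected value.
    """
    if expected == 0:
        # A relative error is undefined at zero, so require an exact match.
        return computed == 0
    return abs(computed - expected) / abs(expected) <= tolerance

# Values taken from the Reason row: computed 3.7, expected 3.7, tolerance 5.0%.
print(within_tolerance(3.7, 3.7, tolerance=0.05))
```

Under this assumption a computed value of 3.8 would also pass (relative error of about 2.7%), while 4.0 would fail (about 8.1%).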
Quality Tier
| Tier | Flagship candidate |
| Basis | Review score, verified-claim count, and public artifact coverage meet flagship-candidate thresholds. |
Descriptive public triage only; this tier does not alter current publication or DOI behavior.
Quality Dimensions
| Evidence strength | MEDIUM |
| Citation grounding | MEDIUM |
| Uncertainty disclosure | MEDIUM |
| Reproducibility status | HIGH |
Automated triage signals derived from public fields; not human peer review or independent validation.
Correction Record
| Status | CURRENT |
| Correction count | 0 |
| Manifest contract | paper-manifest-v1.1 |
| Correction contract | correction-record-v1 |
Public corrections are additive records. Current status does not claim the synthesis is error-free.
Provenance
| Publisher | Assignee Research |
| Public provenance | L4, External archival record |
| Report artifact | Available |
| External record | Registered |
| Claim lineage | 5 aggregate source-grounded claims |
| Review method | Automated multi-reviewer assessment |
| Quality guide | How to read scores, claims, manifests, and evidence links |
| Provenance contract | source-provenance-v1 |
| Note | Machine-generated synthesis of existing literature. Not primary research. |