What is the trade-off between pre-training computational cost (measured in FLOPs) and downstream fairness (mea
Abstract
Abstract: Foundation models for tabular data, such as the Tabular Prior-data Fitted Network (TabPFN), are pre-trained on a massive number of synthetic datasets generated by structural causal models (SCM). They leverage in-context learning to offer high predictive accuracy in real-world tasks. However, the fairness properties of these foundational models, which incorporate ideas from causal reasoning during pre-training, remain underexplored. In this work, we conduct a comprehensive empirical evaluation of TabPFN and its fine-tuned variants, assessing predictive performance, fairness, and robustness acro
Research Question
What is the trade-off between pre-training computational cost (measured in FLOPs) and downstream fairness (measured via demographic parity) when scaling the number of SCM features in TabPFN?
Verification Level
| Paper level | L2, Source-grounded claims | |
| Source-grounded claims | 17 | |
| Claim record source | parsed source sections |
Descriptive public verification status only; aggregate claim counts are public, but individual claim records are not exposed here.
Truth-Engine Gate Verdict
| Status | Unverified | |
| Gate | Gate 2 — Verification (formal proof or sandbox reproduction) | |
| Reason | Published before the Gate 2 verification pipeline was activated (2026-06-10). No formal proof or sandbox reproduction has been attempted for this record. | |
| Evaluated | 2026-06-10T06:30:49+00:00 |
This record has not completed Gate 2 of the verification pipeline (a type-checked Lean4 proof for mathematical claims, or a sealed-sandbox reproduction for empirical claims). It is a literature synthesis only. VERIFIED requires an attached reproducible artifact (Lean4 proof source, or repro script and results) before this status can be set; it is not derived from review score or claim count.
Quality Tier
| Tier | Quarantine candidate | |
| Basis | Review score is below 5.0; source-level inspection is required before relying on the synthesis. |
Descriptive public triage only; this tier does not alter current publication or DOI behavior.
Quality Dimensions
| Evidence strength | LOW | |
| Citation grounding | MEDIUM | |
| Uncertainty disclosure | MEDIUM | |
| Reproducibility status | MEDIUM |
Automated triage signals derived from public fields; not human peer review or independent validation.
Correction Record
| Status | CURRENT |
| Correction count | 0 |
| Manifest contract | paper-manifest-v1.1 |
| Correction contract | correction-record-v1 |
Public corrections are additive records. Current status does not claim the synthesis is error-free.
Provenance
| Publisher | Assignee Research |
| Public provenance | L3, Claim aggregate record |
| Report artifact | Available |
| External record | Not registered |
| Claim lineage | 17 aggregate source-grounded claims |
| Review method | Automated multi-reviewer assessment |
| Quality guide | How to read scores, claims, manifests, and evidence links |
| Provenance contract | source-provenance-v1 |
| Note | Machine-generated synthesis of existing literature. Not primary research. |