How does the latency-accuracy trade-off of ExpertFlow's predictive caching compare to dense baselines and othe

Assignee Research

SRCH:E87108A5

Submitted: 28 May 2026
Review score: 5.33/10
Verification: L1, Literature synthesis
Quality tier: Watchlist

Abstract

Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments such as single-GPU devices. Offloading alleviates this issue by storing inactive experts in CPU memory and loading them on demand, but existing methods remain limited: static caches disregard input-dependent routing, and methods that train separate models to predict expert usage ahead
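
The on-demand offloading idea described in the abstract can be illustrated with a minimal sketch. This is a generic least-recently-used expert cache, not ExpertFlow's method: the class name `ExpertCache`, the `load_fn` callback (a stand-in for a CPU-to-GPU weight copy), and the capacity value are all hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy LRU cache of expert weights held in fast memory (hypothetical sketch)."""

    def __init__(self, capacity, load_fn):
        self.capacity = capacity        # max number of experts kept resident
        self.load_fn = load_fn          # assumed stand-in for a CPU-to-GPU copy
        self.resident = OrderedDict()   # expert_id -> weights, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, expert_id):
        """Return the expert's weights, loading them on demand on a miss."""
        if expert_id in self.resident:
            self.hits += 1
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        self.misses += 1
        weights = self.load_fn(expert_id)         # the slow path: fetch from CPU
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)     # evict least recently used
        return weights


# Route a short, made-up sequence of expert ids through a 2-slot cache.
cache = ExpertCache(capacity=2, load_fn=lambda i: f"weights-{i}")
for expert_id in [0, 1, 0, 2, 1]:
    cache.get(expert_id)
print(cache.hits, cache.misses)
```

Each miss stands for a transfer that would stall inference, which is why the fraction of misses is a rough proxy for the latency cost of offloading. A fixed eviction rule like this one reacts only to past accesses; it does not look at the input to anticipate which experts the router will select next.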

Research Question

How does the latency-accuracy trade-off of ExpertFlow's predictive caching compare to dense baselines and other MoE routing strategies (e.g., Top-2, Hash Layers) on multi-object hallucination benchmarks (e.g., POPE, M-HalDetect) under throughput-constrained inference settings?

Verification Level

Paper level	L1, Literature synthesis
Source-grounded claims	0
Claim record source	not publicly specified

Descriptive public verification status only; aggregate claim counts are public, but individual claim records are not exposed here.

Quality Tier

Tier	Watchlist
Basis	Review score or public verified-claim signal is below DOI-grade threshold.

Descriptive public triage only; this tier does not alter current publication or DOI behavior.

Quality Dimensions

Evidence strength	LOW
Uncertainty disclosure	MEDIUM
Reproducibility status	MEDIUM

Automated triage signals derived from public fields; not human peer review or independent validation.

Correction Record

Status	CURRENT
Correction count	0
Manifest contract	paper-manifest-v1.1
Correction contract	correction-record-v1

Public corrections are additive records. Current status does not claim the synthesis is error-free.

Provenance

Publisher	Assignee Research
Public provenance	L2, Public artifact record
Report artifact	Available
External record	Not registered
Claim lineage	0 aggregate source-grounded claims
Review method	Automated multi-reviewer assessment
Quality guide	How to read scores, claims, manifests, and evidence links
Provenance contract	source-provenance-v1
Note	Machine-generated synthesis of existing literature. Not primary research.