SRCH:8B893679
ExpertFlow: Efficient Mixture-of-Experts Inference via Predictive Expert Caching
Abstract
Sparse Mixture-of-Experts (MoE) models can outperform dense large language models at similar computation by activating only a small set of experts per token. However, stacking many expert modules introduces substantial parameter memory, which makes MoE models difficult to deploy in memory-constrained environments such as single-GPU devices. Offloading alleviates this issue by storing inactive experts in CPU memory and loading them on demand, but existing methods remain limited: static caches disregard input-dependent routing, and methods that train separate models to predict expert usage ahead
Research Question
Does ExpertFlow's offloading and caching mechanism maintain inference throughput gains without degrading object-level hallucination metrics (e.g., POPE) across different MoE-VLM architectures when compared to static cache baselines?
Verification Level
| Paper level | L2, Source-grounded claims |
| Source-grounded claims | 9 |
| Claim record source | not publicly specified |
Descriptive public verification status only; aggregate claim counts are public, but individual claim records are not exposed here.
Quality Tier
| Tier | DOI grade |
| Basis | Review score and verified-claim count meet DOI-grade public quality thresholds. |
Descriptive public triage only; this tier does not alter current publication or DOI behavior.
Quality Dimensions
| Evidence strength | MEDIUM |
| Citation grounding | MEDIUM |
| Uncertainty disclosure | MEDIUM |
| Reproducibility status | HIGH |
Automated triage signals derived from public fields; not human peer review or independent validation.
Correction Record
| Status | CURRENT |
| Correction count | 0 |
| Manifest contract | paper-manifest-v1.1 |
| Correction contract | correction-record-v1 |
Public corrections are additive records. Current status does not claim the synthesis is error-free.
Provenance
| Publisher | Assignee Research |
| Public provenance | L4, External archival record |
| Report artifact | Available |
| External record | Registered |
| Claim lineage | 9 aggregate source-grounded claims |
| Review method | Automated multi-reviewer assessment |
| Quality guide | How to read scores, claims, manifests, and evidence links |
| Provenance contract | source-provenance-v1 |
| Note | Machine-generated synthesis of existing literature. Not primary research. |