How does the cross-modal alignment quality (measured by zero-shot accuracy on VQA and TextVQA) of Uni-MoE-2.0-

Assignee Research

SRCH:5EFC78FA

Submitted: 28 May 2026
Review score: 0.00/10
Verification: L1, Literature synthesis
Quality tier: Quarantine candidate

Abstract

We present Uni-MoE 2.0 from the Lychee family. As a fully open-source omnimodal large model (OLM), it substantially advances Lychee's Uni-MoE series in language-centric multimodal understanding, reasoning, and generation. Starting from a dense LLM, we build Uni-MoE-2.0-Omni from scratch through three core contributions: a dynamic-capacity Mixture-of-Experts (MoE) design, a progressive training strategy enhanced with an iterative reinforcement strategy, and a carefully curated multimodal data matching technique. It is capable of omnimodal understanding, as well as generating images, text, and speech.

Research Question

How does the cross-modal alignment quality (measured by zero-shot accuracy on VQA and TextVQA) of Uni-MoE-2.0-Omni degrade when the input modality composition shifts from single-image to multi-image and video frames, relative to a dense multimodal LLM baseline of equivalent total parameter count?
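
One way to make "degrade ... relative to a dense baseline" concrete is to compare relative accuracy drops between the two models. The sketch below is an illustration only, not a method taken from the synthesis or from the Uni-MoE 2.0 paper: the names `Accuracy`, `relative_drop`, and `degradation_gap` are hypothetical, and it assumes the drop is measured as a fraction of each model's own single-image accuracy on a given benchmark (VQA or TextVQA).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accuracy:
    """Zero-shot accuracy (fraction in [0, 1]) per input composition.

    Hypothetical container; one instance per model and benchmark.
    """
    single_image: float
    multi_image: float
    video_frames: float


def relative_drop(reference: float, shifted: float) -> float:
    """Fraction of the reference (single-image) accuracy lost after the shift."""
    if reference <= 0:
        raise ValueError("reference accuracy must be positive")
    return (reference - shifted) / reference


def degradation_gap(model: Accuracy, baseline: Accuracy) -> dict[str, float]:
    """Model's relative drop minus the baseline's, per shifted setting.

    A positive value means the model degrades more than the baseline
    under that shift; a negative value means it degrades less.
    """
    gaps = {}
    for setting in ("multi_image", "video_frames"):
        model_drop = relative_drop(model.single_image, getattr(model, setting))
        base_drop = relative_drop(baseline.single_image, getattr(baseline, setting))
        gaps[setting] = model_drop - base_drop
    return gaps


if __name__ == "__main__":
    # Placeholder values for illustration only; these are NOT measured results.
    moe = Accuracy(single_image=0.8, multi_image=0.6, video_frames=0.4)
    dense = Accuracy(single_image=0.8, multi_image=0.7, video_frames=0.6)
    print(degradation_gap(moe, dense))
```

Normalising by each model's own single-image accuracy keeps the comparison about sensitivity to the modality shift rather than about which model starts from a higher absolute score; an absolute difference in percentage points would be an equally defensible choice.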

Verification Level

Paper level	L1, Literature synthesis
Source-grounded claims	0
Claim record source	not publicly specified

Descriptive public verification status only; aggregate claim counts are public, but individual claim records are not exposed here.

Quality Tier

Tier	Quarantine candidate
Basis	Review score is below 5.0; source-level inspection is required before relying on the synthesis.

Descriptive public triage only; this tier does not alter current publication or DOI behavior.

Quality Dimensions

Uncertainty disclosure	MEDIUM
Reproducibility status	MEDIUM

Automated triage signals derived from public fields; not human peer review or independent validation.

Correction Record

Status	CURRENT
Correction count	0
Manifest contract	paper-manifest-v1.1
Correction contract	correction-record-v1

Public corrections are additive records. Current status does not claim the synthesis is error-free.

Provenance

Publisher	Assignee Research
Public provenance	L2, Public artifact record
Report artifact	Available
External record	Not registered
Claim lineage	0 aggregate source-grounded claims
Review method	Automated multi-reviewer assessment
Provenance contract	source-provenance-v1
Note	Machine-generated synthesis of existing literature. Not primary research.