SRCH:8CE672DD

How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context per

Submitted: 28 May 2026
Review score: 8.17/10
Verification: L2, Source-grounded claims
Quality tier: DOI grade
Verified claims: 5
DOI: 10.5281/zenodo.20428926

Abstract

Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of exte

Research Question

How does the accuracy of multi-hop RAG reasoning on HotPotQA and MuSiQue degrade under adversarial context perturbations when using dense retrievers (e.g., DPR) versus sparse retrievers (e.g., BM25), measured by F1 and EM scores?
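The research question measures degradation in F1 and EM. As a minimal sketch, the block below shows one common way to compute these two scores for a single predicted answer, following the SQuAD-style answer normalization that is widely used for extractive QA. The synthesis does not state which scoring script it assumes, so the normalization rules, the function names (`normalize_answer`, `exact_match`, `token_f1`, `degradation`), and the example strings in the usage note are illustrative assumptions, not details taken from the record.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the record does not specify one.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """EM: 1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def degradation(clean_score: float, perturbed_score: float) -> float:
    """Absolute drop in a score when moving from clean to perturbed context."""
    return clean_score - perturbed_score
```

Under these assumptions, `exact_match("The Eiffel Tower", "eiffel tower!")` counts as a match because articles, case, and punctuation are normalized away, while a longer prediction such as `"Eiffel Tower in Paris"` against the gold answer `"Eiffel Tower"` fails EM but still receives partial F1 credit. Averaging each metric over a dataset for a given retriever, once with clean context and once with perturbed context, and passing the two averages to `degradation` gives the kind of per-retriever drop the question asks about.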

Verification Level

Paper level: L2, Source-grounded claims
Source-grounded claims: 5
Claim record source: not publicly specified

Descriptive public verification status only; aggregate claim counts are public, but individual claim records are not exposed here.

Quality Tier

Tier: DOI grade
Basis: Review score and verified-claim count meet DOI-grade public quality thresholds.

Descriptive public triage only; this tier does not alter current publication or DOI behavior.

Quality Dimensions

Evidence strength: MEDIUM
Citation grounding: MEDIUM
Uncertainty disclosure: MEDIUM
Reproducibility status: HIGH

Automated triage signals derived from public fields; not human peer review or independent validation.

Correction Record

Status: CURRENT
Correction count: 0
Manifest contract: paper-manifest-v1.1
Correction contract: correction-record-v1

Public corrections are additive records. Current status does not claim the synthesis is error-free.

Provenance

Publisher: Assignee Research
Public provenance: L4, External archival record
Report artifact: Available
External record: Registered
Claim lineage: 5 aggregate source-grounded claims
Review method: Automated multi-reviewer assessment
Quality guide: How to read scores, claims, manifests, and evidence links
Provenance contract: source-provenance-v1
Note: Machine-generated synthesis of existing literature. Not primary research.