Quality Guide

How to read review scores, claim counts, artifact manifests, and benchmark evidence pages.

Public surface changes are tracked in the changelog.

Purpose

Assignee Research publishes machine-assisted literature syntheses and benchmark evidence. Quality signals are designed for triage and auditability. They should help readers decide what to inspect next, not replace the cited source papers.

Review Score

The review score is an automated assessment on a 0-10 scale. It reflects source grounding, internal consistency, coverage, uncertainty disclosure, and suitability for publication. It is not human peer review and is not an endorsement by domain experts.

Quality Dimensions

Quality assessment is moving toward multiple public dimensions, listed below. The dimension schema is documented at Quality dimensions v1. Dimensions are audit aids, not independent validation.

D-1	Evidence strength. How directly and sufficiently cited sources support the main claims.
D-2	Citation grounding. Whether claims are tied to inspectable source material and citations.
D-3	Numerical consistency. Whether numbers, units, ranges, and comparisons are internally consistent.
D-4	Benchmark validity. Whether benchmark references preserve task, dataset, metric, and protocol context.
D-5	Novelty. Whether the work adds a distinct synthesis, comparison, or finding beyond restating sources.
D-6	Uncertainty disclosure. Whether limitations, ambiguity, and non-validated claims are explicitly marked.
D-7	Reproducibility status. Whether public artifacts and cited sources support independent inspection.
D-8	Safety/impact. Whether potential misuse, deployment, or societal impact is identified when relevant.

Quality Tiers

Quality tiers map public aggregate signals into compact inspection bands. The tier schema is documented at Quality tiers v1. Tiers are descriptive triage labels; they do not replace source-level inspection or independent validation.

T-1	FLAGSHIP_CANDIDATE. Strong public candidate for deeper editorial attention and flagship reporting.
T-2	DOI_GRADE. Meets the current public quality band used for DOI-grade literature syntheses.
T-3	PUBLIC_RECORD. Publicly listed literature synthesis with sufficient metadata for inspection.
T-4	WATCHLIST. Public record that should be inspected with extra caution before relying on the synthesis.
T-5	QUARANTINE_CANDIDATE. Low-scoring public record that should not be used without source-level inspection.

Review Monitor

The review monitor records public re-review triggers for each manifest. The monitor schema is documented at Review monitor v1. Monitor state is descriptive public triage; it does not alter current publication or DOI behavior.

R-1	NEW_CONTRADICTORY_BENCHMARK_SCORE. A newly observed benchmark score creates or increases a material cross-source discrepancy.
R-2	CITATION_SOURCE_UNAVAILABLE. A cited public source becomes unavailable or no longer supports inspection.
R-3	EXTRACTION_SIGNAL_CHANGED. Claim extraction, benchmark extraction, or grounding signals materially change.
R-4	CORRECTED_MANIFEST. A public manifest or artifact record is corrected after publication.
R-5	LOW_QUALITY_TIER. The public quality tier marks the record as WATCHLIST or QUARANTINE_CANDIDATE.
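
Trigger R-5 depends only on the quality tier, so it can be sketched as a membership test. This is a minimal sketch, not the monitor's actual implementation: the names `QualityTier`, `LOW_QUALITY_TIERS`, and `low_quality_tier_trigger` are hypothetical, and only the tier labels and the R-5 rule are taken from the lists above.

```python
from enum import Enum


class QualityTier(Enum):
    """Hypothetical encoding of the tier labels T-1 to T-5."""
    FLAGSHIP_CANDIDATE = "T-1"
    DOI_GRADE = "T-2"
    PUBLIC_RECORD = "T-3"
    WATCHLIST = "T-4"
    QUARANTINE_CANDIDATE = "T-5"


# Tiers that R-5 (LOW_QUALITY_TIER) names explicitly.
LOW_QUALITY_TIERS = frozenset(
    {QualityTier.WATCHLIST, QualityTier.QUARANTINE_CANDIDATE}
)


def low_quality_tier_trigger(tier: QualityTier) -> bool:
    """Return True when re-review trigger R-5 applies to the given tier."""
    return tier in LOW_QUALITY_TIERS


if __name__ == "__main__":
    for tier in QualityTier:
        print(tier.name, low_quality_tier_trigger(tier))
```

The other triggers (R-1 to R-4) depend on events such as new scores or corrected manifests, so they are not covered by this sketch.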

Verification Ladder

Verification levels make the public evidence status explicit. The ladder schema is documented at Verification levels v1. Current paper-level status is descriptive; it does not imply sandbox execution, independent reproduction, external review, or formal verification.

L0	Generated text, not public. Generated material that is not exposed as a public research record.
L1	Literature synthesis. A public synthesis based on retrieved scientific source material.
L2	Source-grounded claims. At least one extracted claim has supporting context from retrieved source material.
L3	Cross-source corroborated claims. A claim is corroborated across multiple inspectable public sources.
L4	Artifact-inspected claims. A claim is checked against public artifacts beyond citation text.
L5	Sandbox executed. Supporting code or computation has been executed in a controlled environment.
L6	Independently reproduced. A result has been independently reproduced from supporting artifacts.
L7	External or formal verification. A result has external review evidence or machine-checkable formal verification.
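
One way to work with the ladder in code is an ordered enumeration. The sketch below assumes that levels are ordered by their number, as the term "ladder" suggests. The class and function names are hypothetical and are not part of the Verification levels v1 schema.

```python
from enum import IntEnum
from typing import List


class VerificationLevel(IntEnum):
    """Hypothetical encoding of the ladder; values mirror the L0 to L7 labels."""
    GENERATED_NOT_PUBLIC = 0
    LITERATURE_SYNTHESIS = 1
    SOURCE_GROUNDED_CLAIMS = 2
    CROSS_SOURCE_CORROBORATED_CLAIMS = 3
    ARTIFACT_INSPECTED_CLAIMS = 4
    SANDBOX_EXECUTED = 5
    INDEPENDENTLY_REPRODUCED = 6
    EXTERNAL_OR_FORMAL_VERIFICATION = 7


def is_public(level: VerificationLevel) -> bool:
    """L0 is the only level described as not public."""
    return level > VerificationLevel.GENERATED_NOT_PUBLIC


def levels_not_implied(level: VerificationLevel) -> List[VerificationLevel]:
    """Return the higher rungs that a status at `level` does not claim."""
    return [other for other in VerificationLevel if other > level]


if __name__ == "__main__":
    current = VerificationLevel.SOURCE_GROUNDED_CLAIMS
    print(is_public(current))
    print([lvl.name for lvl in levels_not_implied(current)])
```

For example, for a record at L2 the second function returns L3 through L7, which makes explicit what the status does not claim.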

Current manifests expose aggregate claim verification counts. Individual claim records are not public unless a future manifest contract attaches sanitized per-claim evidence.

Verified Claims

The verified claim count is the number of extracted claims that passed automated grounding checks against retrieved source material. This means the system found supporting context for the claim. It does not mean the claim has been independently reproduced, experimentally validated, or formally proven.
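
The count can be pictured as a tally over per-claim check results. The sketch below is illustrative only: individual claim records are not public, so the `ClaimCheck` type, its fields, and the function name are assumptions made for this example, not a published data model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ClaimCheck:
    """Hypothetical per-claim result of an automated grounding check."""
    claim_id: str
    grounded: bool  # True when supporting context was found in retrieved sources


def verified_claim_count(checks: Iterable[ClaimCheck]) -> int:
    """Count the extracted claims that passed the grounding check."""
    return sum(1 for check in checks if check.grounded)


if __name__ == "__main__":
    # Placeholder records, not real data.
    example = [
        ClaimCheck("claim-1", True),
        ClaimCheck("claim-2", False),
        ClaimCheck("claim-3", True),
    ]
    print(verified_claim_count(example))
```

A claim counted here has supporting context only; nothing in the tally records reproduction, experimental validation, or proof.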

Artifact Manifest

Each paper page links to a public manifest. The manifest lists public artifacts for the work: the abstract page, citation exports, the manifest itself, a PDF when available, and an external record when available. It intentionally omits local file paths, private infrastructure details, and non-public operational metadata. The public schema is documented at Schemas.

Benchmark Evidence

Benchmark evidence pages group score claims from papers by model and benchmark label. A spread shows the range of reported values in the current corpus, from the lowest to the highest. A large spread is a signal to inspect the source papers and evaluation details; on its own it does not indicate that any paper is wrong. The machine-readable benchmark evidence schema is also documented at Schemas.
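
The grouping and spread described above can be sketched in a few lines. This is a minimal illustration, assuming each score claim is a `(model, benchmark, value)` tuple and that the spread is reported as the minimum, the maximum, and their difference. The function name, the tuple layout, and the output keys are hypothetical and do not reflect the published benchmark evidence schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

ScoreClaim = Tuple[str, str, float]  # (model, benchmark label, reported value)


def benchmark_spreads(
    score_claims: Iterable[ScoreClaim],
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Group score claims by (model, benchmark) and summarize their range."""
    grouped: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for model, benchmark, value in score_claims:
        grouped[(model, benchmark)].append(value)

    return {
        key: {
            "min": min(values),
            "max": max(values),
            "spread": max(values) - min(values),
            "n": float(len(values)),  # number of reported values in the group
        }
        for key, values in grouped.items()
    }


if __name__ == "__main__":
    # Placeholder model and benchmark names with made-up values.
    claims = [
        ("model-a", "bench-x", 71.0),
        ("model-a", "bench-x", 74.5),
        ("model-b", "bench-x", 68.0),
    ]
    for key, summary in benchmark_spreads(claims).items():
        print(key, summary)
```

Grouping by label alone means that values reported under different prompts, datasets, or evaluation harnesses can land in the same group, which is one reason a wide spread calls for reading the source papers before drawing conclusions.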

Source Authority

For scientific facts, source papers remain authoritative. Cite Assignee Research when referring to the synthesis, comparison, artifact manifest, or benchmark audit. Cite original papers for the underlying scientific claims.

Limitations

Q-1	Automated extraction can fail. Tables, units, benchmark variants, and paper-specific notation can be misread.
Q-2	Coverage is incomplete. The system can only evaluate material that it retrieves and parses successfully.
Q-3	Scores are contextual. The same benchmark name can hide different prompts, datasets, evaluation harnesses, or reporting conventions.
Q-4	Public artifacts are summaries. They improve inspection but do not replace careful reading of source papers.