Methodology

Effective 31 May 2026. Public methodology for Assignee Research.

Purpose

Assignee Research publishes machine-assisted literature syntheses, benchmark intelligence, and selected machine-verifiable research artifacts. The public record is designed to be inspectable: each output should make clear what was claimed, where the claim came from, how it was assessed, and what limitations remain.

The literature synthesis pipeline does not run new experiments or create new datasets. Its reports summarize and compare existing published work. Mathematical reports are presented separately because their status depends on computational search or formal verification rather than literature synthesis.

Literature Synthesis Workflow

1	Question selection. A specific research question is selected from the active research agenda. Broad or redundant questions are de-prioritized.
2	Source retrieval. The system searches public scientific indexes and collects candidate papers relevant to the question.
3	Reading and extraction. When full text is available, the system uses sections and tables in addition to abstracts. Candidate claims and benchmark scores are extracted from source material.
4	Grounding check. Claims are compared against retrieved sources. Claims that cannot be grounded are excluded from publication or treated as lower-confidence material.
5	Quality assessment. Drafts are evaluated for source coverage, consistency, evidence strength, uncertainty disclosure, and publication suitability.
6	Artifact generation. Approved reports are rendered as public pages, citation exports, and, when available, PDF artifacts and external records.
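
As an illustration of step 4 only, the sketch below shows one very simple way a numeric benchmark claim could be checked against retrieved text. It is a hypothetical example, not the system's actual grounding logic: the function names, the rule "benchmark name and score both appear in the source", and the two output buckets are all assumptions made for this sketch.

```python
import re


def is_grounded(score: float, benchmark: str, source_text: str) -> bool:
    """Hypothetical rule: the benchmark name and the exact score both occur in the source text."""
    if benchmark.lower() not in source_text.lower():
        return False
    # Collect every decimal number that appears in the source text.
    numbers = {float(m) for m in re.findall(r"\d+(?:\.\d+)?", source_text)}
    return score in numbers


def partition_claims(claims, source_text: str):
    """Split (benchmark, score) claims into grounded and ungrounded lists."""
    grounded, ungrounded = [], []
    for benchmark, score in claims:
        bucket = grounded if is_grounded(score, benchmark, source_text) else ungrounded
        bucket.append((benchmark, score))
    return grounded, ungrounded
```

Under this assumed rule, ungrounded claims would be the ones excluded from publication or carried as lower-confidence material. A real check would also need to handle rounding, units, and paraphrased claims.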

Quality Assessment

Quality scores are automated assessments of source grounding, internal consistency, and evidence coverage. They are not human peer review and should not be read as endorsement by domain experts. A higher score indicates only that a report performed better on the system's internal checks than a lower-scored report.

The score is useful for triage, but the source papers remain authoritative. Readers should cite original sources for specific factual claims and cite Assignee Research only when referring to the synthesis, comparison, or benchmark audit itself.

Benchmark Discrepancy Detection

Benchmark pages aggregate model-performance claims extracted from papers. A discrepancy is flagged when different sources report divergent scores for the same model and benchmark. A spread of at least three percentage points is treated as notable; larger spreads are given higher severity.

Discrepancies are not automatically accusations of error. They may arise from different prompts, dataset versions, evaluation protocols, scoring rules, preprocessing, fine-tuning, or reporting conventions. The purpose of the tracker is to make ambiguity visible and auditable.
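
The flagging rule described in this section can be sketched as follows. This is a minimal example under stated assumptions, not the tracker's implementation: the `ScoreClaim` fields, the function name, and the choice to rank by spread are hypothetical; only the three-percentage-point threshold and the idea that larger spreads are more severe come from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass

NOTABLE_SPREAD_PP = 3.0  # from the text: a spread of at least three percentage points is notable


@dataclass(frozen=True)
class ScoreClaim:
    model: str
    benchmark: str
    score: float  # percentage on a 0-100 scale (assumed)
    source: str   # hypothetical source identifier


def find_discrepancies(claims, threshold=NOTABLE_SPREAD_PP):
    """Group claims by (model, benchmark) and flag groups whose score spread meets the threshold."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim.model, claim.benchmark)].append(claim)

    flagged = []
    for (model, benchmark), group in groups.items():
        sources = {c.source for c in group}
        if len(sources) < 2:
            continue  # a discrepancy needs at least two different sources
        scores = [c.score for c in group]
        spread = max(scores) - min(scores)
        if spread >= threshold:
            flagged.append(
                {"model": model, "benchmark": benchmark,
                 "spread": spread, "sources": sorted(sources)}
            )
    # Largest spread first, mirroring "larger spreads are given higher severity".
    return sorted(flagged, key=lambda d: d["spread"], reverse=True)
```

A flagged entry records only that the sources disagree; it says nothing about which source, if any, is wrong.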

Mathematical Results

Mathematical outputs are separated from literature syntheses. A formal proof is treated differently from computational evidence. Computational evidence means that a search found no counterexample within the stated conditions; it is not a proof. Formal claims require machine-checkable proof artifacts before they are presented as proven.
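
The difference between computational evidence and proof can be illustrated with a bounded search. The statement tested here (every even integer from 4 up to a limit is a sum of two primes, the bounded form of Goldbach's conjecture) is chosen only for this sketch and is not taken from any published record; the function names are likewise hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def find_counterexample(limit: int):
    """Return the first even n in [4, limit] that is not a sum of two primes, else None."""
    for n in range(4, limit + 1, 2):
        if not any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1)):
            return n
    return None


def describe(limit: int) -> str:
    """Report the outcome with the search bound stated explicitly."""
    counterexample = find_counterexample(limit)
    if counterexample is None:
        # Evidence only: nothing is claimed about n > limit.
        return f"computational evidence: no counterexample for even n <= {limit}"
    return f"counterexample found: n = {counterexample}"
```

A result of `None` says nothing about values beyond `limit`, which is why such a search is reported with its bound and never labeled as a proof.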

Gate 2 Verification (Truth-Engine)

As of 2026-06-10, every published record carries a gate_verdict (see gate-verdict-v1 in Schemas) describing whether it has passed Gate 2 of the verification pipeline:

UNVERIFIED	The record has not completed Gate 2. This is the default for a literature synthesis with no attached formal proof or sandbox reproduction. All papers published before 2026-06-10 are marked `UNVERIFIED`.
VERIFIED	The record has passed Gate 2: a Lean 4 proof source type-checks, or a sealed-sandbox run reproduced the reported results within the stated tolerance. A reproducible artifact (proof source, or repro script and results) is attached.
FALSIFIED	A claim was tested against Gate 2 and failed: a counterexample was found, a proof did not type-check, or a reproduction did not match the reported results. The failing evidence is attached to the record.

VERIFIED status is never derived from review score or claim count. It is assigned only on the basis of an attached, independently checkable artifact. The homepage "Verified contributions" count reflects only records with VERIFIED status; it is intentionally distinct from, and typically much smaller than, the total paper count.
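
The rule that a verdict depends only on an attached artifact can be sketched as below. This is a minimal illustration and does not reproduce the gate-verdict-v1 schema: the `Record` fields, `artifact_passed`, and both function names are assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GateVerdict(Enum):
    UNVERIFIED = "UNVERIFIED"
    VERIFIED = "VERIFIED"
    FALSIFIED = "FALSIFIED"


@dataclass
class Record:
    title: str
    review_score: float                     # present on the record, deliberately ignored below
    artifact_passed: Optional[bool] = None  # None means no Gate 2 artifact is attached (assumed field)


def gate_verdict(record: Record) -> GateVerdict:
    """Derive the verdict from the artifact outcome alone, never from review_score."""
    if record.artifact_passed is None:
        return GateVerdict.UNVERIFIED
    return GateVerdict.VERIFIED if record.artifact_passed else GateVerdict.FALSIFIED


def verified_count(records) -> int:
    """Count only VERIFIED records, as the homepage figure is described as doing."""
    return sum(1 for r in records if gate_verdict(r) is GateVerdict.VERIFIED)
```

In this sketch a record with a very high review score and no artifact still resolves to `UNVERIFIED`, which is the behavior the paragraph above describes.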

Limitations

L-1	Automated inference can be wrong. Retrieval, extraction, scoring, and summarization can miss sources, misread tables, or overstate agreement.
L-2	Coverage is incomplete. The system can only reason over material it retrieves and parses successfully.
L-3	Benchmark comparability is fragile. Scores with the same benchmark name may still differ because the underlying protocol differs.
L-4	Public pages are summaries. The source papers and cited artifacts should be consulted before relying on any scientific conclusion.

Corrections

Correction requests should include the affected URL, the exact claim or score, the reason it appears wrong, and a source that supports the correction. Send requests to contact@assignee.net. Substantive corrections may result in updated public text, hidden or withdrawn entries, changed discrepancy status, or a note in the methodology changelog.

Methodology Changelog

2026-06-10	Activated Gate 2 verification pipeline (`gate-verdict-v1`). All 7,372 papers published before this date were marked `UNVERIFIED` pending formal proof or sandbox reproduction. See Changelog C-30.
2026-05-31	Published public methodology page, correction policy, route verification, and public leak checks for the web surface.