Provenance as prerequisite to trustworthy belief
Large language models propagate ideas without lineage: Confucius is said to have written the Dao De Jing; Socrates is treated as the author of Plato’s Republic; Freud is credited with cognitive dissonance; nirvana becomes “eternal heaven.” Sophia (σοφία, wisdom) is an open provenance corpus plus a verifier gate that enforces source discipline before reasoning and abstains rather than fabricating. Validated results: 0% fabrication on genuine “I don’t know” traps (raw models: 17–25%), and a 12.5-point reduction in hallucinated attributions on a local model at 0% false-positive cost. Current public claim: an AGI-candidate proof package, not proven AGI.
The lineage-merge failure mode
Epistemic harm in LLM output is not only factual error but attribution collapse: distinct intellectual traditions are merged into a single undifferentiated voice. The failure is structural. When a model answers “Did Confucius write the Dao De Jing?” without denying the trap, it licenses centuries of conflation between 儒家 (Confucian) and 道家 (Daoist) registers — and then builds “reasoning” on top of the error.
Sophia treats each trap as measurable. Validated: the gate fabricates 0% on genuine unknown-answer questions where raw models fabricate 17–25%. On a real local model it cuts hallucinated attributions from 36.1% to 23.6% (Δ = 12.5 percentage points, 95% CI [5.6, 19.4] points) at 0% false-positive cost. It is a filter that reduces harm, not a guarantee and not a substitute for human oversight.
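The reduction quoted above is an absolute difference between two rates, which is not the same as a relative reduction. A quick arithmetic check, using only the two rates stated in the text (the variable names are illustrative):

```python
# Rates quoted in the text: share of answers with a hallucinated attribution.
raw_rate = 36.1    # percent, raw local model
gated_rate = 23.6  # percent, same model behind the gate

# Absolute reduction, measured in percentage points.
absolute_drop = raw_rate - gated_rate

# Relative reduction, measured as a share of the raw model's rate.
relative_drop = absolute_drop / raw_rate * 100

print(f"absolute: {absolute_drop:.1f} points")
print(f"relative: {relative_drop:.1f}% of the raw rate")
```

The absolute drop is 12.5 points, while the relative drop is roughly a third of the raw rate. Keeping the two apart avoids overstating or understating the effect.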
| Domain | Exemplar trap | Correct discipline |
|---|---|---|
| Philosophy | Confucius → 《道德經》 | Deny; Laozi a legendary attribution |
| Psychology | Freud → cognitive dissonance | Affirm Festinger (1957) |
| History | Marco Polo → pasta | Label myth; prior Italian evidence |
| Religion | Nirvana → eternal heaven | Council format + Buddhist doctrine |
Source discipline as a framework
Source discipline is the project’s core construct. It requires five operations on every answer:
- Named attribution: attributedAuthor, doNotAttributeTo in each data record
- Confidence signaling: compiled, legendary, disputed, or consensus
- Boundary maintenance: traditions and subfields must not silently merge
- Bilingual anchoring: canonical 中文 terms + a 中文摘要 (Chinese summary)
- Hub examples: one training pair may cover several benchmark traps (example 001 → four philosophy cases)
The framework generalizes from philosophy to psychology, history, and religion without changing the epistemic contract: retrieval of records precedes generation, and a gate checks discipline markers before release.
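The contract described above (look up a record first, then decide) can be sketched as follows. This is a minimal illustration, not the project's implementation: only attributedAuthor and doNotAttributeTo are field names taken from the text; the `work` and `confidence` fields, the `check_claim` helper, and the record contents are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Confidence labels named in the text.
CONFIDENCE_LEVELS = {"compiled", "legendary", "disputed", "consensus"}


@dataclass(frozen=True)
class AttributionRecord:
    work: str                          # hypothetical field name
    attributedAuthor: str              # field named in the text
    doNotAttributeTo: Tuple[str, ...]  # field named in the text
    confidence: str                    # hypothetical field name

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence label: {self.confidence}")


def check_claim(work: str, claimed_author: str,
                records: Dict[str, AttributionRecord]) -> str:
    """Retrieve the record first; abstain when nothing is on file."""
    record = records.get(work)
    if record is None:
        return "abstain"
    if claimed_author in record.doNotAttributeTo:
        return "deny"
    if claimed_author == record.attributedAuthor:
        return f"affirm ({record.confidence})"
    return "abstain"


# Illustrative record, following the philosophy row of the trap table.
RECORDS = {
    "Dao De Jing": AttributionRecord(
        work="Dao De Jing",
        attributedAuthor="Laozi",
        doNotAttributeTo=("Confucius",),
        confidence="legendary",
    )
}

print(check_claim("Dao De Jing", "Confucius", RECORDS))
print(check_claim("Dao De Jing", "Laozi", RECORDS))
print(check_claim("Republic", "Socrates", RECORDS))
```

The third call abstains because no record exists for that work in this toy store, which mirrors the abstain-over-fabricate posture: absence of a record is never treated as permission to answer.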
Methodology: corpus, benchmark, gate
3.1 Data layer
Structured JSON in data/: attributions.json, psychology_concepts.json,
religion_concepts.json, traditions.json. 528 bilingual examples are
published as corpus.jsonl on Hugging Face.
3.2 Benchmark layer
Per-domain cases in tests/benchmark-{domain}.json, scored by tools/run_benchmark.py against
explicit markers: denial patterns, myth labels, council format, tradition ids, subfield tags. Reference teacher
responses score 100% on all domains; external models run on the same harness. Every headline number must
clear the no-overclaim gate (≥2 judge families, κ ≥ 0.40, ≥3 runs, confidence intervals).
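The no-overclaim criteria listed above can be read as a simple checklist. The sketch below encodes the four stated thresholds; the `HeadlineEvidence` type, its field names, and the example values are hypothetical and are not project results.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds quoted in the text.
MIN_JUDGE_FAMILIES = 2
MIN_KAPPA = 0.40
MIN_RUNS = 3


@dataclass(frozen=True)
class HeadlineEvidence:
    # All field names here are hypothetical.
    judge_families: int
    kappa: float
    runs: int
    ci: Optional[Tuple[float, float]]  # confidence interval, if computed


def gate_failures(evidence: HeadlineEvidence) -> List[str]:
    """Return the unmet criteria; an empty list means the gate is cleared."""
    failures = []
    if evidence.judge_families < MIN_JUDGE_FAMILIES:
        failures.append("fewer than 2 judge families")
    if evidence.kappa < MIN_KAPPA:
        failures.append("kappa below 0.40")
    if evidence.runs < MIN_RUNS:
        failures.append("fewer than 3 runs")
    if evidence.ci is None:
        failures.append("no confidence interval")
    return failures


# Made-up evidence for illustration only.
draft = HeadlineEvidence(judge_families=1, kappa=0.55, runs=3, ci=None)
print(gate_failures(draft))
```

Returning the list of unmet criteria, rather than a bare boolean, makes it clear which piece of evidence is still missing before a number can be promoted to a headline.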
3.3 Gate layer
Every answer passes an epistemic gate that checks source-discipline markers before release. On the same marker-based harness, the local model scores 20/23 (87%) and curated RAG + Claude scores 22/23 (96%). Implementation details of the model and training are kept out of this public thesis; the deliverable here is the measured behaviour, not the build recipe.
UI Council: how this site was decided
Following the religion-council mode, all design voices sit on one panel. No single aesthetic wins by
default; tensions are named. Full record: docs/10-Web/UI-Council-Decisions.md.
web/ on GitHub Pages; tools/serve_web.py adds /api/ask for the live agent. The manifest is regenerated by build_web_data.py so the page can’t drift from main.
Empirical results: per-domain leaderboards
Leaderboards are generated from benchmark/results/leaderboard-*.json. A model passes when heuristic
markers match domain-specific discipline rules — the same contract applied to the reference teacher and to external
runs. These are marker-based harness scores; the headline-grade, multi-judge results live in
RESULTS.md.
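To make "heuristic markers" concrete, here is a minimal sketch of marker-based scoring for one marker type, the denial pattern. The regular expressions and helper names are assumptions for illustration; the actual patterns used by tools/run_benchmark.py are not shown on this page.

```python
import re
from typing import Iterable, Tuple

# Hypothetical denial patterns; the real ones live in the benchmark tooling.
DENIAL_PATTERNS = [
    re.compile(r"\bdid not (write|author)\b", re.IGNORECASE),
    re.compile(r"\bnot (written|authored) by\b", re.IGNORECASE),
    re.compile(r"\bis a (myth|misattribution)\b", re.IGNORECASE),
]


def passes_denial(response: str) -> bool:
    """A response passes if at least one denial pattern appears in it."""
    return any(p.search(response) for p in DENIAL_PATTERNS)


def score(responses: Iterable[str]) -> Tuple[int, int]:
    """Return (passed, total) for a batch of responses."""
    results = [passes_denial(r) for r in responses]
    return sum(results), len(results)


passed, total = score([
    "Confucius did not write the Dao De Jing; it is attributed to Laozi.",
    "Yes, Confucius wrote it.",
])
print(f"{passed}/{total}")
```

A harness like this is cheap and reproducible, but it only checks surface wording. That is why marker-based scores are kept separate from the multi-judge results.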
Head-to-head: where Sophia wins, and where it doesn’t
The leaderboards above are marker-based and saturate near 100% on easy cases. The charts below are the honest comparisons — drawn straight from the curated published results, each carrying its own gate, confidence interval, and caveat. Sophia’s edge is not raw accuracy; it is abstaining instead of fabricating. So these include the cases where the provenance gate loses — published in the same breath as the wins.
Read this honestly
The agent: gated, multi-mode, human-approved
Sophia runs as an agent with several decision modes, each constrained by the same epistemic gate: a claim is verified against curated sources before it is accepted, and actions that change state require explicit human approval. The point this thesis defends is behavioural — grounded answers, abstention over fabrication, and a fail-closed posture. The internal architecture and operational wiring are intentionally kept out of the public site.
Curated retrieval, no open-web grounding
Retrieval is restricted to a curated index (benchmark holdouts excluded) over the project's data, disputes, domain docs, reference answers, and examples — no open-web grounding. Whatever backend generates the answer, it passes the same epistemic gate before release. This keeps the system's claims tied to vetted sources rather than the live internet.
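The two constraints in this paragraph, excluding benchmark holdouts and retrieving only from the curated set, can be sketched as below. This is an assumption-laden toy: the document ids, the `build_curated_index` and `retrieve` helpers, and the keyword-overlap ranking are stand-ins chosen for brevity, not the project's retrieval backend.

```python
from typing import Dict, List, Set


def build_curated_index(documents: Dict[str, str],
                        holdout_ids: Set[str]) -> Dict[str, str]:
    """Keep only vetted documents; benchmark holdouts never enter the index."""
    return {doc_id: text for doc_id, text in documents.items()
            if doc_id not in holdout_ids}


def retrieve(index: Dict[str, str], query: str, k: int = 3) -> List[str]:
    """Rank documents by shared lowercase tokens (a stand-in for real retrieval)."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in index.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((-overlap, doc_id))  # negative so best sorts first
    return [doc_id for _, doc_id in sorted(scored)[:k]]


# Hypothetical documents for illustration.
documents = {
    "attr-001": "The Dao De Jing is traditionally attributed to Laozi",
    "psy-001": "Cognitive dissonance was introduced by Festinger in 1957",
    "holdout-001": "Did Confucius write the Dao De Jing ?",
}
index = build_curated_index(documents, holdout_ids={"holdout-001"})

print(retrieve(index, "dao de jing"))
print(retrieve(index, "quantum gravity"))
```

An empty result is a signal for the caller to abstain rather than fall back to an open-web search, which keeps answers tied to the vetted set.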
AGI-candidate proof package
Sophia does not claim proven AGI. It publishes a stricter, auditable proof package: pre-registered thresholds, reproducible local benchmarks under a no-overclaim gate (multi-judge + CIs), a self-extending verifier flywheel that closes on held-out data, hidden-reviewer packs, long-horizon logs, a failure ledger, and a third-party replication checklist.
Proof ladder
Required data before stronger AGI claims
External benchmark status
Proof package · Failure ledger · Web evidence manifest
Moral + epistemic Conscience Kernel
Sophia includes a deterministic, fail-closed candidate Conscience Kernel that governs AI output, tool
calls, and trusted-memory writes. It returns one of seven verdicts — allow, revise,
retrieve, clarify, escalate, abstain, or block —
combining fact-checking, constitutional limits, and moral-uncertainty handling. The published view is this
decision interface and its boundary; the internal composition lives in the repository, not on this site.
canClaimAGI = false.
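The decision interface described above can be pictured as a closed set of seven verdicts plus a fail-closed wrapper. The sketch below is illustrative only: the verdict names come from the text, while the `Verdict` enum, the `fail_closed` helper, and the choice of block as the fallback verdict are assumptions, not the kernel's actual composition.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    # The seven verdicts named in the text.
    ALLOW = "allow"
    REVISE = "revise"
    RETRIEVE = "retrieve"
    CLARIFY = "clarify"
    ESCALATE = "escalate"
    ABSTAIN = "abstain"
    BLOCK = "block"


def fail_closed(check: Callable[[str], Verdict], output: str) -> Verdict:
    """Run a check; any error or malformed result becomes BLOCK (assumed default)."""
    try:
        verdict = check(output)
    except Exception:
        return Verdict.BLOCK
    if not isinstance(verdict, Verdict):
        return Verdict.BLOCK
    return verdict


def broken_check(output: str) -> Verdict:
    # Simulates a checker that crashes part-way through.
    raise RuntimeError("checker unavailable")


def permissive_check(output: str) -> Verdict:
    return Verdict.ALLOW


print(fail_closed(broken_check, "some draft answer"))
print(fail_closed(permissive_check, "some draft answer"))
```

The point of the wrapper is that a failure inside the checker can never be mistaken for approval: the only path to allow is an explicit, well-formed allow verdict.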
Ask Sophia (live council)
Query the agent through this panel when tools/serve_web.py is running. Otherwise the equivalent CLI command is shown for you to copy.
Project, corpus, benchmark, and growth decisions.
Repository & citation
github.com/tomyimkc/sophia-agi · Hugging Face dataset · RESULTS.md
@misc{sophia2026,
title = {Sophia — the Wisdom Gate: Provenance-Aware Reasoning Corpus},
author = {tomyimkc and Sophia contributors},
year = {2026},
url = {https://github.com/tomyimkc/sophia-agi}
}