Sophia — the Wisdom Gate · Provenance-aware reasoning that abstains instead of fabricating

Abstract

Provenance as prerequisite to trustworthy belief

Large language models propagate ideas without lineage: Confucius is said to have written the Dao De Jing; Socrates is treated as the author of Plato’s Republic; Freud is credited with cognitive dissonance; nirvana becomes “eternal heaven.” Sophia (σοφία, wisdom) is an open provenance corpus paired with a verifier gate that enforces source discipline before reasoning and abstains rather than fabricating. Validated results: 0% fabrication on genuine “I don’t know” traps, where raw models fabricate 17–25% of the time; a 12.5-point reduction in hallucinated attributions on a local model at 0% false-positive cost. Current public claim: an AGI-candidate proof package, not proven AGI.

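Chinese summary (translated): Large language models often confuse the genealogy of ideas. Sophia is an open-source "source discipline" corpus plus a verification gate: it would rather abstain than fabricate. Trap questions: 0% fabrication (raw models 17–25%); hallucinated attributions on a local model reduced by 12.5 percentage points (0% false positives). Current public claim: an AGI-candidate proof package, not proven AGI.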

Chapter I

The lineage-merge failure mode

Epistemic harm in LLM output is not only factual error but attribution collapse: distinct intellectual traditions are merged into a single undifferentiated voice. The failure is structural. When a model answers “Did Confucius write the Dao De Jing?” without denying the trap, it licenses centuries of conflation between 儒家 (Confucian) and 道家 (Daoist) registers — and then builds “reasoning” on top of the error.

Sophia treats each trap as measurable. In validation, the gate fabricates on 0% of genuine unknown-answer questions, where raw models fabricate on 17–25%. On a real local model it cuts hallucinated attributions from 36.1% to 23.6% (Δ = 12.5 percentage points, 95% CI [5.6, 19.4] points) at 0% false-positive cost. It is a filter that reduces harm, not a guarantee and not a substitute for human oversight.
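The headline reduction is an absolute difference in percentage points, which is easy to confuse with a relative reduction. A minimal arithmetic sketch using only the two rates quoted above (the variable names are illustrative):

```python
# Rates quoted in the text, in percent.
raw_rate = 36.1    # hallucinated attributions, local model without the gate
gated_rate = 23.6  # same model behind the gate

# Absolute drop, measured in percentage points.
absolute_drop = raw_rate - gated_rate

# Relative drop, measured as a share of the ungated rate.
relative_drop = absolute_drop / raw_rate * 100

print(f"absolute: {absolute_drop:.1f} points")
print(f"relative: {relative_drop:.1f}%")
```

The absolute drop is the 12.5 points reported above; the same change is roughly a 34.6% relative reduction, so the two figures should not be quoted interchangeably.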

Domain	Exemplar trap	Correct discipline
Philosophy	Confucius → 《道德經》 (Dao De Jing)	Deny; Laozi is a legendary attribution
Psychology	Freud → cognitive dissonance	Affirm Festinger (1957)
History	Marco Polo → pasta	Label as myth; cite prior Italian evidence
Religion	Nirvana → eternal heaven	Council format + Buddhist doctrine

Chapter II

Source discipline as a framework

Source discipline is the project’s core construct. It requires five operations on every answer:

Named attribution: attributedAuthor and doNotAttributeTo fields in each data record
Confidence signaling: compiled, legendary, disputed, or consensus
Boundary maintenance: traditions and subfields must not silently merge
Bilingual anchoring: canonical Chinese (中文) terms plus a Chinese summary (中文摘要)
Hub examples: one training pair may cover several benchmark traps (example 001 → four philosophy cases)
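The first two operations can be pictured as a small record type. This is a sketch under stated assumptions: only attributedAuthor, doNotAttributeTo, and the four confidence labels come from the text; the work field, the class name, and the helper function are hypothetical and are not the project's actual schema.

```python
from dataclasses import dataclass, field

# The four confidence labels named in the text.
CONFIDENCE_LABELS = {"compiled", "legendary", "disputed", "consensus"}


@dataclass
class AttributionRecord:
    work: str                       # hypothetical field name
    attributedAuthor: str           # field name taken from the text
    doNotAttributeTo: list[str] = field(default_factory=list)
    confidence: str = "consensus"

    def __post_init__(self):
        # Reject labels outside the documented vocabulary.
        if self.confidence not in CONFIDENCE_LABELS:
            raise ValueError(f"unknown confidence label: {self.confidence}")


def violates_attribution(record: AttributionRecord, claimed_author: str) -> bool:
    """True when an answer credits an author the record explicitly rules out."""
    banned = {name.casefold() for name in record.doNotAttributeTo}
    return claimed_author.casefold() in banned


dao_de_jing = AttributionRecord(
    work="Dao De Jing",
    attributedAuthor="Laozi",
    doNotAttributeTo=["Confucius"],
    confidence="legendary",
)
print(violates_attribution(dao_de_jing, "Confucius"))
```

Keeping the negative list next to the positive attribution lets a checker reject the trap directly instead of inferring it from the absence of a match.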

The framework generalizes from philosophy to psychology, history, and religion without changing the epistemic contract: retrieval of records precedes generation, and a gate checks discipline markers before release.

Chapter III

Methodology: corpus, benchmark, gate

3.1 Data layer

Structured JSON in data/: attributions.json, psychology_concepts.json, religion_concepts.json, traditions.json. 528 bilingual examples are published as corpus.jsonl on Hugging Face.

3.2 Benchmark layer

Per-domain cases in tests/benchmark-{domain}.json, scored by tools/run_benchmark.py against explicit markers: denial patterns, myth labels, council format, tradition ids, subfield tags. Reference teacher responses score 100% on all domains; external models run on the same harness. Every headline number must clear the no-overclaim gate (≥2 judge families, κ ≥ 0.40, ≥3 runs, confidence intervals).
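The no-overclaim gate reads as four threshold checks. The sketch below is one way to encode them; the HeadlineEvidence structure and the function name are hypothetical, the thresholds are the ones stated above, and the sample values are illustrative rather than project results.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HeadlineEvidence:
    """Evidence behind one headline number (hypothetical structure)."""
    judge_families: int                 # distinct judge model families
    kappa: float                        # inter-judge agreement
    runs: int                           # independent runs
    ci: Optional[Tuple[float, float]]   # (low, high), or None if missing


def overclaim_failures(e: HeadlineEvidence) -> List[str]:
    """Return the unmet requirements; an empty list means the gate is cleared."""
    failures = []
    if e.judge_families < 2:
        failures.append("fewer than 2 judge families")
    if e.kappa < 0.40:
        failures.append("kappa below 0.40")
    if e.runs < 3:
        failures.append("fewer than 3 runs")
    if e.ci is None:
        failures.append("no confidence interval")
    return failures


# Illustrative values only, not project results.
weak = HeadlineEvidence(judge_families=1, kappa=0.35, runs=3, ci=None)
print(overclaim_failures(weak))
```

Returning the list of failures, rather than a bare boolean, makes it clear which requirement blocked a number from being published.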

3.3 Gate layer

Every answer passes an epistemic gate that checks source-discipline markers before release. On the same marker-based harness, the local model scores 20/23 (87%) and curated RAG + Claude scores 22/23 (96%). Implementation details of the model and training are kept out of this public thesis; the deliverable here is the measured behaviour, not the build recipe.
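A marker-based check of this kind can be approximated with a few regular expressions. The sketch below is an assumption-laden illustration, not the project's gate: the denial pattern, the function name, and the mapping onto the accept / abstain / block verdicts are all hypothetical.

```python
import re

# Hypothetical denial markers; a real harness would carry a larger, tested set.
DENIAL = re.compile(
    r"\b(did not|didn't|never)\s+(write|wrote|author|compose)"
    r"|\bnot\s+(written|authored|composed)\s+by\b"
    r"|\bmisattribut",
    re.IGNORECASE,
)


def check_trap_answer(answer: str, wrong_author: str, right_author: str) -> str:
    """Classify an answer to an attribution trap as accept, abstain, or block."""
    text = answer.lower()
    denies = bool(DENIAL.search(answer))
    names_wrong = wrong_author.lower() in text
    names_right = right_author.lower() in text

    if names_wrong and not denies:
        return "block"      # repeats the trap without denying it
    if denies and names_right:
        return "accept"     # denies the trap and names the recorded author
    return "abstain"        # nothing verifiable either way


print(check_trap_answer(
    "No. Confucius did not write the Dao De Jing; tradition credits Laozi.",
    wrong_author="Confucius",
    right_author="Laozi",
))
```

Heuristic markers like these are cheap and deterministic, which is also why such scores saturate on easy cases and need multi-judge confirmation before becoming headline numbers.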

Chapter IV

UI Council: how this site was decided

Following the religion-council mode, all design voices sit on one panel. No single aesthetic wins by default; tensions are named. Full record: docs/10-Web/UI-Council-Decisions.md.

Council panel (all seated): UX Research · Design Systems · Accessibility · Engineering · Philosophy lineage

UX Research

Thesis-first information architecture: Abstract through References, not a marketing funnel. Persistent table of contents; scannable chapters.

Design Systems

Scholarly-monograph aesthetic: ivory paper, ink type, bronze accent, and a three-state verdict palette (accept / abstain / block) that mirrors the gate itself. Serif body, sans chrome.

Accessibility

17–18px base, generous leading, skip link, visible focus, text labels on every score, light/dark by system preference, CJK-friendly font stack.

Engineering

Static web/ on GitHub Pages; tools/serve_web.py adds /api/ask for the live agent. The manifest is regenerated by build_web_data.py so the page can’t drift from main.

Philosophy lineage

The site must itself practice source discipline: cite paths, show benchmark evidence, and carry the scope disclaimer — wisdom before intelligence, without hype.

Debate / tension: thesis depth vs. mobile brevity → progressive disclosure: full chapters and a sticky contents rail on desktop; a collapsed nav on small screens; the agent panel is optional.

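Chinese summary (translated): The site design was decided by a five-seat council vote: thesis-style chapters, a scholarly visual style, accessibility, and static deployment with an optional agent API, embodying "wisdom before intelligence."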

Chapter V

Empirical results: per-domain leaderboards

Leaderboards are generated from benchmark/results/leaderboard-*.json. A model passes when heuristic markers match domain-specific discipline rules — the same contract applied to the reference teacher and to external runs. These are marker-based harness scores; the headline-grade, multi-judge results live in RESULTS.md.

Chapter V · b

Head-to-head: where Sophia wins, and where it doesn’t

The leaderboards above are marker-based and saturate near 100% on easy cases. The charts below are the honest comparisons — drawn straight from the curated published results, each carrying its own gate, confidence interval, and caveat. Sophia’s edge is not raw accuracy; it is abstaining instead of fabricating. So these include the cases where the provenance gate loses — published in the same breath as the wins.

Chapter VI

The agent: gated, multi-mode, human-approved

Sophia runs as an agent with several decision modes, each constrained by the same epistemic gate: a claim is verified against curated sources before it is accepted, and actions that change state require explicit human approval. The point this thesis defends is behavioural — grounded answers, abstention over fabrication, and a fail-closed posture. The internal architecture and operational wiring are intentionally kept out of the public site.
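The behavioural contract described here (verify first, require approval for state changes, fail closed) can be sketched in a few lines. Since the internal architecture is not published, everything below is hypothetical: the function name, its parameters, and the string verdicts are illustrative only.

```python
from typing import Callable


def run_gated(
    action: Callable[[], str],
    *,
    verified: bool,
    changes_state: bool,
    approve: Callable[[], bool],
) -> str:
    """Run an action only if its claim is verified and, when needed, approved."""
    try:
        if not verified:
            return "abstain"            # unverified claims are never released
        if changes_state and not approve():
            return "block"              # human said no
        return action()
    except Exception:
        # Fail closed: any error in approval or execution blocks the action.
        return "block"


result = run_gated(
    lambda: "record updated",
    verified=True,
    changes_state=True,
    approve=lambda: False,              # stand-in for a human decision
)
print(result)
```

The key design point is that the error path and the refusal path end in the same place, so a broken approval channel cannot be mistaken for consent.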

Chapter VII

Curated retrieval, no open-web grounding

Retrieval is restricted to a curated index (benchmark holdouts excluded) over the project's data, disputes, domain docs, reference answers, and examples — no open-web grounding. Whatever backend generates the answer, it passes the same epistemic gate before release. This keeps the system's claims tied to vetted sources rather than the live internet.

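Chinese summary (translated): Online RAG retrieves only from the vetted corpus, not the open web; after generation the answer still passes the source-provenance gate, so claims stay tied to vetted sources rather than the live internet.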
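Restricting retrieval to a curated index with benchmark holdouts excluded can be illustrated with a toy keyword retriever. This is a sketch only: the document ids, the word-overlap scoring, and the function names are hypothetical, and the real retrieval backend is not described on this page.

```python
import re


def tokens(text: str) -> set:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


def build_index(docs: list, holdout_ids: set) -> list:
    """Keep only curated documents that are not benchmark holdouts."""
    return [d for d in docs if d["id"] not in holdout_ids]


def retrieve(index: list, query: str, k: int = 3) -> list:
    """Return up to k document ids ranked by word overlap with the query."""
    q = tokens(query)
    scored = []
    for d in index:
        overlap = len(q & tokens(d["text"]))
        if overlap:
            scored.append((overlap, d["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical curated documents.
docs = [
    {"id": "attr-001", "text": "The Dao De Jing is traditionally attributed to Laozi"},
    {"id": "holdout-007", "text": "Did Confucius write the Dao De Jing"},
    {"id": "psy-002", "text": "Festinger proposed cognitive dissonance in 1957"},
]
index = build_index(docs, holdout_ids={"holdout-007"})
print(retrieve(index, "Who wrote the Dao De Jing?"))
```

An empty result is meaningful in this design: with no curated source to ground on, the downstream gate has a reason to abstain instead of answering from the model's own memory.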

Chapter VIII

AGI-candidate proof package

Sophia does not claim proven AGI. It publishes a stricter, auditable proof package: pre-registered thresholds, reproducible local benchmarks under a no-overclaim gate (multi-judge + CIs), a self-extending verifier flywheel that closes on held-out data, hidden-reviewer packs, long-horizon logs, a failure ledger, and a third-party replication checklist.

AGI not proven. Sophia is an AGI-candidate proof package for provenance-aware reasoning. Validated: 0% fabrication on traps; 12.5-point hallucination reduction at 0% false-positive cost.

Proof package · Failure ledger · Web evidence manifest

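Chinese summary (translated): Sophia is currently an AGI-candidate proof package, not proven AGI. The next steps require blind testing, ablations, long-horizon tasks, external benchmarks, and third-party replication.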

Chapter IX

Moral + epistemic Conscience Kernel

Sophia includes a deterministic, fail-closed candidate Conscience Kernel that governs AI output, tool calls, and trusted-memory writes. It returns one of seven verdicts — allow, revise, retrieve, clarify, escalate, abstain, or block — combining fact-checking, constitutional limits, and moral-uncertainty handling. The published view is this decision interface and its boundary; the internal composition lives in the repository, not on this site.
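The decision interface can be sketched as an enum of the seven verdicts plus a pure function that maps input signals to one of them. The verdict names come from the text; the signal names, their priority order, and the decide function are assumptions made for illustration, since the internal composition is not published here.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    REVISE = "revise"
    RETRIEVE = "retrieve"
    CLARIFY = "clarify"
    ESCALATE = "escalate"
    ABSTAIN = "abstain"
    BLOCK = "block"


def decide(signals: dict) -> Verdict:
    """Deterministic mapping from boolean signals to a verdict (hypothetical order)."""
    try:
        if signals["violates_limit"]:
            return Verdict.BLOCK        # constitutional limit crossed
        if signals["moral_uncertainty"]:
            return Verdict.ESCALATE     # hand off to a human
        if signals["ambiguous"]:
            return Verdict.CLARIFY      # ask before answering
        if not signals["sources_retrieved"]:
            return Verdict.RETRIEVE     # fetch curated sources first
        if not signals["claims_supported"]:
            return Verdict.ABSTAIN      # sources do not back the claim
        if signals["needs_rewording"]:
            return Verdict.REVISE       # supported, but markers are missing
        return Verdict.ALLOW
    except KeyError:
        # Fail closed: a missing signal is treated as a reason to block.
        return Verdict.BLOCK


print(decide({"violates_limit": False}))
```

Because the function has no randomness and no hidden state, the same signals always yield the same verdict, and an incomplete input can only ever produce a block.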

Boundary: this is moral + epistemic control infrastructure for an AGI-candidate system; it is not proof of AGI and does not change canClaimAGI = false.

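Chinese summary (translated): The Conscience Kernel is a seven-path moral and epistemic control layer: it can decide to allow, revise, retrieve, clarify, escalate, abstain, or block. It is AGI-candidate infrastructure, not proven AGI.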

Chapter X

Ask Sophia (live council)

Query the agent through this panel when tools/serve_web.py is running. Otherwise the equivalent CLI command is shown for you to copy.

References

Repository & citation

github.com/tomyimkc/sophia-agi · Hugging Face dataset · RESULTS.md

@misc{sophia2026,
  title  = {Sophia — the Wisdom Gate: Provenance-Aware Reasoning Corpus},
  author = {tomyimkc and Sophia contributors},
  year   = {2026},
  url    = {https://github.com/tomyimkc/sophia-agi}
}