Lean Manifests: Contracts with Evidence Levels

The Problem

Lean 4 is a powerful theorem prover. Two problems emerge as projects grow:

Evidence is invisible at the interface. In standard Lean, theorem foo : T := by sorry and theorem foo : T := by exact real_proof look identical to a consumer. Both are just theorem. There’s no way to tell the difference without opening the file. Every sorry looks the same as every other sorry — a deliberate assumption is indistinguishable from a forgotten scratch-hole.
Comprehension doesn’t scale. To understand what a module provides, readers (human or LLM) have had to read the proofs. Proof bodies tell you why something is true, but all you wanted to know was what, and how confident you should be.

Both problems have the same root cause: Lean doesn’t distinguish a module’s public claims from the private evidence backing each claim.

The Idea

Core principle: Separate the claims a module makes from the evidence backing each claim. Manifests are public and list every claim with its evidence level. Vocabulary (types mentioned in claims) is public. Proofs, implementation code, and test witnesses are hidden. A consumer reads the manifest plus vocabulary; they never need to open a proof file.

Each claim in a manifest uses one of five less-apologetic sorrys, each more evidence-laden than the last:

○ UnprovenConjecture    — sorry IS the theorem. Zero evidence.
◐ TestedConjecture      — sorry is the ∀ (at least one witness exists).
◑ DecomposedConjecture  — sorry is in the LEMMAS (all tested).
◕ DerivedConjecture     — sorry is in OTHER manifests (derivation is real).
● ProvenTheorem         — no sorry ANYWHERE.

Lean already has two ways to stand up a theorem you can’t fully prove: axiom (declared, honest, tracked) and theorem := by sorry (apologetic, unstructured). Lean Manifests add five gradations between them. The macro name appears in the source, so a reader scanning a module sees the evidence level at a glance.

Here’s what a manifest looks like:

-- Quicksort.lean (the manifest)
import LeanPractice.Code.Quicksort
import LeanPractice.Proofs.Quicksort

Wrap qs_mem_proof := @quicksort_mem

ProvenTheorem quicksort_sorted : ∀ (l : List Nat), Sorted (quicksort l)
ProvenTheorem qs_mem : ∀ (l : List Nat) (x : Nat), x ∈ quicksort l ↔ x ∈ l

Two lines. That's the entire public API of quicksort: it produces sorted output, and it preserves membership. The proof is 70+ lines in a separate file that consumers never need to open.

File Structure

Three parallel directories hold implementation details. Headers are the only .lean files at the top level (besides Basic.lean). An LLM running ls *.lean gets exactly the headers.

LeanPractice/
                -- Shared definitions, macros
            -- Header
        -- Header
            -- Header
           -- Header
  Code/
    Quicksort.lean          -- Implementation
    InsertionSort.lean
    MergeSort.lean
    BubbleSort.lean
  Proofs/
    Quicksort.lean          -- Proof
    InsertionSort.lean
    MergeSort.lean
  Tests/
    BubbleSort.lean         -- Compiler-verified tests
    Quicksort.lean
    InsertionSort.lean
    MergeSort.lean

Code/Module.lean — the implementation

Contains the function definition, helper lemmas needed by the proof, and a _spec definition stating what the module promises. This is the .c file.

Proofs/Module.lean — the proof

The actual proof, named <theorem>_proof. Only imports the Code file — no circular dependencies. Can be arbitrarily long and complex without affecting consumers.

Tests/Module.lean — compiler-verified tests

Hard-coded test cases as example statements using native_decide, plus named _test witnesses that the header macro requires. If the file compiles, every test passed:

-- Tests/BubbleSort.lean
-- If this file compiles, every test passed.

example : Sorted (bubbleSort []) := by native_decide
example : Sorted (bubbleSort [5, 4, 3, 2, 1]) := by native_decide
example : Sorted (bubbleSort [3, 1, 4, 1, 5, 9, 2, 6]) := by native_decide
example : Sorted (bubbleSort [1000, 1, 500, 2, 999]) := by native_decide

-- Named witness required by TestedConjecture in the header
def bubbleSort_sorted_test : Sorted (bubbleSort [5, 4, 3, 2, 1]) := by native_decide

Module.lean — the header

The header is the readable source of truth. A reader (human or LLM) should understand everything a module provides by reading the header alone.

Compare a fully-proven module with one that's still in progress:

Quicksort (fully proven)

ProvenTheorem quicksort_sorted
  : ∀ (l : List Nat), Sorted (quicksort l)

ProvenTheorem qs_mem
  : ∀ (l : List Nat) (x : Nat),
    x ∈ quicksort l ↔ x ∈ l

BubbleSort (tests only)

Signature bubbleSort
  : List Nat → List Nat

TestedConjecture bubbleSort_sorted
  : ∀ (l : List Nat), Sorted (bubbleSort l)

ProvenTheorem bs_pass_mem
  : ∀ (l : List Nat) (x : Nat),
    x ∈ bubblePass l ↔ x ∈ l

At a glance you can see: quicksort is fully proven, bubble sort's sortedness is only tested (not proven), but its pass-level membership is proven. The evidence level is in the macro name itself.

Anatomy of a header

A header has three sections:

Vocabulary — short definitions explaining the concepts used in theorem types. A reader who knows the vocabulary can read every theorem without opening another file.
Public theorems — only the theorems a consumer would use. Internal lemmas stay in Proofs/.
Signatures — type-checked function existence, useful when theorems don't fully cover a function.

Example: vocabulary as shared language

Define once for all sorting algorithms:

-- Vocabulary
-- ⟪tm⟫    = extract return value from TimeM computation
-- .time   = extract cost (comparison count)
-- IsSorted l = List.Pairwise (· ≤ ·) l
-- l₁ ~ l₂   = l₁ is a permutation of l₂

-- Now every sorting header is two lines per theorem:
ProvenTheorem mergeSort_correct
  : IsSorted ⟪mergeSort xs⟫ ∧ ⟪mergeSort xs⟫ ~ xs
ProvenTheorem mergeSort_time
  : (mergeSort xs).time ≤ n * ⌈log₂ n⌉

The vocabulary costs 4 lines and is shared across mergeSort, quickSort, heapSort, insertionSort. The more modules you add, the cheaper each header becomes.

Public vs. internal theorems

Not every theorem belongs in the header. A 207-line mergeSort module with 14 theorems reduces to 7 public theorems in the header. The other 7 are helpers (min_all_merge, merge_perm, timeMergeSortRec_le) that a consumer never invokes directly. The sizing rule: roughly 2 lines per public theorem plus 1–2 lines per vocabulary concept.

Maturity Stages

A module grows through three stages. The macros (ProvenTheorem, TestedConjecture) work at every stage — you get evidence tracking from day one. The file split is an orthogonal upgrade for visibility and enforcement.

Stage 1: Prototype (one file)

Everything in one file: definitions, proofs, tests, and macro declarations. Good for exploration. ProvenTheorem foo still fails if foo_proof doesn’t exist, even in the same file.

-- Quicksort.lean (prototype stage)
def quicksort : List Nat → List Nat := ...
theorem quicksort_sorted_proof : ∀ l, Sorted (quicksort l) := by ...
def quicksort_mem_test := show 3 ∈ quicksort [3,1,2] from ...

ProvenTheorem quicksort_sorted : ∀ l, Sorted (quicksort l)
TestedConjecture quicksort_mem : ∀ l x, x ∈ quicksort l ↔ x ∈ l

Stage 2: Separated (conventional)

Header, Code, Proofs, and Tests in separate files. Header imports all three. Consumers read only the header. But the header can see everything in Code — no enforcement that theorems only reference “vocabulary” types.

Stage 2.5: Organized (safe checkpoint)

Within each file, move all definitions to the top, all proofs to the bottom. Same file, same imports — just reordered. This is the safe checkpoint for migrating an existing codebase: the diff is a pure reorder, easy to review. For a PR, Stage 2.5 is PR #1 (reorganize), Stage 3 is PR #2 (split).

Stage 3: Enforced (full vocabulary control)

A new Defs/ directory holds the “vocabulary” — types and functions that appear in theorem signatures. The header imports Defs/ (not Code/) and auto-generated exports files (not raw Proofs/). This means:

Theorems can only reference names from Defs — compiler-enforced by the import structure
Proof files can’t smuggle definitions into the header
The exports files are auto-generated from the header’s ProvenTheorem declarations

LeanPractice/
                  -- Header: imports Defs + exports only
  Defs/
    Quicksort.lean                -- Vocabulary: types in theorem signatures
  Code/
    Quicksort.lean                -- Imports Defs, adds implementation
  Proofs/
    Quicksort.lean                -- Proofs
             -- Auto-generated: only _proof names
  Tests/
    Quicksort.lean                -- Tests + _test witnesses
             -- Auto-generated: only _test names

When to promote: 1→2 when the file gets long. 2→3 when you want to guarantee the header is self-contained — that a reader never needs to open Code/ or Proofs/ to understand a theorem.

The Header Macros

All macros are defined in DeanLean/Basic.lean — currently 263 lines for the entire system: a single register_option, two signature macros, five evidence macros, plus Wrap, FastHeader, VerifyAxiom, and ExternalTheorem. No compiler patches, no plugin.

Signature macros (type + totality)

Macro	Checks	Use when
`Signature foo : T`	Function exists, type matches, is total	Consumers need to know the calling convention
`PartialSignature foo : T`	Function exists, type matches, is partial	Function uses `partial def`

Signature is redundant when a ProvenTheorem already covers the full input space (totality is implied). Include it when the header has only tested/unproven conjectures, or when the theorem covers a restricted input space.

Spec-first design: Signature creates an axiom for missing functions

If foo doesn't exist yet, Signature foo : T doesn't error — it emits an axiom foo : T and an info message. You can write the entire header before a single function is implemented, then fill in the Code/ files later. Once the real def foo lands, the Signature line type-checks against it instead.

-- In the header, before any implementation exists:
Signature ghostFunction : Nat → Bool
-- info: Signature ghostFunction: not yet implemented, creating specification

-- Now `ghostFunction` exists as an axiom. Downstream code can call it.
-- When Code/X.lean later defines it, Signature verifies the type matches.

Evidence macros (behavioral claims)

Macro	Requires	Emits	Evidence
`ProvenTheorem foo : T`	`foo_proof` or `foo_derivation` in scope	`theorem foo : T := foo_proof` (or axiom, in fast mode)	● Full proof, all inputs
`DerivedConjecture foo : T`	`foo_derivation` in scope	Sorry theorem; info reports proven/total theorem-deps fraction + sorry deps	◕ Sorry is in OTHER HEADERS
`DecomposedConjecture foo : T`	`foo_derivation` + every sorry dep has `dep_test`	Sorry theorem; hard error if any lemma lacks `_test`	◑ Sorry is in the LEMMAS (all tested)
`TestedConjecture foo : T`	`foo_test` in scope	Sorry theorem; warns if witness looks vacuous	◐ Sorry is the ∀ (one witness)
`UnprovenConjecture foo : T`	nothing	`theorem foo : T := by sorry`	○ Sorry IS the theorem

Every evidence macro also supports redundant manifests: if foo already exists (e.g., it was declared in a more authoritative header that this one re-exports), the macro just type-checks the claim against the existing definition instead of redefining it. This lets you repeat the same ProvenTheorem in a facade and an aggregator header without collision.

Wrapping external theorems

Proof files and external libraries don't always follow the _proof naming convention. Wrap creates aliases:

-- The external theorem is called `quicksort_mem`, not `qs_mem_proof`
Wrap qs_mem_proof := @quicksort_mem

-- Now this works:
ProvenTheorem qs_mem : ∀ (l : List Nat) (x : Nat), x ∈ quicksort l ↔ x ∈ l

Wrap expands to noncomputable def. If the types don't match, Lean catches it at compile time.

VerifyAxiom and ExternalTheorem

Two companions to FastHeader and Wrap:

-- CI-only: confirm that an axiom in a FastHeader has a matching _proof/_derivation.
-- Fails loudly ("AXIOM UNVERIFIED") if no backing proof is reachable.
VerifyAxiom quicksort_sorted : ∀ (l : List Nat), Sorted (quicksort l)

-- Alias an external theorem as a named def, with a type check.
ExternalTheorem mergeSort_sorted := @Cslib.Algorithms.Lean.mergeSort_sorted
  : ∀ {α} [LinearOrder α] (xs : List α), IsSorted (mergeSort xs)

VerifyAxiom is the glue that keeps FastHeader honest: development uses axioms, CI runs a build that includes VerifyAxiom lines, and the build fails if any axiom has drifted from its proof. ExternalTheorem is the stricter cousin of Wrap — it re-exports under a new name and asserts the type, so a breaking change in the upstream library shows up here before it leaks into consumer headers.

Fast mode: `set_option levelized.fast true`

DeanLean/Basic.lean registers a compile-time option:

register_option levelized.fast : Bool := { defValue := false, descr := "Fast mode: ProvenTheorem emits axioms, skipping proof lookup" }

When set (in a set_option scope or globally via lake build -Klevelized.fast=true), ProvenTheorem foo : T emits axiom foo : T instead of looking up foo_proof. The header no longer needs to import the proof file at all. This is the same soundness story as FastHeader — just applied uniformly across every header in a build — with VerifyAxiom closing the gap in CI.

The Evidence Hierarchy

From weakest to strongest guarantee:

The key question for each claim: where is the sorry? Each level pushes it further out:

Signature / PartialSignature   type + totality, zero behavior claims

○ UnprovenConjecture           sorry IS the theorem
         │
◐ TestedConjecture             sorry is the ∀ (one witness exists)
         │
◑ DecomposedConjecture         sorry is in the LEMMAS (all tested)
         │
◕ DerivedConjecture            sorry is in OTHER HEADERS
         │
● ProvenTheorem                no sorry ANYWHERE

Each level is visible in the header at a glance. Each is strictly stronger than the level below, and all are compiler-enforced.

Every level except UnprovenConjecture is compiler-enforced:

Signature fails if the function has the wrong type or wrong totality. If the function doesn’t exist at all, it is created as an axiom (spec-first mode)
TestedConjecture foo fails if foo_test doesn’t exist; warns if the witness body uses absurd/False.elim/False.rec/Not.elim (likely vacuous). Add def foo_test_is_classical := () to suppress when the vacuous form is intentional
DecomposedConjecture foo throws a hard error if foo_derivation is missing, or if any sorry dependency of the derivation lacks its own _test witness. The error lists exactly which lemmas are untested
DerivedConjecture foo fails if foo_derivation doesn’t exist; on success it reports the proven/total theorem-dependency fraction and lists every sorry dep it found (via getUsedConstantsAsSet)
ProvenTheorem foo fails if neither foo_proof nor foo_derivation exists

DerivedConjecture fills the gap between “tested on examples” and “fully proven.” It represents a real proof conditional on unproven assumptions. The macro walks the derivation’s dependency tree at elaboration time, counts how many of the referenced theorems are sorry-free versus total, and prints something like:

info: DerivedConjecture distributivity: 7/9 theorem deps proven (77%)
  sorry deps: [mul_assoc, mul_comm]

When all dependencies get promoted, the message changes to “no sorry dependencies — consider promoting to ProvenTheorem.” Promotion is a one-keyword change: DerivedConjecture → ProvenTheorem, no file renaming, because ProvenTheorem accepts both _proof and _derivation names.

A promotion script (promote.sh) automatically rewrites headers when you add a proof file — changing TestedConjecture or DerivedConjecture to ProvenTheorem, updating imports, and adjusting everything consistently.

Teeth: the Trust Report Is Complete by Construction

The DerivedConjecture info message above looks like a trust report — “here’s what this derivation assumes.” For a while, it wasn’t actually a complete one. The referenced sorry deps could be stray theorem X := by sorry lines anywhere in the codebase, not just declared UnprovenConjectures in manifest files. A reader auditing “what does this derivation assume?” couldn’t distinguish a named, deliberate assumption from a forgotten scratch-hole.

The fix is an attribute-based enforcement. Every evidence macro now stamps its emitted declaration with @[manifest_entry]. The attribute is invisible in the manifest source — only the macro name appears, so UnprovenConjecture foo : P is still one clean line. Zero source-file bloat.

DerivedConjecture and DecomposedConjecture refuse to compile if any sorry-bearing dependency lacks the attribute:

error: DerivedConjecture bad_claim: sorry deps must be declared via
       manifest macros. These deps are stray sorries, not manifest
       entries:
         #[stray_sorry]
       Fix by either wrapping each in the appropriate manifest macro,
       or replacing with a real proof.

This turns the hierarchy from a convention into a contract. After the enforcement, every sorry dep of a DerivedConjecture is a named manifest entry with a known evidence level. The trust report is complete by construction. Auditing “what does this module assume?” reduces to reading the manifest macros it names — there is no off-the-record sorry, because any off-the-record sorry breaks the build.

The reframe that clarifies the whole design: Lean has sorry, which is anonymous and has no structure. The evidence hierarchy is five less-apologetic sorrys — each declared, tracked, and distinguishable at a glance by macro name. @[manifest_entry] is the type-system hook that distinguishes the well-behaved ladder from raw sorrys, and the enforcement makes the distinction a compile-time property rather than a social one.

The enforcement is binary: declared vs. hidden. It does not enforce monotonicity between levels — a DerivedConjecture can depend on an UnprovenConjecture, a TestedConjecture, or a ProvenTheorem. The hierarchy is ordered informally for human reading; a reader sees the trust report and decides whether the mix of levels is acceptable. The compiler only enforces “no hidden holes.”

Dependency Structure

Lean enforces acyclic imports — circular dependencies are a compile error. The directory layout reflects purpose, and levels emerge naturally from the import graph:

        Main.lean
           │
    ┌──────┴──────┐
    │   Headers    │   Quicksort.lean, MergeSort.lean, ...
    └──────┬──────┘
    ┌──────┴──────┐
    │  Code/ +    │   Code/Quicksort.lean
    │  Proofs/ +  │   Proofs/Quicksort.lean
    │  Tests/     │   Tests/BubbleSort.lean
    └──────┬──────┘
    ┌──────┴──────┐
    │  Basic.lean │   Shared definitions, macros
    └─────────────┘

Changing a header triggers recompilation of everything above it. Changing a proof triggers nothing — the .olean for the header doesn't change because proofs are erased at compile time.

Facades: Wrapping External Libraries

External libraries (Mathlib, Batteries, CSLib) name theorems however they like. The Wrap macro enables facade headers that re-export their API in our uniform format, without modifying their code.

Here's a real example — a facade over an external BST library:

-- Facades/BST.lean
-- The external library has 90 lines with deep namespaces.
-- This facade re-exports the public API.

import LeanPractice.External.BST

-- Function signatures
Signature External.Data.BST.Tree.insert : Tree Nat → Nat → Tree Nat
Signature External.Data.BST.Tree.mem    : Tree Nat → Nat → Bool
Signature External.Data.BST.Tree.toList  : Tree Nat → List Nat

-- Wrap their names to our convention
Wrap bst_insert_preserves_proof := @External.Data.BST.insert_preserves_bst
Wrap bst_toList_sorted_proof    := @External.Data.BST.toList_sorted

-- The public API: everything a consumer needs
ProvenTheorem bst_insert_preserves :
  ∀ (t : Tree Nat) (x : Nat), IsBST t → IsBST (t.insert x)

ProvenTheorem bst_toList_sorted :
  ∀ (t : Tree Nat), IsBST t → List.Pairwise (· < ·) t.toList

An LLM reads 10 lines instead of 90. The external library's code is untouched. If the library changes a theorem's type, the facade fails to compile — the lie is caught.

This stacks: facades import external code, headers import facades, consumers import headers. Each layer sees only the layer above. You can write facades for any Lean library and get uniform, LLM-friendly access to the entire ecosystem.

Fast Headers: Breaking the Recompilation Cascade

The problem

In standard Lean, changing a proof in a low-level file triggers recompilation up the import chain. Even though proofs are erased in .olean files, the build system tracks input hashes — if a dependency's .olean changes at all, everything above it rebuilds.

The solution: axiom headers

FastHeader declares theorems as axiom — no proof dependency, no sorry warning, no import of proof files:

-- Fast/Quicksort.lean
import LeanPractice.Code.Quicksort
-- NOTE: No import of Proofs/Quicksort!
-- Changing the proof rebuilds NOTHING above this file.

namespace Fast
FastHeader quicksort_sorted : ∀ (l : List Nat), Sorted (quicksort l)
FastHeader quicksort_mem : ∀ (l : List Nat) (x : Nat), x ∈ quicksort l ↔ x ∈ l
end Fast

Soundness

The gap (axiom with no proof import) is closed by separate verification files that are only built by CI, not during development:

-- Verify/Quicksort.lean (CI-only)
import LeanPractice.Proofs.Quicksort

example : ∀ (l : List Nat), Sorted (quicksort l) := quicksort_sorted_proof
example : ∀ (l : List Nat) (x : Nat), x ∈ quicksort l ↔ x ∈ l := quicksort_mem

If the axiom's type doesn't match the proof's type, the verification file fails to compile. CI catches it; development stays fast.

What rebuilds when?

Change	What rebuilds
Proof body (fix a tactic)	Nothing above the fast header
Theorem type (strengthen/weaken)	Header + direct dependents, then stops
Definition (change the function)	Everything that mentions it — full cascade

The C++ analogy

C++	Lean Manifests	Purpose
`.h` files	Fast axiom headers	Declarations only, fast to compile against
`.c` files	Proof files	Verified separately, don't block consumers
Linking	CI verification build	Checks everything matches

Lakos designed this for C++ compilation speed in the 1990s. The same structure solves Lean proof-checking speed for the same reason: separate the interface from the implementation, cascade only on interface changes.

Why Not Just Modules?

Lean 4 has a module system with namespaces, import control, and opaque definitions. A reasonable question: doesn't that already do everything Lean Manifests does?

Modules are the mechanism. Lean Manifests is a discipline built on top of that mechanism. Here's what modules alone don't give you:

Need	Modules alone	Lean Manifests
Evidence levels	A theorem either exists or doesn't. No way to say “tested but not proven.”	`ProvenTheorem` vs `TestedConjecture` vs `UnprovenConjecture` — visible at a glance, compiler-enforced
Witness enforcement	Modules can hide a proof body but can't require that a proof or test witness exists with a specific naming convention	`ProvenTheorem foo` fails unless `foo_proof` exists. `TestedConjecture foo` fails unless `foo_test` exists
Recompilation cascade	Changing a proof rebuilds everything downstream (the `.olean` hash changes even though proofs are erased)	`FastHeader` breaks the cascade with axioms verified separately in CI
LLM-friendly surface	`ls *.lean` shows every file. No distinction between interface and implementation	Headers at the top level, everything else in subdirectories. `ls *.lean` = the API
Progress tracking	`sorry` count. No granularity between “untested sorry” and “tested sorry”	`grep TestedConjecture .lean` vs `grep UnprovenConjecture .lean` — instantly see what's tested vs. asserted

The analogy: C has functions. That doesn’t mean every C codebase has a clean API. Lean Manifests are to Lean’s module system what API design discipline is to C’s function mechanism — an architectural pattern that the language enables but doesn’t enforce.

Wrapping Existing Libraries

Lean Manifests aren’t just for new code. You can add manifests to any existing Lean library without modifying its source. We demonstrated this on CSLib (130 files, 768 theorems, 20,837 lines) — reducing it to manifest summaries totaling 4,853 lines (77% reduction).

The Vocabulary macro

The Vocabulary macro is a define-or-verify pattern for Defs/ files:

-- If mergeSort already exists (from imported library): verify
-- If mergeSort doesn't exist: define it
Vocabulary mergeSort := @Cslib.Algorithms.Lean.TimeM.mergeSort
Vocabulary IsSorted  := @Cslib.Algorithms.Lean.TimeM.IsSorted

This means Defs/ files work whether the library is present or not. With CSLib: Vocabulary verifies the re-export. Without CSLib: Vocabulary creates standalone definitions. The header doesn't change either way.

Three-file wrapping pattern

               -- Header: ProvenTheorem statements (22 lines)
Defs/
  MergeSort.lean               -- Vocabulary: re-exports from CSLib (16 lines)
Proofs/
  MergeSort.lean               -- Bridge: _proof := @original (14 lines)
Original library: 207 lines, completely untouched

The full promise is header + Defs — a reader understands everything without opening the original source or the proofs bridge. The bridge file is pure plumbing: noncomputable def mergeSort_sorted_proof := @Cslib...mergeSort_sorted.

MergeSort header (complete)

-- Correctness
ProvenTheorem mergeSort_sorted  : ∀ {α} [LinearOrder α] (xs : List α), IsSorted ⟪mergeSort xs⟫
ProvenTheorem mergeSort_perm    : ∀ {α} [LinearOrder α] (xs : List α), ⟪mergeSort xs⟫.Perm xs
ProvenTheorem mergeSort_correct : ∀ {α} [LinearOrder α] (xs : List α), IsSorted ⟪mergeSort xs⟫ ∧ ⟪mergeSort xs⟫.Perm xs

-- Complexity (one tick per comparison)
ProvenTheorem mergeSort_time    : ∀ {α} [LinearOrder α] (xs : List α), (mergeSort xs).time ≤ xs.length * Nat.clog 2 xs.length

Every line is compiler-verified against CSLib. If CSLib changes a theorem's type, the header fails to build.

This approach scales to any library: Mathlib, Batteries, or your own internal code. The investment is ~30 lines per module (Defs + Proofs bridge). The payoff: a curated, compiler-checked, evidence-level-tagged API surface over any codebase.

Why This Matters for LLMs

Context windows are limited. To use a module, an LLM only needs its header — a few lines per function/theorem. The proofs (which can be hundreds of lines) stay out of context entirely.

What an LLM sees

When an LLM needs to use quicksort in a proof, it reads the header:

ProvenTheorem quicksort_sorted : ∀ (l : List Nat), Sorted (quicksort l)
ProvenTheorem qs_mem : ∀ (l : List Nat) (x : Nat), x ∈ quicksort l ↔ x ∈ l

Two lines instead of 150+. The LLM knows exactly what it can rely on and how to invoke it. With facades, this extends to the entire Lean ecosystem.

Headers are the API. Not just the "main" theorem — every theorem a consumer might need. If an LLM gets stuck because a lemma isn't in any header, that lemma belongs in a header.

Reflexive Application: a Manifest for the Macros Themselves

The evidence macros describe library code. We apply them to ourselves — the macros have their own manifest. This is a three-layer dependency chain that bottoms out in Lean’s own internals:

Leo's Lean (200K+ lines, trusted)
  ↑ 16 UnprovenConjecture  (DeanLean/Manifests/LeanEnvironment.lean)
Our macros (DeanLean/Basic.lean, 263 lines)
  ↑ 3 DerivedConjecture    (DeanLean/Manifests/MacroContracts.lean)
Library manifests, consumer code

DeanLean/Manifests/LeanEnvironment.lean — 16 UnprovenConjectures about Lean’s Environment API, grouped into sections: ConstantInfo shape, Environment.find? consistency, elaboration effects (elab_theorem_creates_thmInfo, elab_axiom_creates_axiomInfo, elab_preserves_others), sorryAx detection (sorry_proof_detected, real_proof_no_sorry, sorry_is_transitive), CommandElabM semantics, and one fundamental invariant (sorry_is_reliable_indicator). These are claims to Leo — a minimal, explicit slice of Lean that our macro soundness depends on. One item is upgraded from conjecture to ProvenTheorem already: error_bind_is_error, a one-line rfl.

DeanLean/Manifests/MacroContracts.lean — pure Prop-valued specs for the macros (ProvenTheoremSpec, TestedConjectureSpec, SignatureSpec, EvidenceOrderingInvariant), parameterized by Name rather than specific constants, then three DerivedConjectures asserting the macros satisfy those specs. Each derivation’s body explicitly names the LeanEnvironment conjectures it relies on:

theorem ProvenTheorem_is_correct_derivation :
    ∀ (env : Environment) (n : Name) (t : Expr),
    ProvenTheoremSpec env n t := by
  intro env n t h_pre
  have h1 := elab_theorem_creates_thmInfo
  have h2 := real_proof_no_sorry
  have h3 := find_name_consistent
  sorry

DerivedConjecture ProvenTheorem_is_correct :
    ∀ (env : Environment) (n : Name) (t : Expr), ProvenTheoremSpec env n t

The derivation bodies are intentionally unfinished (the haves bind the right claims but the combinator step ends in sorry). That’s what makes them DerivedConjecture rather than ProvenTheorem. The structural dependency on the 16 LeanEnvironment claims is already in place, so the macro’s auto-discovered sorry deps are exactly Leo’s axioms — not some accidental lemma. When Leo closes any of those 16 claims, the corresponding line in LeanEnvironment.lean flips from UnprovenConjecture to ProvenTheorem (or ExternalTheorem), and the percentage reported by our three DerivedConjectures goes up automatically.

DeanLean/Manifest.lean is the top-level index: it re-exports the two manifests plus Tests/ManifestTests.lean (witnesses for every macro on concrete inputs — proven theorems, tested conjectures, decomposed conjectures, Signature on existing and missing functions, redundancy, VerifyAxiom, and levelized.fast mode) and Tests/EnvironmentTests.lean (elab-time #check_is_theorem, #check_has_sorry, #check_no_sorry commands applied to the same examples). The tests file turns the verbal claim “ProvenTheorem produces a thmInfo with no sorryAx” into a machine-checked assertion at elaboration time.

The payoff: the question “are the evidence macros correct?” has a precise answer — the 3 contracts reduce entirely to 16 explicit claims about Lean’s environment API. Nothing informal sits between user code and the kernel.

Source Code

The complete working implementation is available on GitHub:

Repository: The full project with all sorting algorithms, proofs, tests, facades, fast headers, and the self-hosted manifest system is at github.com/deanpfoster/lean-manifests. The entire macro system is 263 lines in DeanLean/Basic.lean.