Proof bundle format
A proof bundle is a zip containing four files. It is the externally-visible deliverable and the thing that must work end-to-end — design decisions protect its integrity first.
```mermaid
flowchart LR
    subgraph Inputs
        M[MerkleRoots + OTSProof]
        S["Snapshots metadata<br/>+ HMAC commitments"]
        K[Ed25519 signing key]
    end
    AB[assemble_bundle_data] --> CJ[canonical_bundle_json]
    M --> AB
    S --> AB
    CJ --> JSON[bundle.json]
    CJ --> SG[sign_canonical_bytes]
    K --> SG --> SIG[bundle.sig.json]
    AB --> PDF[render_bundle_pdf] --> PDF_F[bundle.pdf]
    VER[verify_template.py] --> VERF[verify.py]
    JSON --> ZIP[[bundle.zip]]
    SIG --> ZIP
    PDF_F --> ZIP
    VERF --> ZIP
```
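The final packaging step in the flow above can be sketched as follows. This is a minimal illustration, not the backend's code: `build_bundle_zip` and `BUNDLE_MEMBERS` are hypothetical names, and the choice of deflate compression is an assumption.

```python
import io
import zipfile

# The four files a proof bundle contains, per this page.
BUNDLE_MEMBERS = ("bundle.pdf", "bundle.json", "bundle.sig.json", "verify.py")


def build_bundle_zip(files: dict[str, bytes]) -> bytes:
    """Pack the four bundle members into an in-memory zip (hypothetical helper)."""
    missing = [name for name in BUNDLE_MEMBERS if name not in files]
    if missing:
        raise ValueError(f"missing bundle members: {missing}")
    buffer = io.BytesIO()
    # Compression method is an assumption; the source does not specify one.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in BUNDLE_MEMBERS:
            archive.writestr(name, files[name])
    return buffer.getvalue()
```

Writing the members in a fixed order keeps the archive listing predictable, and refusing to build an incomplete zip matches the idea that the bundle must work end-to-end.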
Contents
bundle.pdf
Human-readable cover document. Contains:
- Signed attestation of authorship (the text the author agreed to the first time they generated a bundle).
- Writing statistics: total captures, active days, peak day, final word count, sessions.
- Word-count-over-time chart — rendered by reportlab Drawing primitives directly in the PDF, no PNG intermediate.
- Cryptographic appendix: Merkle roots, OTS receipt fingerprints, Bitcoin block heights where known.
- One page of verification instructions for the publisher.
- A bundle_identifier_hex footer on every page: the content hash of bundle.json.
Zero manuscript content.
bundle.json
The canonical, content-addressed payload. This is what verify.py actually reads. Deterministic JSON — sorted keys, UTF-8, no trailing whitespace, stable floating-point formatting — so the same inputs always produce the same bytes, and so bundle_identifier_hex (SHA-256 of this file) stably identifies a specific bundle.
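A minimal sketch of deterministic serialisation and content hashing, assuming Python's standard `json` module. The function names are hypothetical, the compact separators are an assumption, and the sketch simply hashes whatever payload it is given; how the backend treats the `bundle_identifier_hex` field itself when hashing is not stated on this page.

```python
import hashlib
import json


def canonical_bundle_bytes(payload: dict) -> bytes:
    """Serialise a payload to stable bytes: sorted keys, UTF-8, no stray whitespace."""
    text = json.dumps(
        payload,
        sort_keys=True,          # key order never depends on insertion order
        separators=(",", ":"),   # assumption: compact form, no padding
        ensure_ascii=False,      # emit real UTF-8 rather than \u escapes
        allow_nan=False,         # NaN/Infinity have no stable JSON form
    )
    return text.encode("utf-8")


def content_hash_hex(payload: dict) -> str:
    """SHA-256 over the canonical bytes, as lowercase hex."""
    return hashlib.sha256(canonical_bundle_bytes(payload)).hexdigest()
```

Two dictionaries with the same contents but different insertion order produce identical bytes here, which is the property the bundle identifier relies on.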
Schema (illustrative — see backend/api/bundle.py and backend/api/schemas.py for the authoritative shape):
```json
{
  "format_version": 1,
  "bundle_identifier_hex": "...",            // SHA-256 of the canonical bytes
  "generated_at": "2026-04-22T14:00:00Z",
  "author": {
    "email": "...",
    "attestation_text": "...",               // user-facing attestation
    "attestation_signed_at": "..."
  },
  "writing_summary": {
    "capture_count": 412,
    "active_days": 91,
    "peak_day": { "date": "...", "word_count": 8102 },
    "final_word_count": 85000,
    "session_count": 128
  },
  "merkle_roots": [
    {
      "computed_at": "2026-04-09T00:00:00Z",
      "root_hash_hex": "...",
      "leaves_hex": ["...", "...", "..."],   // HMAC commitments in order
      "reveals": [                           // per-leaf reveal entries
        {                                    // for the manuscript-match
          "leaf_index": 0,                   // flow (omitted for legacy
          "ciphertext_ref": "...",           // v1-mac-key snapshots)
          "scheme": "v2-per-leaf",
          "reveal_key_hex": "..."
        }
      ],
      "ots_receipt_b64": "...",              // DetachedTimestampFile
      "bitcoin_block_height": 891234         // null if pending
    },
    ...
  ],
  "snapshots": [
    {
      "captured_at": "...",
      "plaintext_hmac_hex": "...",           // = Merkle leaf
      "merkle_root_hash_hex": "...",         // which day this save anchors to
      "word_count": 1234,
      "char_count": 7890,
      "file_type": "md"
    },
    ...
  ]
}
```
Note: no path, filename, or file contents. The only manuscript-derived value is plaintext_hmac_hex, which is an HMAC under a key the server never sees.
reveals and the manuscript-match flow
Each leaf's commitment is HMAC-SHA256(per_leaf_key, plaintext_utf8) where per_leaf_key = HKDF-SHA256(mac_key, info=b"blindproof/leaf/v2/" || ciphertext_ref). The master mac_key is derived from the author's passphrase and never leaves the client.
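The two formulas above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the client's code: the HKDF salt is taken to be empty, the output length 32 bytes, and `ciphertext_ref` is encoded as UTF-8 before being appended to the info prefix. None of those three details is specified on this page, and the function names are hypothetical.

```python
import hashlib
import hmac

LEAF_INFO_PREFIX = b"blindproof/leaf/v2/"  # info prefix quoted in the text above


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size  # RFC default for absent salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_per_leaf_key(mac_key: bytes, ciphertext_ref: str) -> bytes:
    """per_leaf_key = HKDF-SHA256(mac_key, info=prefix || ciphertext_ref)."""
    # Assumption: ciphertext_ref is UTF-8 encoded; salt and length use the defaults above.
    return hkdf_sha256(mac_key, LEAF_INFO_PREFIX + ciphertext_ref.encode("utf-8"))


def leaf_commitment_hex(per_leaf_key: bytes, plaintext: str) -> str:
    """commitment = HMAC-SHA256(per_leaf_key, plaintext_utf8), as hex."""
    return hmac.new(per_leaf_key, plaintext.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because each key is bound to one `ciphertext_ref`, the same plaintext saved twice yields two unrelated commitments.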
At bundle-generation time, the client walks its local store, derives the per-leaf key for every v2-per-leaf snapshot, and posts the resulting {ciphertext_ref: reveal_key_hex} map to the backend alongside the existing author_name / project_name / attestation_text fields. The backend embeds the reveals in bundle.json (so the Ed25519 signature covers them) keyed by the leaf's index in the parent root.
A reveal entry is one of:
- Per-leaf v2 reveal: {leaf_index, ciphertext_ref, scheme: "v2-per-leaf", reveal_key_hex}. verify.py uses this to recompute the HMAC for that single leaf when given a manuscript file via --manuscript.
- Absent: snapshots captured under the legacy v1-mac-key scheme cannot be revealed, because they were committed under the raw mac_key, and revealing that key would expose every other leaf at once. They appear in leaves_hex so Merkle structure remains intact, but they cannot participate in manuscript-match. The verifier reports this clearly when no reveals are present.
The reveal-key derivation is one-way (HKDF) and per-leaf, so a publisher who is handed one reveal cannot test hypotheses against any other leaf — preserving the privacy of drafts the author later removed.
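A sketch of how a verifier might use reveal entries for manuscript-match, assuming the normalisation this page attributes to verify.py (UTF-8, BOM-tolerant, CRLF to LF) and lowercase hex in leaves_hex. The function names are hypothetical and this is not the shipped script.

```python
import hashlib
import hmac


def normalise_manuscript(raw: bytes) -> bytes:
    """Decode as UTF-8 (dropping a BOM if present) and convert CRLF to LF."""
    text = raw.decode("utf-8-sig")
    return text.replace("\r\n", "\n").encode("utf-8")


def manuscript_matches(raw: bytes, reveals: list[dict], leaves_hex: list[str]) -> bool:
    """True if any reveal key reproduces the leaf at its leaf_index."""
    data = normalise_manuscript(raw)
    for reveal in reveals:
        key = bytes.fromhex(reveal["reveal_key_hex"])
        candidate = hmac.new(key, data, hashlib.sha256).hexdigest()
        # Constant-time comparison; assumes leaves_hex holds lowercase hex.
        if hmac.compare_digest(candidate, leaves_hex[reveal["leaf_index"]]):
            return True
    return False
```

Each reveal key only opens its own leaf, so a miss on one leaf says nothing about the contents of any other.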
Canonical leaf order
Leaves within a Merkle root are ordered by (captured_at, id) — capture time first, registration id as a stable tiebreaker. This is the order the verifier expects in leaves_hex and the order reveals[i].leaf_index refers to. Both daily aggregation (aggregate_day in backend/api/merkle.py) and bundle assembly (assemble_bundle_data in backend/api/bundle.py) read it from the shared LEAF_ORDER_FIELDS / leaf_sort_key helpers in backend/api/merkle.py, so they cannot drift apart. Capture time is the right primary key — it is what authors and publishers see in the writing timeline — and the id tiebreaker keeps the order deterministic when two saves share a timestamp. Once a root is committed, its leaf order is fixed by the persisted root_hash; aggregate_day only operates on snapshots whose merkle_root is null, so a previously-anchored root is never re-permuted.
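The ordering rule can be illustrated as below. The names LEAF_ORDER_FIELDS and leaf_sort_key come from the text above, but their bodies here are guesses, and the `Snapshot` dataclass and `ordered_leaves_hex` are hypothetical stand-ins for the real models.

```python
from dataclasses import dataclass
from datetime import datetime

# Single source of truth for leaf order: capture time first, id as tiebreaker.
LEAF_ORDER_FIELDS = ("captured_at", "id")


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for the backend's snapshot model."""
    id: int
    captured_at: datetime
    plaintext_hmac_hex: str


def leaf_sort_key(snapshot: Snapshot) -> tuple:
    """Build the sort key from the shared field list."""
    return tuple(getattr(snapshot, field) for field in LEAF_ORDER_FIELDS)


def ordered_leaves_hex(snapshots: list[Snapshot]) -> list[str]:
    """Leaves in the order leaves_hex and reveals[i].leaf_index assume."""
    return [s.plaintext_hmac_hex for s in sorted(snapshots, key=leaf_sort_key)]
```

Routing both aggregation and bundle assembly through one key function is what keeps the two code paths from drifting apart.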
bundle.sig.json
Ed25519 signature over the canonical bundle.json bytes.
```json
{
  "alg": "Ed25519",
  "signature_b64": "...",
  "public_key_fingerprint": "..."   // SHA-256 truncated; stable across bundles
}
```
The public-key fingerprint is embedded so the verifier can confirm the bundle was signed by BlindProof's production signing key (rather than a substituted one). The Ed25519 private key lives as the BLINDPROOF_SIGNING_KEY Fly secret; its public key is published alongside the docs.
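A sketch of the fingerprint comparison only. It assumes the fingerprint is the SHA-256 of the raw 32-byte Ed25519 public key, hex-encoded and truncated to 32 characters; both the hashed encoding and the truncation length are assumptions, since this page says only "SHA-256 truncated". The function names are hypothetical.

```python
import hashlib
import hmac

FINGERPRINT_HEX_CHARS = 32  # assumption: truncation length is not given in the source


def public_key_fingerprint(raw_public_key: bytes) -> str:
    """Truncated SHA-256 of the raw public key bytes, as lowercase hex."""
    return hashlib.sha256(raw_public_key).hexdigest()[:FINGERPRINT_HEX_CHARS]


def fingerprint_matches(raw_public_key: bytes, expected_fingerprint: str) -> bool:
    """Compare against a pinned fingerprint in constant time."""
    actual = public_key_fingerprint(raw_public_key)
    return hmac.compare_digest(actual, expected_fingerprint.lower())
```

A matching fingerprint only identifies which key is claimed; the Ed25519 signature over the canonical bytes still has to verify under that key.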
verify.py
A PEP 723 single-file Python script with two external dependencies: opentimestamps-client (for the ots CLI Bitcoin check) and cryptography (for Ed25519 signature verification).
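For readers unfamiliar with PEP 723, the inline metadata block at the top of such a script looks roughly like this. The two dependency names are the ones listed above; the `requires-python` bound is an assumption, and any version pins the real script carries are not shown.

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "opentimestamps-client",
#     "cryptography",
# ]
# ///
```

A PEP 723-aware runner reads this comment block and provisions the dependencies before executing the file, so the publisher does not need to set up a project or virtual environment by hand.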
What it does:
- Loads bundle.json and bundle.sig.json from the zip.
- Verifies the Ed25519 signature in bundle.sig.json against the canonical bytes of bundle.json.
- Recomputes bundle_identifier_hex and confirms it matches SHA-256 of the canonical payload.
- Compares the signing key's fingerprint against the pinned EXPECTED_PUBLIC_KEY_FINGERPRINT (or --expect-fingerprint if passed).
- For each merkle_roots[i]: re-derives root_hash_hex from leaves_hex using the same Bitcoin-style SHA-256 tree the backend used. Refuses to continue if they don't match.
- For each ots_receipt_b64: writes the bytes to a temp file, invokes ots verify -d <digest_hex> <tempfile>, and classifies the result as one of three states: confirmed against Bitcoin (True); a genuine failure such as a digest mismatch or a known fake stub (False, which fails the run closed); or not confirmed here (None) when no local Bitcoin node is reachable or the receipt is still pending. Only False fails the overall verdict; None is surfaced as a note, because opentimestamps-client can confirm a Bitcoin attestation only against a local node (a node-less, explorer-based path is tracked as a follow-up).
- If --manuscript <path> is supplied: reads the manuscript with the same normalisation the client used (UTF-8 / BOM-tolerant / CRLF → LF), then for every reveal entry computes HMAC-SHA256(reveal_key, manuscript_bytes) and compares against the leaf at the given leaf_index. Reports MATCHED if any reveal hits, NOT MATCHED otherwise.
- Prints a summary: overall verdict, plus one line per check.
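The Merkle re-derivation step in the list above might look like the sketch below. "Bitcoin-style" is read here as pairwise hashing with the last node duplicated when a level has an odd count. Whether the backend hashes once or twice per node, and how it treats a single-leaf tree, is not stated on this page, so single SHA-256 and "one leaf is its own root" are assumptions, and `merkle_root_hex` is a hypothetical name.

```python
import hashlib


def merkle_root_hex(leaves_hex: list[str]) -> str:
    """Fold leaves pairwise into a root, duplicating the last node on odd levels."""
    if not leaves_hex:
        raise ValueError("a Merkle root needs at least one leaf")
    level = [bytes.fromhex(leaf) for leaf in leaves_hex]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # Bitcoin-style padding of an odd level
        # Assumption: one SHA-256 per parent node over left || right.
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

A verifier would compare the result with root_hash_hex and stop on a mismatch, since every later check depends on the leaves being the ones that were anchored.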
It is deliberately simple: a publisher should be able to audit the script in an afternoon. Don't add to its dependency list without a correctness justification comparable to the one behind the two existing dependencies.
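The tri-state handling of OTS results described in the steps above can be reduced to a small aggregation rule. This is a sketch with hypothetical names (`overall_ots_verdict`, the label strings), not the script's actual reporting code.

```python
from typing import Optional


def overall_ots_verdict(results: dict[str, Optional[bool]]) -> tuple[bool, list[str]]:
    """Combine per-receipt results: only False fails; None becomes a note.

    True  -> confirmed against Bitcoin
    False -> genuine failure (fails the run closed)
    None  -> not confirmed here (no local node reachable, or receipt pending)
    """
    notes = [f"{label}: not confirmed here" for label, r in results.items() if r is None]
    passed = all(r is not False for r in results.values())
    return passed, notes
```

Keeping None distinct from False lets a publisher without a Bitcoin node still get a useful verdict, while a real mismatch can never be downgraded to a warning.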
Determinism and stability
The canonical JSON is deterministic: given the same backend state, regenerating a bundle for the same user produces the same bytes (same bundle_identifier_hex, same signature). This is important for a few reasons:
- A publisher can compare two bundles produced at different times and quickly see what changed (new snapshots appended; nothing rewritten).
- A republishing author can produce a fresh bundle before delivery without invalidating an earlier one.
- The bundle_identifier_hex in the PDF footer lets non-technical readers match a PDF to its canonical payload at a glance.
If you change bundle.json's shape, bump format_version. verify.py must tolerate older formats for at least as long as any bundle produced under them might still be in circulation — which, given the durability promise, is "forever".
Stability caveats
- Ed25519 key rotation. When we rotate the signing key (V1), bundles signed by the old key must still verify. The plan is to publish a key-history document keyed by fingerprint; verify.py consults it offline. Not yet implemented.
- OTS receipt format evolution. opentimestamps-client is stable and backwards-compatible; we pin a lower bound and test against the latest.
- Bitcoin. Bitcoin block headers are what OTS ultimately anchors to. The dependency surface here is "Bitcoin continues to exist and the public calendar network continues to operate". Both are outside our control, and both are what give the bundle its durability.
See also
- Verifying a bundle: what verify.py does step-by-step.
- Backend internals: how the bundle's inputs are assembled.
- Design principles: why independent verifiability is non-negotiable.