Proof bundle format

A proof bundle is a zip containing four files. It is the externally-visible deliverable and the thing that must work end-to-end — design decisions protect its integrity first.

flowchart LR
    subgraph Inputs
        M[MerkleRoots + OTSProof]
        S["Snapshots metadata<br/>+ HMAC commitments"]
        K[Ed25519 signing key]
    end

    AB[assemble_bundle_data] --> CJ[canonical_bundle_json]
    M --> AB
    S --> AB
    CJ --> JSON[bundle.json]
    CJ --> SG[sign_canonical_bytes]
    K --> SG --> SIG[bundle.sig.json]
    AB --> PDF[render_bundle_pdf] --> PDF_F[bundle.pdf]
    VER[verify_template.py] --> VERF[verify.py]

    JSON --> ZIP[[bundle.zip]]
    SIG --> ZIP
    PDF_F --> ZIP
    VERF --> ZIP

Contents

bundle.pdf

Human-readable cover document. Contains:

  • Signed attestation of authorship (the text the author agreed to the first time they generated a bundle).
  • Writing statistics: total captures, active days, peak day, final word count, sessions.
  • Word-count-over-time chart — rendered by reportlab Drawing primitives directly in the PDF, no PNG intermediate.
  • Cryptographic appendix: Merkle roots, OTS receipt fingerprints, Bitcoin block heights where known.
  • One page of verification instructions for the publisher.
  • A bundle_identifier_hex footer on every page — the content hash of bundle.json.

Zero manuscript content.

bundle.json

The canonical, content-addressed payload. This is what verify.py actually reads. Deterministic JSON — sorted keys, UTF-8, no trailing whitespace, stable floating-point formatting — so the same inputs always produce the same bytes, and so bundle_identifier_hex (SHA-256 of this file) stably identifies a specific bundle.
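As a rough illustration of what "deterministic JSON" means in practice, the sketch below serialises a payload with sorted keys and compact separators, then hashes the bytes. The helper names and the exact json.dumps options are assumptions made for this example; the backend's canonical_bundle_json is authoritative and may differ, for instance in how it formats floats.

```python
import hashlib
import json


def canonical_json_bytes(payload: dict) -> bytes:
    """Hypothetical helper: byte-stable JSON (sorted keys, compact, UTF-8)."""
    text = json.dumps(
        payload,
        sort_keys=True,          # key order never depends on insertion order
        separators=(",", ":"),   # no optional whitespace
        ensure_ascii=False,      # emit real UTF-8 rather than \u escapes
        allow_nan=False,         # reject NaN/Infinity, which are not valid JSON
    )
    return text.encode("utf-8")


def content_hash_hex(payload: dict) -> str:
    """SHA-256 over the canonical bytes, as a lowercase hex string."""
    return hashlib.sha256(canonical_json_bytes(payload)).hexdigest()


# Two dicts with the same content but different insertion order
# serialise to identical bytes, and therefore to the same hash.
a = {"format_version": 1, "author": {"email": "a@example.com"}}
b = {"author": {"email": "a@example.com"}, "format_version": 1}
assert canonical_json_bytes(a) == canonical_json_bytes(b)
```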

Schema (illustrative — see backend/api/bundle.py and backend/api/schemas.py for the authoritative shape):

{
  "format_version": 1,
  "bundle_identifier_hex": "...",        // SHA-256 of the canonical bytes
  "generated_at": "2026-04-22T14:00:00Z",
  "author": {
    "email": "...",
    "attestation_text": "...",            // user-facing attestation
    "attestation_signed_at": "..."
  },
  "writing_summary": {
    "capture_count": 412,
    "active_days": 91,
    "peak_day": { "date": "...", "word_count": 8102 },
    "final_word_count": 85000,
    "session_count": 128
  },
  "merkle_roots": [
    {
      "computed_at": "2026-04-09T00:00:00Z",
      "root_hash_hex": "...",
      "leaves_hex": ["...", "...", "..."],   // HMAC commitments in order
      "reveals": [                            // per-leaf reveal entries
        {                                     // for the manuscript-match
          "leaf_index": 0,                    // flow (omitted for legacy
          "ciphertext_ref": "...",            // v1-mac-key snapshots)
          "scheme": "v2-per-leaf",
          "reveal_key_hex": "..."
        }
      ],
      "ots_receipt_b64": "...",              // DetachedTimestampFile
      "bitcoin_block_height": 891234          // null if pending
    },
    ...
  ],
  "snapshots": [
    {
      "captured_at": "...",
      "plaintext_hmac_hex": "...",        // = Merkle leaf
      "merkle_root_hash_hex": "...",      // which day this save anchors to
      "word_count": 1234,
      "char_count": 7890,
      "file_type": "md"
    },
    ...
  ]
}

Note: no path, filename, or file contents. The only manuscript-derived value is plaintext_hmac_hex, which is an HMAC under a key the server never sees.

reveals and the manuscript-match flow

Each leaf's commitment is HMAC-SHA256(per_leaf_key, plaintext_utf8) where per_leaf_key = HKDF-SHA256(mac_key, info=b"blindproof/leaf/v2/" || ciphertext_ref). The master mac_key is derived from the author's passphrase and never leaves the client.
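The derivation above can be sketched with the standard library alone. This is a minimal illustration, not the client's implementation: the empty HKDF salt, the 32-byte output length, and the UTF-8 encoding of ciphertext_ref are assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac

LEAF_INFO_PREFIX = b"blindproof/leaf/v2/"


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF with SHA-256 as defined in RFC 5869: extract, then expand."""
    # Extract: an absent salt is treated as HashLen zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def per_leaf_key(mac_key: bytes, ciphertext_ref: str) -> bytes:
    """Derive the key for one leaf (assumes ciphertext_ref is UTF-8 encoded)."""
    return hkdf_sha256(mac_key, LEAF_INFO_PREFIX + ciphertext_ref.encode("utf-8"))


def leaf_commitment_hex(leaf_key: bytes, plaintext_utf8: bytes) -> str:
    """HMAC-SHA256(per_leaf_key, plaintext_utf8) as hex: the Merkle leaf."""
    return hmac.new(leaf_key, plaintext_utf8, hashlib.sha256).hexdigest()
```

Because each leaf key is bound to its own ciphertext_ref through the HKDF info string, two snapshots with identical plaintext still produce different commitments.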

At bundle-generation time, the client walks its local store, derives the per-leaf key for every v2-per-leaf snapshot, and posts the resulting {ciphertext_ref: reveal_key_hex} map to the backend alongside the existing author_name / project_name / attestation_text fields. The backend embeds the reveals in bundle.json (so the Ed25519 signature covers them) keyed by the leaf's index in the parent root.

A reveal entry is one of:

  • Per-leaf v2 reveal: {leaf_index, ciphertext_ref, scheme: "v2-per-leaf", reveal_key_hex}. verify.py uses this to recompute the HMAC for that single leaf when given a manuscript file via --manuscript.
  • Absent: snapshots captured under the legacy v1-mac-key scheme cannot be revealed — they were committed under the raw mac_key, and revealing that key would expose every other leaf at once. They appear in leaves_hex so Merkle structure remains intact, but they cannot participate in manuscript-match. The verifier reports this clearly when no reveals are present.

The reveal-key derivation is one-way (HKDF) and per-leaf, so a publisher who is handed one reveal cannot test hypotheses against any other leaf — preserving the privacy of drafts the author later removed.
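A sketch of how a verifier might test a manuscript against the reveals of one Merkle root. The function names are hypothetical, and the normalisation shown (strip a UTF-8 BOM, convert CRLF to LF) is an assumption based on the description of verify.py; the shipped script is authoritative.

```python
import hashlib
import hmac


def normalise_manuscript(raw: bytes) -> bytes:
    """Decode as UTF-8 (dropping a leading BOM if present), CRLF -> LF."""
    text = raw.decode("utf-8-sig")
    return text.replace("\r\n", "\n").encode("utf-8")


def manuscript_matches(raw: bytes, leaves_hex: list, reveals: list) -> bool:
    """Return True if any v2-per-leaf reveal reproduces its committed leaf."""
    data = normalise_manuscript(raw)
    for reveal in reveals:
        if reveal.get("scheme") != "v2-per-leaf":
            continue  # unknown or legacy schemes cannot be checked
        key = bytes.fromhex(reveal["reveal_key_hex"])
        candidate = hmac.new(key, data, hashlib.sha256).digest()
        expected = bytes.fromhex(leaves_hex[reveal["leaf_index"]])
        # Constant-time comparison, as is usual when comparing MACs.
        if hmac.compare_digest(candidate, expected):
            return True
    return False
```

A reveal key only ever reproduces the leaf it was derived for, so a miss on every reveal means the manuscript does not correspond to any revealed snapshot in that root.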

Canonical leaf order

Leaves within a Merkle root are ordered by (captured_at, id) — capture time first, registration id as a stable tiebreaker. This is the order the verifier expects in leaves_hex and the order reveals[i].leaf_index refers to. Both daily aggregation (aggregate_day in backend/api/merkle.py) and bundle assembly (assemble_bundle_data in backend/api/bundle.py) read it from the shared LEAF_ORDER_FIELDS / leaf_sort_key helpers in backend/api/merkle.py, so they cannot drift apart. Capture time is the right primary key — it is what authors and publishers see in the writing timeline — and the id tiebreaker keeps the order deterministic when two saves share a timestamp. Once a root is committed, its leaf order is fixed by the persisted root_hash; aggregate_day only operates on snapshots whose merkle_root is null, so a previously-anchored root is never re-permuted.
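The ordering rule can be illustrated as follows. SnapshotRow and both function names are hypothetical stand-ins for this example; the real helpers are LEAF_ORDER_FIELDS and leaf_sort_key in backend/api/merkle.py, whose exact signatures are not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SnapshotRow:
    """Hypothetical stand-in for a persisted snapshot record."""
    id: int
    captured_at: datetime
    plaintext_hmac_hex: str


def leaf_sort_key_sketch(row: SnapshotRow) -> tuple:
    # Capture time first; registration id breaks ties deterministically.
    return (row.captured_at, row.id)


def ordered_leaves_hex(rows: list) -> list:
    """Leaves in canonical order, as they would appear in leaves_hex."""
    return [r.plaintext_hmac_hex for r in sorted(rows, key=leaf_sort_key_sketch)]


noon = datetime(2026, 4, 9, 12, 0, tzinfo=timezone.utc)
rows = [
    SnapshotRow(id=7, captured_at=noon, plaintext_hmac_hex="bb"),
    SnapshotRow(id=3, captured_at=noon, plaintext_hmac_hex="aa"),
    SnapshotRow(id=9, captured_at=noon.replace(hour=8), plaintext_hmac_hex="cc"),
]
# The 08:00 capture sorts first; the two noon captures fall back to id order.
assert ordered_leaves_hex(rows) == ["cc", "aa", "bb"]
```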

bundle.sig.json

Ed25519 signature over the canonical bundle.json bytes.

{
  "alg": "Ed25519",
  "signature_b64": "...",
  "public_key_fingerprint": "..."    // SHA-256 truncated; stable across bundles
}

The public-key fingerprint is embedded so the verifier can confirm the bundle was signed by BlindProof's production signing key (rather than a substituted one). The Ed25519 private key lives as the BLINDPROOF_SIGNING_KEY Fly secret; its public key is published alongside the docs.
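A minimal sketch of producing and checking such a signature with the cryptography package. Two details are assumptions made for illustration: the fingerprint is taken as SHA-256 over the raw 32-byte public key, and it is truncated to 16 hex characters (the source says only "SHA-256 truncated"). The key is a throwaway generated for the demo, not the production key.

```python
import base64
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def public_key_fingerprint(public_key: Ed25519PublicKey, hex_chars: int = 16) -> str:
    """Assumed scheme: SHA-256 of the raw public key, truncated."""
    raw = public_key.public_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PublicFormat.Raw,
    )
    return hashlib.sha256(raw).hexdigest()[:hex_chars]


def signature_is_valid(public_key: Ed25519PublicKey, canonical_bytes: bytes,
                       signature_b64: str) -> bool:
    """True if the signature verifies over exactly these bytes."""
    try:
        public_key.verify(base64.b64decode(signature_b64), canonical_bytes)
    except InvalidSignature:
        return False
    return True


# Demo with a throwaway key and a stand-in payload.
private_key = Ed25519PrivateKey.generate()
payload = b'{"format_version":1}'
sig_doc = {
    "alg": "Ed25519",
    "signature_b64": base64.b64encode(private_key.sign(payload)).decode("ascii"),
    "public_key_fingerprint": public_key_fingerprint(private_key.public_key()),
}
```

Changing a single byte of the payload makes verification fail, which is why the signature is computed over the canonical bytes rather than over a re-parsed object.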

verify.py

A PEP 723 single-file Python script with two external dependencies: opentimestamps-client (for the ots CLI Bitcoin check) and cryptography (for Ed25519 signature verification).

What it does:

  1. Loads bundle.json and bundle.sig.json from the zip.
  2. Verifies the Ed25519 signature in bundle.sig.json against the canonical bytes of bundle.json.
  3. Recomputes bundle_identifier_hex and confirms it matches SHA-256 of the canonical payload.
  4. Compares the signing key's fingerprint against the pinned EXPECTED_PUBLIC_KEY_FINGERPRINT (or --expect-fingerprint if passed).
  5. For each merkle_roots[i]: re-derives root_hash_hex from leaves_hex using the same Bitcoin-style SHA-256 tree the backend used. Refuses to continue if they don't match.
  6. For each ots_receipt_b64: writes the bytes to a temp file, invokes ots verify -d <digest_hex> <tempfile>, and classifies the result tri-state — confirmed against Bitcoin (True); a genuine failure such as a digest mismatch or a known fake stub (False, which fails the run closed); or not confirmed here (None) when no local Bitcoin node is reachable or the receipt is still pending. Only False fails the overall verdict — None is surfaced as a note, because opentimestamps-client can confirm a Bitcoin attestation only against a local node (a node-less, explorer-based path is tracked as a follow-up).
  7. If --manuscript <path> is supplied: reads the manuscript with the same normalisation the client used (UTF-8 / BOM-tolerant / CRLF → LF), then for every reveal entry computes HMAC-SHA256(reveal_key, manuscript_bytes) and compares against the leaf at the given leaf_index. Reports MATCHED if any reveal hits, NOT MATCHED otherwise.
  8. Prints a summary: overall verdict, plus one line per check.
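Steps 5 and 6 can be sketched as below. The tree construction is an assumption: it pairs nodes left to right, duplicates the last node on odd-sized levels (the Bitcoin convention), and hashes each pair with a single SHA-256, whereas Bitcoin itself uses double SHA-256. The backend's actual construction in backend/api/merkle.py is authoritative. The function names are hypothetical.

```python
import hashlib
from typing import Optional


def merkle_root_hex(leaves_hex: list[str]) -> str:
    """Recompute a root from ordered leaves (assumed single-SHA-256 nodes)."""
    if not leaves_hex:
        raise ValueError("a Merkle root needs at least one leaf")
    level = [bytes.fromhex(h) for h in leaves_hex]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd level: duplicate the last node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def overall_verdict(checks: dict[str, Optional[bool]]) -> bool:
    """Tri-state aggregation: only an explicit False fails the run.

    True  -> check confirmed
    None  -> not confirmed here (e.g. no local Bitcoin node, receipt pending)
    False -> genuine failure
    """
    return all(result is not False for result in checks.values())


# A pending OTS receipt is reported as a note, not a failure.
assert overall_verdict({"signature": True, "merkle": True, "ots": None})
assert not overall_verdict({"signature": True, "merkle": True, "ots": False})
```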

It is deliberately simple: a publisher should be able to audit the script in an afternoon. Don't expand its dependency list further without a comparable correctness justification.

Determinism and stability

The canonical JSON is deterministic: given the same backend state, regenerating a bundle for the same user produces the same bytes (same bundle_identifier_hex, same signature). This matters for three reasons:

  • A publisher can compare two bundles produced at different times and quickly see what changed (new snapshots appended; nothing rewritten).
  • A republishing author can produce a fresh bundle before delivery without invalidating an earlier one.
  • The bundle_identifier_hex footer in the PDF lets non-technical readers match a PDF to its canonical payload at a glance.

If you change bundle.json's shape, bump format_version. verify.py must tolerate older formats for at least as long as any bundle produced under them might still be in circulation — which, given the durability promise, is "forever".
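One way to keep older formats verifiable is to dispatch on format_version and never remove a handler once bundles of that version exist. The sketch below is a hypothetical structure, not how verify.py is organised; the placeholder check in _verify_v1 stands in for the real verification steps.

```python
from typing import Callable


def _verify_v1(bundle: dict) -> bool:
    # Placeholder only: a real handler would run the full v1 checks.
    return "merkle_roots" in bundle and "snapshots" in bundle


# One entry per format ever shipped; entries are added, never removed.
VERIFIERS: dict[int, Callable[[dict], bool]] = {
    1: _verify_v1,
}


def verify_by_version(bundle: dict) -> bool:
    """Route a bundle to the handler for its declared format_version."""
    version = bundle.get("format_version")
    handler = VERIFIERS.get(version)
    if handler is None:
        # Fail closed on formats this script does not know about.
        raise ValueError(f"unsupported format_version: {version!r}")
    return handler(bundle)
```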

Stability caveats

  • Ed25519 key rotation. When we rotate the signing key (V1), bundles signed by the old key must still verify. The plan is to publish a key-history document keyed by fingerprint, which verify.py would consult offline. Not yet implemented.
  • OTS receipt format evolution. opentimestamps-client is stable and backwards-compatible; we pin a lower bound and test against the latest.
  • Bitcoin. Bitcoin block headers are what OTS ultimately anchors to. The dependency surface here is "Bitcoin continues to exist and the public calendar network continues to operate". Both are outside our control, and both are what give the bundle its durability.

See also