PR #21823 · Batched IPA verification for Chonk IVC proofs · C++ pipeline + TypeScript integration
Each Chonk IVC proof requires an IPA check: an expensive multi-scalar multiplication (MSM) over the Grumpkin SRS. Verifying 30 proofs sequentially = 30 MSMs.
IPA::batch_reduce_verify checks N proofs in 1 MSM via random linear combination. Bad proofs isolated via bisection. With real e2e proofs at default 6 cores: 96.5 proofs/sec (18 proofs in 187ms). Single proof: 127ms.
Simulates the C++ coordinator loop with work-stealing reduce. Approximate model calibrated from real benchmarks — see tables below for exact measured numbers.
Real e2e app proofs (ecdsar1+transfer_0_recursions+sponsored_fpc, 83KB, 2611 fields). Measured end-to-end through BatchChonkVerifier TS class including FIFO/subprocess overhead. Phase breakdown from C++ reduce_ms / ipa_ms in VerifyResult. 3-trial medians.
| cores | bs | rounds | wall | reduce | IPA | overhead | tp |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 185 | 21 | 160 | 4 | 5.4/s |
| 2 | 2 | 1 | 229 | 21 | 91 | 117 | 8.7/s |
| 2 | 6 | 3 | 285 | 21 | 95 | 128 | 21.1/s |
| 4 | 4 | 1 | 153 | 21 | 53 | 80 | 26.1/s |
| 4 | 12 | 3 | 217 | 21 | 51 | 104 | 55.3/s |
| 4 | 16 | 4 | 236 | 21 | 51 | 101 | 67.9/s |
| 6 | 6 | 1 | 127 | 21 | 37 | 69 | 47.3/s |
| 6 | 12 | 2 | 154 | 21 | 36 | 76 | 77.8/s |
| 6 | 18 | 3 | 187 | 21 | 37 | 88 | 96.5/s |
| 6 | 24 | 4 | 270 | 21 | 36 | 150 | 89.0/s |
| 6 | 30 | 5 | 308 | 21 | 37 | 168 | 97.3/s |
| 8 | 8 | 1 | 114 | 21 | 30 | 63 | 69.9/s |
| 8 | 16 | 2 | 146 | 21 | 29 | 76 | 109.3/s |
| 8 | 24 | 3 | 260 | 21 | 35 | 163 | 92.3/s |
All times in ms. Reduce = 21ms/proof (constant). IPA = 37ms at 6 cores (constant across batch sizes — the MSM is over the same SRS regardless). Overhead = FIFO IPC + coordinator loop + msgpack. Default config (6 cores, bs=18) highlighted.
IPA MSM is constant per batch (~37ms at 6 cores) regardless of batch size — the pippenger MSM operates over the same 32768-element SRS. So going from bs=6 to bs=18 costs only 2 extra reduce rounds (2×21ms = 42ms) for 3× the proofs. The overhead grows with batch size but throughput still improves.
At default 6 cores, bs=18: 187ms wall, 96.5 proofs/sec. Single proof: 127ms.
IPA MSM cost grows only sub-linearly with batch size (pippenger is O(n/log n)): at 6 cores, bs=1→40ms, bs=16→57ms. Meanwhile reduce parallelizes perfectly: 16 proofs on 6 cores = ceil(16/6)×15ms = 45ms. This is why batching wins: you pay ~42% more IPA for 16x the proofs.
When bad proofs fail during IPA (not reduce), bisection re-runs batch_check on subsets:
| Scenario | Extra MSMs | Est. cost |
|---|---|---|
| 1 bad in 8 | ~3 | +40-60ms |
| 2 bad in 8 | ~5 | +70-90ms |
| 1 bad in 16 | ~4 | +50-70ms |
Each level: cost = IPA_BASE + IPA_PER_PROOF × subset. Reduce phase is never repeated.
Phase 1 — Parallel Reduce
N workers steal proofs via atomic index. Each runs reduce_to_ipa_claim (sumcheck, databus, Goblin). Produces IPA claim + transcript. Failures emitted immediately.
Phase 2 — Batch IPA
Single IPA::batch_reduce_verify over all passed claims. One MSM for the whole batch.
Phase 3 — Emit or Bisect
If batch passes: emit OK for all. If it fails: binary search to isolate bad proofs. Reduce phase is never repeated — only the cheap IPA check runs on subsets.
1. batch_reduce_verify uses a random linear combination. False positive probability ~2-254 (Schwartz-Zippel over Grumpkin scalar field).
2. If left half passes → all failures in right. Skip right's batch_check, recurse directly. Sound because left passing means each left claim is individually valid (overwhelmingly).
3. Every proof lands in exactly one leaf of the bisection tree. Emitted as OK (at any level) or FAILED (base case, single proof). No drops, no duplicates.
| Scenario | MSMs |
|---|---|
| All valid | 1 |
| 1 bad in N | O(log N) |
| k bad in N | O(k log N) |
| All bad | O(N log N) |
bisect(indices, depth): if |indices| == 1: emit FAILED; return left, right = split(indices) left_ok = batch_check(left) if left_ok: emit_ok(left) // left clean bisect(right, depth+1) // skip right check! else: bisect(left, depth+1) if batch_check(right): emit_ok(right) else: bisect(right, depth+1)
Key: when left passes, right is not re-checked. Saves 1 MSM per level where left succeeds.
AztecNodeService peer: BatchChonkVerifier bb subprocess ↔ FIFO ↔ FifoFrameReader rpc: QueuedIVCVerifier wraps BBCircuitVerifier test: TestCircuitVerifier (both)
Wire format: [4-byte BE length][msgpack] over named FIFO pipe. Results stream asynchronously — promises resolved via pendingRequests map keyed by request_id.
Lifecycle: new() loads protocol VKs. newForTesting() takes custom VKs. stop(): drain queue → flush bb → close FIFO → reject pending → destroy.
| Env Var | Default | |
|---|---|---|
BB_CHONK_VERIFY_MAX_BATCH | 16 | Peer batch cap* |
BB_CHONK_VERIFY_BATCH_CONCURRENCY | 6 | Peer workers |
BB_NUM_IVC_VERIFIERS | 8 | RPC concurrency |
BB_IVC_CONCURRENCY | 1 | RPC threads |
* Proofs verify immediately as they arrive. Max batch only caps accumulation while a batch is already processing.
| Thread | Role |
|---|---|
| coordinator | Collect, dispatch, batch_check, bisect, emit |
| worker[N] | reduce_to_ipa_claim (work-stealing) |
| writer | msgpack → write_frame → FIFO |
Errors: invalid VK → immediate FAILED. Reduce throws → caught, FAILED. Batch fails → bisect. FIFO ends → all pending rejected. Exit → sync unlink.
The current batch verifier batches only the final IPA check. The lw/opt-unified-batch branch explores three deeper optimizations that batch work within the reduce phase itself. These are not yet merged but show the path to further throughput gains.
During ECCVM verification, each proof eagerly computes a ~148-point Grumpkin MSM (IPA::reduce_batch_opening_claim). Instead, fold the Shplemini commitments into a deferred claim via expand_claim_with_batch(). All deferred claims are finalized in one batched MSM at the end.
Saves: N × 148-point MSMs → 1 × (N×148)-point MSM
Each proof requires 3 pairing checks on BN254 (MegaZK PCS, Merge protocol, Translator). Currently verified eagerly per-proof. Instead, defer all 3×N pairing checks and aggregate into a single batched pairing via random linear combination.
Saves: 3N pairings → 1 batched pairing
Standard IPA verify builds a 32768-element s_vec per claim then MSMs it. Fuse the butterfly expansion with weighted accumulation: pre-scale initial seed by αi·a₀ᵢ, accumulate in-place. Eliminates per-claim s_vec storage entirely.
Measured: transcript processing 77ms → 38ms (-50%), batch_reduce_verify 174ms → 121ms (-31%) for 120 claims at 4 cores. Peak memory -120MB.
The lw/opt-unified-batch branch combines all three optimizations into a unified pipeline where batch_check performs pairing aggregation + batched IPA in one step, and bisection handles any failure (pairing or IPA) using cached reduction results. Also batches the ~30 L/R transcript MSMs across all claims into a single (N×30+N)-point MSM via read_transcript_data_no_msm.
Status: prototyped and benchmarked on the branch. Reverted Shplemini MSM deferral (not clean enough yet). Butterfly fusion is the most immediately shippable piece.