Batch Chonk Verifier Technical Review

PR #21823 · Batched IPA verification for Chonk IVC proofs · C++ pipeline + TypeScript integration

The Problem

Each Chonk IVC proof requires an IPA check: an expensive multi-scalar multiplication (MSM) over the Grumpkin SRS. Verifying 30 proofs sequentially = 30 MSMs.

The Solution

IPA::batch_reduce_verify checks N proofs in 1 MSM via random linear combination. Bad proofs isolated via bisection. With real e2e proofs at default 6 cores: 96.5 proofs/sec (18 proofs in 187ms). Single proof: 127ms.

Interactive Performance Explorer

Simulates the C++ coordinator loop with work-stealing reduce. Approximate model calibrated from real benchmarks — see tables below for exact measured numbers.

Worker Cores

Batch Size (multiples of cores)

Queue Depth

■ Worker reducing proof ■ IPA batch MSM ■ Results emitted | dashed = first proof latency

Bisection

Proofs 8 Bad 1

Benchmarks (Example E2E App Proofs)

Real e2e app proofs (ecdsar1+transfer_0_recursions+sponsored_fpc, 83KB, 2611 fields). Measured end-to-end through BatchChonkVerifier TS class including FIFO/subprocess overhead. Phase breakdown from C++ reduce_ms / ipa_ms in VerifyResult. 3-trial medians.

Phase Breakdown — real proofs, multiples-of-cores

cores	bs	rounds	wall	reduce	IPA	overhead	tp
1	1	1	185	21	160	4	5.4/s
2	2	1	229	21	91	117	8.7/s
2	6	3	285	21	95	128	21.1/s
4	4	1	153	21	53	80	26.1/s
4	12	3	217	21	51	104	55.3/s
4	16	4	236	21	51	101	67.9/s
6	6	1	127	21	37	69	47.3/s
6	12	2	154	21	36	76	77.8/s
6	18	3	187	21	37	88	96.5/s
6	24	4	270	21	36	150	89.0/s
6	30	5	308	21	37	168	97.3/s
8	8	1	114	21	30	63	69.9/s
8	16	2	146	21	29	76	109.3/s
8	24	3	260	21	35	163	92.3/s

All times in ms. Reduce = 21ms/proof (constant). IPA = 37ms at 6 cores (constant across batch sizes — the MSM is over the same SRS regardless). Overhead = FIFO IPC + coordinator loop + msgpack. Default config (6 cores, bs=18) highlighted.

Key Insight — Why Batching Wins

IPA MSM is constant per batch (~37ms at 6 cores) regardless of batch size — the pippenger MSM operates over the same 32768-element SRS. So going from bs=6 to bs=18 costs only 2 extra reduce rounds (2×21ms = 42ms) for 3× the proofs. The overhead grows with batch size but throughput still improves.

At default 6 cores, bs=18: 187ms wall, 96.5 proofs/sec. Single proof: 127ms.

Key Insight

IPA MSM cost grows only sub-linearly with batch size (pippenger is O(n/log n)): at 6 cores, bs=1→40ms, bs=16→57ms. Meanwhile reduce parallelizes perfectly: 16 proofs on 6 cores = ceil(16/6)×15ms = 45ms. This is why batching wins: you pay ~42% more IPA for 16x the proofs.

Bisection Cost Model

When bad proofs fail during IPA (not reduce), bisection re-runs batch_check on subsets:

Scenario	Extra MSMs	Est. cost
1 bad in 8	~3	+40-60ms
2 bad in 8	~5	+70-90ms
1 bad in 16	~4	+50-70ms

Each level: cost = IPA_BASE + IPA_PER_PROOF × subset. Reduce phase is never repeated.

Three-Phase Pipeline

Phase 1 — Parallel Reduce
N workers steal proofs via atomic index. Each runs reduce_to_ipa_claim (sumcheck, databus, Goblin). Produces IPA claim + transcript. Failures emitted immediately.

Phase 2 — Batch IPA
Single IPA::batch_reduce_verify over all passed claims. One MSM for the whole batch.

Phase 3 — Emit or Bisect
If batch passes: emit OK for all. If it fails: binary search to isolate bad proofs. Reduce phase is never repeated — only the cheap IPA check runs on subsets.

Soundness

1. batch_reduce_verify uses a random linear combination. False positive probability ~2^-254 (Schwartz-Zippel over Grumpkin scalar field).

2. If left half passes → all failures in right. Skip right's batch_check, recurse directly. Sound because left passing means each left claim is individually valid (overwhelmingly).

3. Every proof lands in exactly one leaf of the bisection tree. Emitted as OK (at any level) or FAILED (base case, single proof). No drops, no duplicates.

Scenario	MSMs
All valid	1
1 bad in N	O(log N)
k bad in N	O(k log N)
All bad	O(N log N)

Bisection (C++)

bisect(indices, depth):
  if |indices| == 1: emit FAILED; return

  left, right = split(indices)
  left_ok = batch_check(left)

  if left_ok:
    emit_ok(left)          // left clean
    bisect(right, depth+1) // skip right check!
  else:
    bisect(left, depth+1)
    if batch_check(right): emit_ok(right)
    else: bisect(right, depth+1)

Key: when left passes, right is not re-checked. Saves 1 MSM per level where left succeeds.

Node Integration (TS)

AztecNodeService
  peer: BatchChonkVerifier
    bb subprocess ↔ FIFO ↔ FifoFrameReader
  rpc:  QueuedIVCVerifier
    wraps BBCircuitVerifier
  test: TestCircuitVerifier (both)

Wire format: [4-byte BE length][msgpack] over named FIFO pipe. Results stream asynchronously — promises resolved via pendingRequests map keyed by request_id.

Lifecycle: new() loads protocol VKs. newForTesting() takes custom VKs. stop(): drain queue → flush bb → close FIFO → reject pending → destroy.

Configuration

Env Var	Default
`BB_CHONK_VERIFY_MAX_BATCH`	16	Peer batch cap*
`BB_CHONK_VERIFY_BATCH_CONCURRENCY`	6	Peer workers
`BB_NUM_IVC_VERIFIERS`	8	RPC concurrency
`BB_IVC_CONCURRENCY`	1	RPC threads

* Proofs verify immediately as they arrive. Max batch only caps accumulation while a batch is already processing.

Thread Model & Errors

Thread	Role
coordinator	Collect, dispatch, batch_check, bisect, emit
worker[N]	reduce_to_ipa_claim (work-stealing)
writer	msgpack → write_frame → FIFO

Errors: invalid VK → immediate FAILED. Reduce throws → caught, FAILED. Batch fails → bisect. FIFO ends → all pending rejected. Exit → sync unlink.

Appendix: Further Optimizations (Explored)

The current batch verifier batches only the final IPA check. The lw/opt-unified-batch branch explores three deeper optimizations that batch work within the reduce phase itself. These are not yet merged but show the path to further throughput gains.

1. Deferred Shplemini MSM

During ECCVM verification, each proof eagerly computes a ~148-point Grumpkin MSM (IPA::reduce_batch_opening_claim). Instead, fold the Shplemini commitments into a deferred claim via expand_claim_with_batch(). All deferred claims are finalized in one batched MSM at the end.

Saves: N × 148-point MSMs → 1 × (N×148)-point MSM

2. Batched BN254 Pairings

Each proof requires 3 pairing checks on BN254 (MegaZK PCS, Merge protocol, Translator). Currently verified eagerly per-proof. Instead, defer all 3×N pairing checks and aggregate into a single batched pairing via random linear combination.

Saves: 3N pairings → 1 batched pairing

3. Fused IPA Butterfly

Standard IPA verify builds a 32768-element s_vec per claim then MSMs it. Fuse the butterfly expansion with weighted accumulation: pre-scale initial seed by αⁱ·a₀ᵢ, accumulate in-place. Eliminates per-claim s_vec storage entirely.

Measured: transcript processing 77ms → 38ms (-50%), batch_reduce_verify 174ms → 121ms (-31%) for 120 claims at 4 cores. Peak memory -120MB.

Unified Batch Verifier (Not Yet Merged)

The lw/opt-unified-batch branch combines all three optimizations into a unified pipeline where batch_check performs pairing aggregation + batched IPA in one step, and bisection handles any failure (pairing or IPA) using cached reduction results. Also batches the ~30 L/R transcript MSMs across all claims into a single (N×30+N)-point MSM via read_transcript_data_no_msm.

Status: prototyped and benchmarked on the branch. Reverted Shplemini MSM deferral (not clean enough yet). Butterfly fusion is the most immediately shippable piece.