Batch Chonk Verifier Technical Review

PR #21823 · Batched IPA verification for Chonk IVC proofs · C++ pipeline + TypeScript integration

The Problem

Each Chonk IVC proof requires an IPA check: an expensive multi-scalar multiplication (MSM) over the Grumpkin SRS. Verifying 30 proofs sequentially = 30 MSMs.

The Solution

IPA::batch_reduce_verify checks N proofs in 1 MSM via random linear combination. Bad proofs isolated via bisection. With real e2e proofs at default 6 cores: 96.5 proofs/sec (18 proofs in 187ms). Single proof: 127ms.

Interactive Performance Explorer

Simulates the C++ coordinator loop with work-stealing reduce. Approximate model calibrated from real benchmarks — see tables below for exact measured numbers.


Worker reducing proof IPA batch MSM Results emitted | dashed = first proof latency

Bisection

Benchmarks (Example E2E App Proofs)

Real e2e app proofs (ecdsar1+transfer_0_recursions+sponsored_fpc, 83KB, 2611 fields). Measured end-to-end through BatchChonkVerifier TS class including FIFO/subprocess overhead. Phase breakdown from C++ reduce_ms / ipa_ms in VerifyResult. 3-trial medians.

Phase Breakdown — real proofs, multiples-of-cores

coresbsroundswallreduceIPAoverheadtp
1111852116045.4/s
22122921911178.7/s
263285219512821.1/s
44115321538026.1/s
4123217215110455.3/s
4164236215110167.9/s
66112721376947.3/s
612215421367677.8/s
618318721378896.5/s
6244270213615089.0/s
6305308213716897.3/s
88111421306369.9/s
8162146212976109.3/s
8243260213516392.3/s

All times in ms. Reduce = 21ms/proof (constant). IPA = 37ms at 6 cores (constant across batch sizes — the MSM is over the same SRS regardless). Overhead = FIFO IPC + coordinator loop + msgpack. Default config (6 cores, bs=18) highlighted.

Key Insight — Why Batching Wins

IPA MSM is constant per batch (~37ms at 6 cores) regardless of batch size — the pippenger MSM operates over the same 32768-element SRS. So going from bs=6 to bs=18 costs only 2 extra reduce rounds (2×21ms = 42ms) for 3× the proofs. The overhead grows with batch size but throughput still improves.

At default 6 cores, bs=18: 187ms wall, 96.5 proofs/sec. Single proof: 127ms.

Key Insight

IPA MSM cost grows only sub-linearly with batch size (pippenger is O(n/log n)): at 6 cores, bs=1→40ms, bs=16→57ms. Meanwhile reduce parallelizes perfectly: 16 proofs on 6 cores = ceil(16/6)×15ms = 45ms. This is why batching wins: you pay ~42% more IPA for 16x the proofs.

Bisection Cost Model

When bad proofs fail during IPA (not reduce), bisection re-runs batch_check on subsets:

ScenarioExtra MSMsEst. cost
1 bad in 8~3+40-60ms
2 bad in 8~5+70-90ms
1 bad in 16~4+50-70ms

Each level: cost = IPA_BASE + IPA_PER_PROOF × subset. Reduce phase is never repeated.


Three-Phase Pipeline

Phase 1 — Parallel Reduce
N workers steal proofs via atomic index. Each runs reduce_to_ipa_claim (sumcheck, databus, Goblin). Produces IPA claim + transcript. Failures emitted immediately.

Phase 2 — Batch IPA
Single IPA::batch_reduce_verify over all passed claims. One MSM for the whole batch.

Phase 3 — Emit or Bisect
If batch passes: emit OK for all. If it fails: binary search to isolate bad proofs. Reduce phase is never repeated — only the cheap IPA check runs on subsets.

Soundness

1. batch_reduce_verify uses a random linear combination. False positive probability ~2-254 (Schwartz-Zippel over Grumpkin scalar field).

2. If left half passes → all failures in right. Skip right's batch_check, recurse directly. Sound because left passing means each left claim is individually valid (overwhelmingly).

3. Every proof lands in exactly one leaf of the bisection tree. Emitted as OK (at any level) or FAILED (base case, single proof). No drops, no duplicates.

ScenarioMSMs
All valid1
1 bad in NO(log N)
k bad in NO(k log N)
All badO(N log N)

Bisection (C++)

bisect(indices, depth):
  if |indices| == 1: emit FAILED; return

  left, right = split(indices)
  left_ok = batch_check(left)

  if left_ok:
    emit_ok(left)          // left clean
    bisect(right, depth+1) // skip right check!
  else:
    bisect(left, depth+1)
    if batch_check(right): emit_ok(right)
    else: bisect(right, depth+1)

Key: when left passes, right is not re-checked. Saves 1 MSM per level where left succeeds.

Node Integration (TS)

AztecNodeService
  peer: BatchChonkVerifier
    bb subprocess ↔ FIFO ↔ FifoFrameReader
  rpc:  QueuedIVCVerifier
    wraps BBCircuitVerifier
  test: TestCircuitVerifier (both)

Wire format: [4-byte BE length][msgpack] over named FIFO pipe. Results stream asynchronously — promises resolved via pendingRequests map keyed by request_id.

Lifecycle: new() loads protocol VKs. newForTesting() takes custom VKs. stop(): drain queue → flush bb → close FIFO → reject pending → destroy.


Configuration

Env VarDefault
BB_CHONK_VERIFY_MAX_BATCH16Peer batch cap*
BB_CHONK_VERIFY_BATCH_CONCURRENCY6Peer workers
BB_NUM_IVC_VERIFIERS8RPC concurrency
BB_IVC_CONCURRENCY1RPC threads

* Proofs verify immediately as they arrive. Max batch only caps accumulation while a batch is already processing.

Thread Model & Errors

ThreadRole
coordinatorCollect, dispatch, batch_check, bisect, emit
worker[N]reduce_to_ipa_claim (work-stealing)
writermsgpack → write_frame → FIFO

Errors: invalid VK → immediate FAILED. Reduce throws → caught, FAILED. Batch fails → bisect. FIFO ends → all pending rejected. Exit → sync unlink.


Appendix: Further Optimizations (Explored)

The current batch verifier batches only the final IPA check. The lw/opt-unified-batch branch explores three deeper optimizations that batch work within the reduce phase itself. These are not yet merged but show the path to further throughput gains.

1. Deferred Shplemini MSM

During ECCVM verification, each proof eagerly computes a ~148-point Grumpkin MSM (IPA::reduce_batch_opening_claim). Instead, fold the Shplemini commitments into a deferred claim via expand_claim_with_batch(). All deferred claims are finalized in one batched MSM at the end.

Saves: N × 148-point MSMs → 1 × (N×148)-point MSM

2. Batched BN254 Pairings

Each proof requires 3 pairing checks on BN254 (MegaZK PCS, Merge protocol, Translator). Currently verified eagerly per-proof. Instead, defer all 3×N pairing checks and aggregate into a single batched pairing via random linear combination.

Saves: 3N pairings → 1 batched pairing

3. Fused IPA Butterfly

Standard IPA verify builds a 32768-element s_vec per claim then MSMs it. Fuse the butterfly expansion with weighted accumulation: pre-scale initial seed by αi·a₀ᵢ, accumulate in-place. Eliminates per-claim s_vec storage entirely.

Measured: transcript processing 77ms → 38ms (-50%), batch_reduce_verify 174ms → 121ms (-31%) for 120 claims at 4 cores. Peak memory -120MB.

Unified Batch Verifier (Not Yet Merged)

The lw/opt-unified-batch branch combines all three optimizations into a unified pipeline where batch_check performs pairing aggregation + batched IPA in one step, and bisection handles any failure (pairing or IPA) using cached reduction results. Also batches the ~30 L/R transcript MSMs across all claims into a single (N×30+N)-point MSM via read_transcript_data_no_msm.

Status: prototyped and benchmarked on the branch. Reverted Shplemini MSM deferral (not clean enough yet). Butterfly fusion is the most immediately shippable piece.