How much simulation budget does CEC need to pick the right 5 monitor sites?
Inv 27 found 5 high-value monitor sites using GP-UCB with full L4 scoring on every candidate. But a full L4 evaluation requires a year of co-located sensor data (~$1M per candidate). For 15 candidates, that's ~$15M in evaluation overhead just to pick where the real $2.5M deployment should go.
This investigation asks: can a multi-fidelity search recover the same decision at a fraction of the evaluation cost? BOCA (Kandasamy et al. 2017) is designed exactly for this — at each step the algorithm picks a (site, fidelity) pair to maximize information per unit cost.
The implementation here is cost-weighted successive halving with UCB tie-breaking, inspired by that framing rather than a literal BOCA port: a candidate must accumulate lower-fidelity evidence before a higher fidelity unlocks, and the acquisition score is (info_gain + UCB weight) / sqrt(cost). There is no joint GP over (site, fidelity), no explicit bias-budget term, and no kernel ridge between fidelities; the cost-aware acquisition and the successive-halving gate carry the practical behavior, and the full BOCA machinery is the natural next step.
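A minimal sketch of that rule, not the production code: the fidelity labels match the text, but `FIDELITY_COST`, `UNLOCK_AFTER`, and the `info_gain`/`ucb_weight` callables are illustrative stand-ins.

```python
import math
from dataclasses import dataclass, field

FIDELITY_ORDER = ["L1", "L2", "L3", "L4"]
# Nominal per-evaluation costs (illustrative units, not CEC dollars).
FIDELITY_COST = {"L1": 1.0, "L2": 5.0, "L3": 20.0, "L4": 100.0}
# Hand-set gate: evals required at the fidelity below before the next one unlocks.
UNLOCK_AFTER = {"L2": 2, "L3": 2, "L4": 3}

@dataclass
class Candidate:
    site: str
    # Observed scores per fidelity, e.g. {"L1": [0.74], "L2": [], ...}
    evals: dict = field(default_factory=lambda: {f: [] for f in FIDELITY_ORDER})

    def highest_unlocked(self) -> str:
        """Successive-halving gate: fidelity k+1 opens only after enough evidence at k."""
        unlocked = "L1"
        for lower, upper in zip(FIDELITY_ORDER, FIDELITY_ORDER[1:]):
            if len(self.evals[lower]) >= UNLOCK_AFTER[upper]:
                unlocked = upper
            else:
                break
        return unlocked

def acquisition(info_gain: float, ucb_weight: float, fidelity: str) -> float:
    """Cost-aware score: (info_gain + UCB weight) / sqrt(cost of that fidelity)."""
    return (info_gain + ucb_weight) / math.sqrt(FIDELITY_COST[fidelity])

def next_evaluation(candidates, info_gain, ucb_weight):
    """Pick the (site, fidelity) pair with the best cost-weighted score.

    info_gain and ucb_weight are callables taking (candidate, fidelity).
    """
    return max(
        ((c, c.highest_unlocked()) for c in candidates),
        key=lambda pair: acquisition(info_gain(pair[0], pair[1]),
                                     ucb_weight(pair[0], pair[1]),
                                     pair[1]),
    )
```

Each step scores every candidate at its highest unlocked fidelity and runs the argmax, so a site has to earn its way up the ladder before the expensive levels are ever spent on it.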
Four proxies of EVSI — one decision
Which fidelities got used?
The rule burned almost all of its budget at L1 (cheap gap-score screening), then escalated to L3 (climate-signal UCB) only for top contenders. It never needed L2 or L4 — the L1 prior was informative enough to prune the obvious non-contenders, and L3 was sufficient to rank the top contested candidates.
In problems where L1 has larger bias against the top fidelity, the gating rule would force more L2/L4 evaluations. The cost split is adaptive in that sense, but it is governed by the hand-set unlock thresholds — not by BOCA's optimal bias-budget calculation.
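The cost split quoted above is just a tally over the evaluation log. A sketch of that bookkeeping, assuming a log of (site, fidelity) pairs and the nominal cost table from the sketch above:

```python
from collections import defaultdict

def cost_split(eval_log, fidelity_cost):
    """Fraction of nominal budget spent per fidelity, from (site, fidelity) records."""
    spend = defaultdict(float)
    for _site, fidelity in eval_log:
        spend[fidelity] += fidelity_cost[fidelity]
    total = sum(spend.values()) or 1.0
    return {f: spend[f] / total for f in spend}
```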
3× cheaper, one site missed
What the "correct" answer looks like
| Rank | Site | Gap score (L1) | True EVSI (L4) | BOCA-inspired picked? | Full-sim oracle picked? |
|---|---|---|---|---|---|
| 1 | sjv_merced | 0.826 | 0.6573 | ✓ | ✓ |
| 2 | la_basin_E | 1.000 | 0.6494 | ✓ | ✓ |
| 3 | sierra_plumas | 0.581 | 0.5356 | — | ✓ |
| 4 | sjv_stanislaus | 0.737 | 0.5084 | ✓ | ✓ |
| 5 | la_basin_S | 0.653 | 0.4612 | ✓ | ✓ |
Oracle = ranking using the full L4 scoring function applied to every candidate, with no noise. In production we don't have the oracle — that's the whole point of the search.
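Reading the table back into the headline number (site names copied from the table; the multi-fidelity run's fifth selection is not listed there, so only the overlap with the oracle top 5 is computed):

```python
# Oracle top-5 ranking (full L4 scoring, from the table above).
oracle_top5 = ["sjv_merced", "la_basin_E", "sierra_plumas", "sjv_stanislaus", "la_basin_S"]
# Oracle sites the BOCA-inspired run also selected; sierra_plumas was the miss.
picked = {"sjv_merced", "la_basin_E", "sjv_stanislaus", "la_basin_S"}

recovery = sum(site in picked for site in oracle_top5) / len(oracle_top5)
print(f"oracle recovery: {recovery:.0%}")  # 80%, i.e. the "4 of 5" figure
```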
How much pilot budget to reserve
Recommendation: Use the BOCA-inspired rule for monitor-network design. It consumes 3.3× less simulation budget than single-fidelity GP-UCB and recovers 4 of 5 oracle-optimal sites. Replace the 1-year pilot budget (~$1M per evaluation via co-located deployment) with ~$0.1M of multi-fidelity proxy runs plus ~$0.2M of targeted full-sim on the final 2-3 candidates.
For the CEC monitor-network RFP, run a two-stage pilot: screen all 15 candidates using cheap proxies ($0.15M total), then run targeted 1-year co-located deployments on the top 3–4 contenders ($3–4M). Total evaluation cost: ~$4M instead of $15M; quality cost is 4/5 oracle recovery — one bias-driven miss (sierra_plumas).
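The arithmetic behind those totals, as a quick sanity check; all figures are in $M, taken from the recommendation above, with the contender count at the upper end (4):

```python
n_candidates = 15
full_eval_cost = 1.0                              # 1-year co-located deployment per candidate

single_fidelity = n_candidates * full_eval_cost   # $15M: full L4 on every candidate
proxy_screen = 0.15                               # cheap proxies on all 15 candidates
targeted_deployments = 4 * full_eval_cost         # 1-year deployments on the top 3-4
two_stage = proxy_screen + targeted_deployments   # ~$4M instead of $15M

print(f"single-fidelity: ${single_fidelity:.2f}M, two-stage pilot: ${two_stage:.2f}M")
```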
What this rule still misses
- Honest label: implementation is cost-weighted successive halving with UCB tie-breaking, NOT canonical BOCA. No joint GP over (site, fidelity), no BOCA bias-budget term, no kernel ridge regression between fidelities. Kandasamy et al. 2017 is the inspiration, not the algorithm.
- Fidelity costs are nominal units; real-world ratio depends on CEC monitor procurement timelines.
- Fidelity-gating thresholds (a candidate needs N_k evaluations at fidelity k before fidelity k+1 unlocks) are hand-tuned, not derived from a bias-budget optimization.
- Assumes the noise model is known per fidelity; a GP-UCB extension with heteroscedastic noise is the natural next step.