Commit-and-Forget Leaves Lives on the Table
Phase 1 scored each portfolio (T1, T2, B1, B2, W1, W2, Solar, combinations) as a single commitment: pick once, deploy for 10 years, count deaths. That framing assumes the concentration-response function (CRF) is known and fixed. In practice, over the first 3–5 years the program will observe actual mortality, refine the CRF posterior, and be able to reallocate the remaining budget.
In the Phase 2 Inv 21 hierarchical Bayesian analysis, the posterior CRF credible interval brackets both Di (HR 1.073) and Krewski (HR 1.056), so initial deployments chosen under one CRF can be reallocated under the other as data arrive. Inv 22 makes that information flow explicit.
The question: how much additional value does adaptive sequential allocation deliver over the Phase 1 commit-at-year-0 approach, and how sophisticated does the policy need to be to capture that value?
From One-Shot to Bayes-Optimal
The comparison uses five policy fidelity levels. Each one drives a 10-year, $4B simulated deployment with a mixture-prior true CRF (35% Di / 35% Krewski / 30% Inv 21 posterior), noisy lagged mortality observations, and conjugate Bayesian updating.
Each policy is run for 200 Monte Carlo trajectories with a 3% discount rate. Observations are lagged 2 years, with noise σ = 35% of the signal + 10 deaths. The belief state is a Gaussian over log-CRF β with a conjugate update.
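The conjugate update on the Gaussian belief can be sketched as follows. This is a minimal illustration, not the simulator's code: it assumes β = ln(HR), assumes each mortality observation has already been converted into a direct estimate of β with a known variance, and uses a hypothetical prior and observation. The names `Belief` and `conjugate_update` are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Gaussian belief over the log-CRF coefficient beta."""
    mean: float
    var: float


def conjugate_update(belief: Belief, obs: float, obs_var: float) -> Belief:
    """Normal-normal update: precisions add, means are precision-weighted."""
    prior_prec = 1.0 / belief.var
    obs_prec = 1.0 / obs_var
    post_var = 1.0 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * belief.mean + obs_prec * obs)
    return Belief(mean=post_mean, var=post_var)


# Anchors from the text, taking beta = ln(HR) (an assumption of this sketch).
BETA_KREWSKI = math.log(1.056)
BETA_DI = math.log(1.073)

# Hypothetical prior: centred between the anchors, sd equal to their gap.
prior = Belief(
    mean=0.5 * (BETA_KREWSKI + BETA_DI),
    var=(BETA_DI - BETA_KREWSKI) ** 2,
)

# Hypothetical observation: a Di-like estimate of beta with the same variance.
posterior = conjugate_update(prior, obs=BETA_DI, obs_var=prior.var)
print(posterior)
```

With equal prior and observation variances, the posterior mean lands halfway between the prior mean and the observation, and the variance halves. Repeating the update each year is what lets the belief tighten around one regime.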
The Gap is 19%
| Policy | Mean Deaths Avoided | 95% CI | Deaths per $B | vs L1 One-shot |
|---|---|---|---|---|
| L1 One-shot (Phase 1) | 651 | [531, 2133] | 163 | +0.0% |
| L2 Two-stage | 702 | [572, 2302] | 175 | +7.8% |
| L3 Rolling horizon | 743 | [621, 2254] | 186 | +14.1% |
| L4 POMDP belief-state | 735 | [587, 2560] | 184 | +12.9% |
| L5 BO-optimal | 777 | [621, 2677] | 194 | +19.2% |
Mean and 95% CI across 200 MC trajectories; each trajectory samples a different true CRF. Deaths per $B are computed from mean lifetime discounted spend ($4B).
Learning Beats Committing
L1 locks in a single allocation that splits the difference between Di and Krewski regimes. If the true CRF ends up Krewski-like, it under-allocates to transport (T2); if Di-like, it over-allocates. Either way, the fixed schedule is sub-optimal in hindsight.
L2 (two-stage) captures the simplest version of the gain: commit 50% early, wait 5 years for mortality to accumulate, and redirect the remaining budget. That alone adds 7.8% to deaths avoided. The L3 rolling-horizon policy refits annually, which adds another 6.3 pp (14.1% vs. 7.8%) because the early-year posterior collapses onto the true regime faster than 5-year batching allows.
L4 makes the belief state explicit (three discrete regime hypotheses with soft Bayesian weighting) and value-iterates the best response per belief. That nearly matches L3 (+12.9% vs. +14.1%). L5 then wraps the policy in a 2-parameter family (aggressiveness, risk-aversion) and runs multi-fidelity Bayesian optimization to find the corner of the Pareto frontier, which gives the highest mean deaths avoided of the five levels (+19.2%).
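The soft Bayesian weighting over discrete regimes in L4 can be illustrated with a short sketch. It assumes the three hypotheses are the three components of the mixture prior (Di, Krewski, Inv 21 posterior), which the text does not state outright, and it uses a Gaussian likelihood with hypothetical predicted deaths and noise. The function names are invented for the example, and the value-iteration step is not shown.

```python
import math
from typing import Dict


def gaussian_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def update_regime_belief(
    prior: Dict[str, float],
    predicted: Dict[str, float],
    observed: float,
    sd: float,
) -> Dict[str, float]:
    """Bayes rule over discrete regimes: weight each prior by its likelihood."""
    unnorm = {r: prior[r] * gaussian_pdf(observed, predicted[r], sd) for r in prior}
    total = sum(unnorm.values())
    if total == 0.0:
        # Every likelihood underflowed; keep the prior rather than divide by zero.
        return dict(prior)
    return {r: w / total for r, w in unnorm.items()}


# Prior weights follow the mixture stated in the text.
prior = {"di": 0.35, "krewski": 0.35, "inv21": 0.30}

# Hypothetical deaths avoided that each regime would predict for this year.
predicted = {"di": 80.0, "krewski": 60.0, "inv21": 70.0}

# Hypothetical observation and noise level.
posterior = update_regime_belief(prior, predicted, observed=78.0, sd=15.0)
print(posterior)
```

The weights stay soft: an observation near the Di prediction raises the Di weight without zeroing out the other regimes, so a later observation can still pull the belief back.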
The Policy Space
To compare one-shot commitments against adaptive schedules on equal footing, each fidelity level is expressed as a decision rule mapping the current evidence state to next-year spend — that is the object the sequential ladder is grading.
Each policy is a function π(state) → budget allocation. For L1–L4, π is hand-written; for L5, π is parameterized by (aggressiveness, risk-aversion) and the two parameters are optimized via Kennedy–O’Hagan MFGP over 30 evaluations of a cheap surrogate (50 MC trajectories) and 5 evaluations of the true simulator (200 MC trajectories).
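To make the "policy as a function" idea concrete, here is a hedged sketch of what a two-parameter family could look like. The functional form is an assumption made up for illustration, not the L5 policy: it blends a Di-optimal and a Krewski-optimal allocation, with `aggressiveness` controlling how far the blend follows the current belief and `risk_aversion` damping that shift when the belief is uncertain. The option shares, the `State` fields, and the even spend schedule are all hypothetical, and the MFGP optimization over the two parameters is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class State:
    year: int                # 0-based program year
    remaining_budget: float  # $B still unspent
    p_di: float              # current belief that the true CRF is Di-like


@dataclass(frozen=True)
class PolicyParams:
    aggressiveness: float  # 0 = ignore the belief, 1 = follow it fully
    risk_aversion: float   # 0 = no damping, 1 = full damping when uncertain


Policy = Callable[[State], Dict[str, float]]


def make_policy(
    params: PolicyParams,
    shares_di: Dict[str, float],
    shares_krewski: Dict[str, float],
    horizon: int = 10,
) -> Policy:
    """Build pi(state) -> {option: $B to spend this year}."""

    def policy(state: State) -> Dict[str, float]:
        years_left = max(horizon - state.year, 1)
        spend = state.remaining_budget / years_left  # even spend (assumption)
        confidence = abs(2.0 * state.p_di - 1.0)      # 0 at p=0.5, 1 at certainty
        damping = 1.0 - params.risk_aversion * (1.0 - confidence)
        w_di = 0.5 + params.aggressiveness * damping * (state.p_di - 0.5)
        return {
            o: spend * (w_di * shares_di[o] + (1.0 - w_di) * shares_krewski[o])
            for o in shares_di
        }

    return policy


# Hypothetical regime-optimal budget shares for two of the Phase 1 options.
shares_di = {"T2": 0.3, "B1": 0.7}
shares_krewski = {"T2": 0.6, "B1": 0.4}

state = State(year=0, remaining_budget=4.0, p_di=0.9)
split_the_difference = make_policy(PolicyParams(0.0, 0.0), shares_di, shares_krewski)
adaptive = make_policy(PolicyParams(1.0, 0.5), shares_di, shares_krewski)
print(split_the_difference(state), adaptive(state))
```

In this sketch, setting `aggressiveness` to 0 recovers an allocation that splits the difference between the two regimes regardless of evidence, while higher values shift spend toward whichever regime the belief favours.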
Observations are noisy, lagged mortality signals: given the fraction of option o deployed and a true CRF β, annual deaths avoided are linearly interpolated between the Di and Krewski anchors published in Phase 1. Observations are lagged 2 years (epidemiological surveillance), with Gaussian noise that scales with the signal. Bayesian updates use a conjugate normal-normal model on log-CRF.
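The observation model described above can be sketched as follows. Several details are assumptions of this sketch rather than facts from the source: the interpolation is done linearly in β = ln(HR), the noise standard deviation is read as 35% of the signal plus 10 deaths, and the per-option anchor values are hypothetical placeholders for the Phase 1 figures.

```python
import math
import random
from typing import List, Optional

BETA_KREWSKI = math.log(1.056)  # HR anchors from the text, as beta = ln(HR)
BETA_DI = math.log(1.073)


def expected_deaths_avoided(
    fraction: float, beta: float, anchor_krewski: float, anchor_di: float
) -> float:
    """Annual deaths avoided for one option, interpolated between the anchors."""
    w = (beta - BETA_KREWSKI) / (BETA_DI - BETA_KREWSKI)
    full_deployment = anchor_krewski + w * (anchor_di - anchor_krewski)
    return fraction * full_deployment


def noise_sd(signal: float, rel: float = 0.35, floor: float = 10.0) -> float:
    """One reading of 'sigma = 35% + 10 deaths': relative part plus a floor."""
    return rel * abs(signal) + floor


def observe(signal: float, rng: random.Random) -> float:
    """Draw one noisy mortality observation around the true signal."""
    return rng.gauss(signal, noise_sd(signal))


def lagged(history: List[float], year: int, lag: int = 2) -> Optional[float]:
    """Observation visible in `year`: the one generated `lag` years earlier."""
    idx = year - lag
    return history[idx] if 0 <= idx < len(history) else None


# Hypothetical anchors for one option at full deployment (deaths avoided/year).
ANCHOR_KREWSKI, ANCHOR_DI = 40.0, 60.0

rng = random.Random(0)
true_beta = 0.5 * (BETA_KREWSKI + BETA_DI)
signals = [
    expected_deaths_avoided(0.1 * (t + 1), true_beta, ANCHOR_KREWSKI, ANCHOR_DI)
    for t in range(10)
]
history = [observe(s, rng) for s in signals]
visible_in_year_3 = lagged(history, year=3)  # the observation from year 1
```

Because of the 2-year lag, nothing is visible in years 0 and 1, which is one reason the early years of any adaptive policy look like the one-shot plan.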
Sources: Bellman 1957 (DP); Kaelbling et al. 1998 (POMDP); Kennedy & O'Hagan 2000 (MFGP); Tange 2018 (BO in public health). The Inv 21 hierarchical-Bayes posterior over CA-pooled CRF motivates the regime-belief formulation here.
Implication for the portfolio. This investigation says the Phase 1 deliverable should not be a single portfolio pick but a policy: a contingent plan that reacts to 3-year mortality observations. The 19% improvement in deaths avoided at a $4B 10-year budget (777 vs. 651 in the table, about 126 additional deaths avoided) is worth $1.5B against a $0 incremental cost, since the policy itself is just a decision rule.
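The text does not say how the $1.5B figure was derived. Assuming it monetizes the additional deaths avoided by L5 over L1, using the rounded means from the results table, the implied value per death avoided can be backed out with simple arithmetic. This is a consistency check under that assumption, not the author's valuation method.

```python
# Rounded means from the results table.
deaths_avoided_l1 = 651   # L1 one-shot
deaths_avoided_l5 = 777   # L5 BO-optimal

# Value quoted in the text for the improvement.
quoted_value_usd = 1.5e9

additional_deaths_avoided = deaths_avoided_l5 - deaths_avoided_l1
relative_gain = additional_deaths_avoided / deaths_avoided_l1

# Implied value per additional death avoided, IF the $1.5B is a straight
# monetization of the L5-minus-L1 difference (an assumption).
implied_value_per_death = quoted_value_usd / additional_deaths_avoided

print(additional_deaths_avoided)               # 126
print(f"{relative_gain:.1%}")
print(f"${implied_value_per_death / 1e6:.1f}M")
```

Under that reading, the quoted figure corresponds to roughly $12M per additional death avoided.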