EVPPI via Strong, Oakley & Brennan 2014 GAM Regression — Where the Value of Information Lives

The Question

EVPI is one number. EVPPI is a priority list.

CEC's decision is whether to accelerate transport electrification (T2 at $2B), keep the status quo (T1), delay (T3), focus on equity (T4), or go heavy-duty-first (T5). Inv 02 ran 10,000 shared-draw Monte Carlo replications and found:

E[NB] is maximized by T2_accelerated ($14.14B mean net benefit), but T4_equity is statistically indistinguishable ($14.12B).
EVPI = $0.229B — perfect information would change the optimal scenario in ≈20% of realizations.
Group-level EVPPI: CRF group $0.116B, monetization $0.092B, emissions $0.002B.
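The two headline quantities above can be computed directly from a shared-draw net-benefit matrix. The following is a minimal sketch, assuming the PSA output is available as an array with one row per draw and one column per scenario; the function names and the synthetic matrix are illustrative stand-ins, not Inv 02's code or data.

```python
import numpy as np

def evpi(nb):
    """EVPI from an (n_draws, n_scenarios) net-benefit matrix of shared draws."""
    nb = np.asarray(nb, dtype=float)
    value_with_perfect_info = nb.max(axis=1).mean()   # E[max_d NB(d, theta)]
    value_with_current_info = nb.mean(axis=0).max()   # max_d E[NB(d, theta)]
    return value_with_perfect_info - value_with_current_info

def switch_fraction(nb):
    """Share of draws whose best scenario differs from the mean-optimal one."""
    nb = np.asarray(nb, dtype=float)
    mean_optimal = nb.mean(axis=0).argmax()
    return float(np.mean(nb.argmax(axis=1) != mean_optimal))

# Synthetic stand-in for the PSA output: 10,000 draws x 5 scenarios, in $B.
rng = np.random.default_rng(0)
shared_shock = rng.normal(0.0, 1.0, size=(10_000, 1))   # common to all scenarios
nb = 14.0 + shared_shock + rng.normal(0.0, 0.3, size=(10_000, 5))

print(f"EVPI = {evpi(nb):.3f} $B, switch fraction = {switch_fraction(nb):.2f}")
```

Because the draws are shared across scenarios, the common shock cancels in the row-wise comparison, which is why EVPI can be small even when each scenario's net benefit is highly uncertain.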

What Inv 02 couldn't answer: which CRF parameter? Which monetization input? Which emission sector? Strong, Oakley & Brennan's 2014 nonparametric regression-based EVPPI estimator lets us decompose group-level VOI into single-parameter VOI using only the existing MC sample, with no nested Monte Carlo and no new simulator runs. The post-processing took 0.9 seconds.

The Strong, Oakley & Brennan 2014 single-sample trick

EVPPI without nested Monte Carlo

The definition of Expected Value of Partial Perfect Information for a parameter subset φ is:

EVPPI(φ) = E_φ[max_d E[NB(d, θ) | φ]] − max_d E[NB(d, θ)]

The inner conditional expectation E[NB(d) | φ] is what makes this expensive: the naive estimator requires an outer loop over φ draws and an inner MC over the remaining parameters. Strong, Oakley & Brennan (2014) observed that this inner expectation is just a regression function: regress NB(d, θ) on φ across the existing PSA sample, and the fitted values are consistent estimators of the conditional expectation. Plug those fitted values straight into the formula; no new simulator calls needed.
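The plug-in step can be sketched in a few lines. This example is an illustration under stated assumptions: it swaps in a cubic polynomial least-squares fit for the smoother (simpler than the spline model used in this study), handles a single scalar φ, and runs on a synthetic two-decision toy whose EVPPI is known in closed form. The function name `evppi_regression` is hypothetical.

```python
import numpy as np

def evppi_regression(phi, nb, degree=3):
    """Single-sample EVPPI for one scalar parameter phi.

    phi: (n,) draws of the parameter of interest.
    nb:  (n, D) net benefit of each of D decisions on the same draws.
    """
    phi = np.asarray(phi, dtype=float)
    nb = np.asarray(nb, dtype=float)
    # Standardize phi so the polynomial design matrix is well conditioned.
    z = (phi - phi.mean()) / phi.std()
    design = np.vander(z, degree + 1)             # columns z^degree ... z^0
    coef, *_ = np.linalg.lstsq(design, nb, rcond=None)
    fitted = design @ coef                        # estimate of E[NB(d) | phi]
    # Plug the fitted values into the EVPPI definition.
    return fitted.max(axis=1).mean() - fitted.mean(axis=0).max()

# Toy check: decision 0 has E[NB | phi] = 0, decision 1 has E[NB | phi] = phi,
# with phi ~ N(0, 1), so the true EVPPI is E[max(0, phi)] = 1 / sqrt(2 * pi).
rng = np.random.default_rng(1)
n = 20_000
phi = rng.normal(size=n)
nb = np.column_stack([rng.normal(size=n), phi + rng.normal(size=n)])
estimate = evppi_regression(phi, nb)
print(f"estimated EVPPI = {estimate:.3f}, closed form = {1 / np.sqrt(2 * np.pi):.3f}")
```

The noise added to each column plays the role of the remaining parameters: the regression averages it out, which is the job the inner Monte Carlo loop would otherwise do.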

We use an additive B-spline smoother (sklearn.preprocessing.SplineTransformer with degree 3 and 8 knots per dimension, stacked across φ-dimensions) fit by ridge regression. For the high-dimensional emission group (14 dims) we first PCA-reduce to 5 components, following the original paper's recommendation. Validation: our implementation reproduces Inv 02's EVPI ($0.229B) exactly, and the CRF and monetization group-level EVPPI values agree with the legacy pygam implementation to within 1% (the near-zero emissions group agrees less tightly in relative terms).

Why this matters for the CEC workflow. A nested-MC EVPPI at 10,000 outer draws × 10,000 inner draws would require 10⁸ simulator calls — roughly 3 weeks of compute on the Inv 02 pipeline. Strong, Oakley & Brennan's 2014 estimator delivers a close approximation from the existing 10⁴-draw PSA sample in <1 second. For any decision-analytic workflow at 10⁴+ draws, this is the difference between actionable VOI and an intractable numerical problem.
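The cost comparison is simple arithmetic. In the sketch below, the per-call time is a figure implied by the numbers quoted above (10⁸ calls in roughly 3 weeks), not a measurement reported by the study.

```python
outer_draws = 10_000
inner_draws = 10_000
nested_calls = outer_draws * inner_draws            # 100,000,000 = 10^8

three_weeks_s = 3 * 7 * 24 * 3600                   # 1,814,400 seconds
implied_s_per_call = three_weeks_s / nested_calls   # derived, not measured

extra_calls_regression = 0                          # reuses the existing 10^4 draws
print(nested_calls, f"{implied_s_per_call * 1000:.1f} ms per simulator call")
```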

Results

The EVPI is almost entirely in two scalars

Every parameter at or above $0.05B could change the decision if resolved. The red dashed line marks the EVPI, the ceiling imposed by the joint distribution over all parameters. The top two scalars (β_O3 and VSL) account for 91% of the total value of information. The 14 emission-inventory parameters combined explain 0.8%, roughly 60 times less than β_O3 alone.

Parameter	EVPPI ($B)	% of EVPI	Dim
β_O3 — ozone concentration-response (Turner et al. 2016)	$0.1158	51%	1
VSL — value of statistical life (2020 dollars)	$0.0925	40%	1
off-road emissions (construction, agriculture, rail, marine, aircraft)	$0.0024	1.0%	5
on-road emissions (LDV/LDT/MDV/HDV/bus)	$0.0012	0.5%	5
residential emissions (natgas, wood)	$0.0005	0.2%	2
industrial emissions (point, area)	$0.0004	0.2%	2
β_PM2.5 — PM2.5 concentration-response (Di et al. 2017)	$0.0003	0.1%	1
β_NO2 — NO₂ concentration-response (Eum et al. 2022)	$0.0001	0.0%	1
income elasticity of WTP (Hammitt & Robinson 2011)	$0.0000	0.0%	1
GP surrogate noise (this study uses no-surrogate mode — quasi-zero)	$0.0001	0.0%	1

Values are non-negative by construction (EVPPI ≥ 0 is a theorem). Rows need not sum to EVPI because EVPPI is not additive across parameters: depending on how parameters interact in the decision, EVPPI(A ∪ B) can be smaller or larger than EVPPI(A) + EVPPI(B). Full joint EVPPI over all 21 parameters converges to the EVPI.
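The "% of EVPI" column and the 91% figure can be recomputed from the table. The sketch below copies the table values by hand and uses the four-decimal EVPI of $0.2287B from the validation table; the dictionary keys are shorthand labels.

```python
evpi_b = 0.2287  # $B
evppi_b = {
    "beta_O3": 0.1158,
    "VSL": 0.0925,
    "off-road emissions": 0.0024,
    "on-road emissions": 0.0012,
    "residential emissions": 0.0005,
    "industrial emissions": 0.0004,
    "beta_PM2.5": 0.0003,
    "beta_NO2": 0.0001,
    "income elasticity": 0.0000,
    "GP surrogate noise": 0.0001,
}

shares = {name: 100 * value / evpi_b for name, value in evppi_b.items()}
top_two = shares["beta_O3"] + shares["VSL"]
row_sum = sum(evppi_b.values())

print(f"top two: {top_two:.0f}% of EVPI")
print(f"all rows: ${row_sum:.4f}B = {100 * row_sum / evpi_b:.1f}% of EVPI")
```

The rows add up to less than the EVPI here, which is the non-additivity noted above showing up in the numbers.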

The regression that drives it

What does β_O3 actually do to the net benefit?

Each gray dot is one Monte Carlo draw: the x-axis is the β_O3 value drawn for that replication, the y-axis is the resulting net benefit of T2_accelerated (the mean-optimal scenario). The gold curve is the fitted smoother — the Strong, Oakley & Brennan 2014 estimator of E[NB(T2) | β_O3]. The blue dashed horizontal line is the unconditional mean E[NB(T2)]. Raw draws outside the central fit range (about 55 of 100) have been clipped for visibility.

The curve tilts downward in β_O3: T2 accelerates EV adoption, which reduces NOx, which raises ozone in NOx-saturated basins (classic ozone disbenefit). When β_O3 is realized on the high end of its prior, that ozone increase costs more mortality — so T2's net benefit is lower. When β_O3 is low, ozone disbenefit is negligible and T2's PM2.5 wins dominate. The spread from $15.96B at the low end of β_O3 down to $13.69B at the high end is the mechanism that generates EVPPI: at a sufficiently high β_O3 realization, T4 (equity-focused, less aggressive NOx reduction) becomes the optimal policy instead, and the EVPPI captures that re-optimization value.
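The re-optimization mechanism can be illustrated with two straight lines. Every number in this sketch is hypothetical and chosen only so that the lines cross; it does not reproduce the fitted curves or the β_O3 prior from the study.

```python
import numpy as np

def evppi_from_conditional_means(cond_means):
    """EVPPI given E[NB(d) | phi] on equally weighted phi draws (rows)."""
    cond_means = np.asarray(cond_means, dtype=float)
    return cond_means.max(axis=1).mean() - cond_means.mean(axis=0).max()

# Hypothetical conditional means in $B on an equally weighted grid of
# beta_O3 quantiles u in (0, 1); none of these numbers come from the study.
u = (np.arange(10_000) + 0.5) / 10_000
t2 = 15.0 - 2.0 * u       # steep: aggressive NOx cuts, strong ozone disbenefit
t4 = 14.4 - 0.9 * u       # flatter: less aggressive NOx cuts
crossing = np.column_stack([t2, t4])
no_crossing = np.column_stack([t2, t4 - 1.0])   # T4 never overtakes T2

evppi_crossing = evppi_from_conditional_means(crossing)
evppi_no_crossing = evppi_from_conditional_means(no_crossing)
print(f"lines cross: {evppi_crossing:.4f} $B")
print(f"no crossing: {evppi_no_crossing:.4f} $B")
```

When one scenario is best at every value of the parameter, learning the parameter never changes the choice and its EVPPI is effectively zero, however steep the curve. Value appears only from the region where the ranking flips.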

Why this reframes research priorities

Not all uncertainty is equally worth reducing

There's a tempting narrative in air-quality modeling that says "uncertainty is everywhere; we need better models everywhere." The EVPPI decomposition says the opposite. For this specific decision at this specific year:

Refining the emission inventory would not move the decision. Even perfect knowledge of all 14 sector emission factors is worth $2M — 0.8% of the $0.229B EVPI. Meanwhile CEC spends a meaningful fraction of its inventory-QA budget on on-road emission factors, which carry only $1M of decision value.
A California-specific ozone cohort study is worth up to $116M. The literature's β_O3 uncertainty (CIs from Smith 2009, Turner 2016, MOSES+) is the largest single driver. Because EVPPI is an upper bound on what any real study can deliver, a $50–$100M prospective cohort is justified on this decision alone only if it resolves most of the β_O3 uncertainty that separates the policy options.
Updating VSL matters almost as much. VSL point estimates vary by 2× across EPA / DOT / HHS; a defensible CA-specific VSL review is worth up to $92M and takes months, not years.
PM2.5 CRF is ≈decision-irrelevant here. That feels counter-intuitive — PM2.5 is the biggest public-health driver overall. But in the differential between T1–T5, PM2.5 changes similarly across scenarios; the marginal value of resolving β_PM2.5 is small. (This would not hold for a stationary-source control decision, where PM2.5 deltas differ between options. EVPPI is decision-specific.)

Decision-analytic recommendation for CEC. Allocate uncertainty-reduction budget proportional to EVPPI, not to scientific importance-in-general. For the transport-electrification decision specifically, that means: fund an ozone-cohort study first, a VSL literature review second, and defer inventory refinements until after the policy decision is made. Revisiting this decomposition after T2 deploys (re-running EVPPI on a 2040 scenario) will reveal the next priorities.
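A proportional allocation rule of this kind is easy to state in code. The sketch below is illustrative only: the $10M budget is hypothetical, the three line items are a simplification of the table, and a real allocation would also weigh each study's cost and how much uncertainty it can actually remove.

```python
def allocate_by_evppi(budget, evppi):
    """Split a budget across research items in proportion to their EVPPI."""
    total = sum(evppi.values())
    if total <= 0:
        raise ValueError("at least one item must have positive EVPPI")
    return {name: budget * value / total for name, value in evppi.items()}

# EVPPI in $B from the results above; the budget (in $M) is a made-up figure.
evppi_b = {
    "ozone cohort (beta_O3)": 0.1158,
    "VSL review": 0.0925,
    "emission inventory (14 sectors)": 0.0019,
}
plan = allocate_by_evppi(10.0, evppi_b)
for name, amount in plan.items():
    print(f"{name}: ${amount:.2f}M")
```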

Method validation

Agreement with the pygam reference implementation

Our sklearn-based Strong, Oakley & Brennan 2014 estimator reproduces Inv 02's EVPI exactly and its CRF and monetization group-level EVPPI to within 1%; the near-zero emissions group differs more in relative terms (pygam uses the same underlying algorithm but a different spline basis and penalty form):

Group	Inv 02 (pygam)	Inv 36 (sklearn)	Agreement
EVPI	$0.2287B	$0.2287B	exact
CRF joint	$0.1163B	$0.1160B	<1%
Monetization joint	$0.0928B	$0.0926B	<1%
Emissions (14)	$0.0022B	$0.0019B	≈14%

The agreement is intentional: it is a correctness check, not a rediscovery. What Inv 36 adds is the fine-grained decomposition inside each group (beta_o3 vs beta_pm25 vs beta_no2; VSL vs income-elast; on-road vs off-road vs residential vs industrial). Inv 02 could not answer those questions because it reported only group-level values.

Caveats

What this decomposition does not prove

The Strong, Oakley & Brennan (2014) estimator is biased in small samples; Heath et al. 2018 moment-matching would correct this at 10^3 draws, but with 10^4 draws the bias is ~O(1/sqrt(n)) of the group-level EVPI (negligible here).
Additive GAM assumes no strong interactions between parameters within a group — verified visually via 2D partial-dependence plots for beta_pm25 x VSL (not shown on page).
Emission-vector PCA reduction (for the all-14 test) discards ~10% variance; per-sector-group results are more reliable.
Inv 02 uses a *model-free* MC (no surrogate); the GP surrogate noise EVPPI reported here is computed on gp_noise_draws, which is a proxy for surrogate uncertainty, not a full OOD-safe VOI.
use_di is binary; spline EVPPI on binary covariates collapses to a 2-level step — we include it for completeness but the number should be read as indicative.
All EVPPI values are in year-2035 present value; discounting to 2025 would reduce them by roughly 25% at r=3%.
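The discounting figure in the last caveat can be checked in a couple of lines. The sketch assumes annual compounding over the 10 years from 2025 to 2035, which is an assumption about the convention rather than something the caveat specifies.

```python
def discount_factor(rate, years):
    """Present-value factor under annual compounding."""
    return (1.0 + rate) ** (-years)

factor = discount_factor(0.03, 2035 - 2025)   # about 0.744
reduction = 1.0 - factor                      # about 0.256, i.e. roughly 25%
evpi_2035_b = 0.229                           # $B, year-2035 present value
evpi_2025_b = evpi_2035_b * factor            # about 0.170 $B

print(f"factor = {factor:.3f}, reduction = {reduction:.1%}, EVPI in 2025 terms = ${evpi_2025_b:.3f}B")
```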
