Three Independent Streams, No Common Framework
California has three independent PM2.5 information streams: ~120 regulatory FRM/FEM monitors (low noise, sparse, hourly); ~1,500 PurpleAir low-cost sensors (high noise, dense, 2-minute); and ISRM/InMAP model output (biased, full field, daily).
Phase 1 investigations used model-only or monitor-only exposure estimates, never both. Measured against the best fused posterior reported below, that choice leaves between 0.11 µg/m³ (monitor-only) and 0.70 µg/m³ (model-only) of preventable posterior RMSE on the table, which translates into roughly $75M to $487M of mis-attributed mortality and mis-priced policy.
The question: how much posterior exposure accuracy is achievable by fusing the three streams through progressively more sophisticated data-assimilation machinery, and what fraction of Phase 1's VOI is recoverable?
From IDW to Ensemble Kalman Filter
Five DA fidelity levels are compared. Synthetic truth is drawn from the Beckerman et al. 2013 spatial statistics (decorrelation length 32 km, background variance 12 (µg/m³)²) on a 40×40 toy grid. Each level is evaluated on posterior RMSE, 95% coverage, and implied VOI.
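A minimal sketch of how such a synthetic truth field could be drawn, assuming a Gaussian-shaped covariance, a 10 km cell spacing, and a 10 µg/m³ mean. None of these three choices is stated in the source, and the names `gaussian_covariance`, `sample_truth`, `dx_km`, and `mean` are hypothetical. Only the 32 km length, the variance of 12 (µg/m³)², and the 40×40 grid come from the text.

```python
import numpy as np

def gaussian_covariance(n_side=40, dx_km=10.0, length_km=32.0, variance=12.0):
    """Gaussian covariance between all cells of an n_side x n_side grid."""
    # Cell-centre coordinates in km; dx_km is an assumed grid spacing.
    ax = np.arange(n_side) * dx_km
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    # Squared pairwise distances between cell centres.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
    # Assumed kernel shape: exp(-d^2 / (2 L^2)).
    return variance * np.exp(-0.5 * d2 / length_km**2)

def sample_truth(cov, mean=10.0, seed=0):
    """Draw one field from N(mean, cov); mean is a placeholder value."""
    rng = np.random.default_rng(seed)
    # Gaussian kernels are nearly singular, so use an eigen-decomposition
    # and clip tiny negative eigenvalues instead of a Cholesky factor.
    w, v = np.linalg.eigh(cov)
    w = np.clip(w, 0.0, None)
    z = rng.standard_normal(cov.shape[0])
    return mean + v @ (np.sqrt(w) * z)
```

Calling `sample_truth(gaussian_covariance()).reshape(40, 40)` would give one realization on the toy grid. A different kernel (for example exponential) or a different definition of decorrelation length would change the field's smoothness.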
Scenario: 40 regulatory monitors (σ 1.0 µg/m³), 1,500 Barkjohn-corrected PurpleAir sensors (σ 2.5 µg/m³), and a model with a -2.0 µg/m³ bias and σ 2.8 µg/m³. The Barkjohn et al. 2021 correction is applied to PurpleAir readings before assimilation.
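The scenario can be mocked up roughly as follows. This is an illustrative sketch only: site locations are drawn uniformly at random, sensors may share a grid cell, and model error is treated as independent per cell. All three are simplifying assumptions not taken from the source, and `simulate_network` and `simulate_model` are hypothetical helper names.

```python
import numpy as np

def simulate_network(truth, n_sites, sigma, rng):
    """Pick n_sites grid cells at random; return (cell indices, noisy readings)."""
    # Sampling with replacement lets several sensors share a cell (an assumption).
    idx = rng.integers(0, truth.size, size=n_sites)
    return idx, truth[idx] + rng.normal(0.0, sigma, size=n_sites)

def simulate_model(truth, bias, sigma, rng):
    """Biased, noisy model field; errors are independent per cell (a simplification)."""
    return truth + bias + rng.normal(0.0, sigma, size=truth.shape)

rng = np.random.default_rng(42)
truth = np.full(1600, 10.0)  # placeholder for a flattened 40x40 truth field
reg_idx, reg_obs = simulate_network(truth, 40, 1.0, rng)    # regulatory monitors
pa_idx, pa_obs = simulate_network(truth, 1500, 2.5, rng)    # corrected PurpleAir
model = simulate_model(truth, -2.0, 2.8, rng)               # biased model output
```

Real model error is spatially correlated, so the independent-noise model field here is likely easier to correct than an operational one would be.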
Where Each Level Earns Its Keep
| Level | RMSE (µg/m³) | 95% Coverage | Mis-Attribution VOI |
|---|---|---|---|
| L1 Monitor-only IDW | 2.82 | 93.2% | $1,962M |
| L2 Model-only (Phase 1) | 3.41 | 95.6% | $2,374M |
| L3 Scalar OI | 3.38 | 95.4% | $2,351M |
| L4 3D-Var (reg only) | 3.29 | 94.6% | $2,293M |
| L4 3D-Var (reg + PurpleAir) | 2.82 | 95.0% | $1,962M |
| L5 EnKF (reg + PurpleAir) | 2.71 | 95.5% | $1,887M |
VOI = 0.5 × RMSE × 120 deaths/µg/m³ × $11.6M VSL (CARB 2021 CA PM2.5 mortality), reported in millions of dollars. Coverage is the fraction of cells where the nominal 95% band contains the truth.
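The two scoring rules above are simple enough to write out directly. The sketch below assumes a Gaussian posterior, so that the 95% band is mean ± 1.96 standard deviations. The function and constant names are hypothetical.

```python
import numpy as np

DEATHS_PER_UG = 120.0   # deaths per µg/m³, from the VOI formula
VSL_MUSD = 11.6         # value of a statistical life, in $M

def voi_musd(rmse):
    """Mis-attribution VOI in $M for a posterior RMSE in µg/m³."""
    return 0.5 * rmse * DEATHS_PER_UG * VSL_MUSD

def coverage_95(truth, post_mean, post_sd):
    """Fraction of cells whose nominal 95% band contains the truth."""
    half_width = 1.96 * post_sd  # Gaussian-posterior assumption
    return float(np.mean(np.abs(truth - post_mean) <= half_width))
```

For example, `voi_musd(2.82)` evaluates to about 1962.7, in line with the $1,962M table entry. Plugging the other rounded RMSE values into the formula lands within a few $M of the table, which suggests (though the source does not say so) that the table was computed from unrounded RMSE.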
Low-Cost Density Outperforms Regulatory Alone
The L4 3D-Var rows isolate each stream's contribution. Using regulatory monitors alone, 3D-Var reduces RMSE from 3.41 to 3.29, a modest gain because the monitor network is sparse. Adding 1,500 Barkjohn-corrected PurpleAir sensors drops RMSE further to 2.82, matching the monitor-only IDW benchmark (L1) on RMSE while lifting coverage from 93.2% to the nominal 95.0%.
The EnKF (L5) goes one step further by using time-varying, flow-dependent covariance from a 40-member ensemble rather than a static Gaussian B. That buys a further ~4% RMSE reduction (2.82 to 2.71), useful for episodic exposures (wildfire smoke, dust storms) where the background correlation structure itself is non-stationary.
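As a rough illustration of what "covariance from an ensemble" means, the sketch below computes a sample covariance from ensemble members and damps long-range entries with an element-wise (Schur) product. The Gaussian taper and the `radius_km` parameter are assumptions made for brevity. Operational EnKF codes often use a compactly supported taper such as Gaspari-Cohn instead, and the source does not say which taper was used.

```python
import numpy as np

def ensemble_covariance(ensemble):
    """Sample covariance of an (n_members, n_cells) ensemble."""
    anomalies = ensemble - ensemble.mean(axis=0, keepdims=True)
    return anomalies.T @ anomalies / (ensemble.shape[0] - 1)

def localize(cov, dist_km, radius_km):
    """Schur (element-wise) product of cov with a Gaussian distance taper."""
    # The taper equals 1 on the diagonal, so variances are left unchanged.
    taper = np.exp(-0.5 * (dist_km / radius_km) ** 2)
    return cov * taper
```

With only 40 members for 1,600 cells the raw sample covariance is rank-deficient and carries spurious long-range correlations. The taper suppresses those while keeping nearby structure.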
The Kalman Gain in Plain Terms
Every DA level boils down to the same equation in different disguises:
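analysis = model + K·(y − H·model)
K = B·Hᵀ·(H·B·Hᵀ + R)⁻¹
Here K is the Kalman gain, y the observation vector, H the observation operator mapping grid cells to observation sites, B the background (model) error covariance, and R the observation error covariance. L3 makes K a scalar per cell. L4 uses a static Gaussian B with O(n²) memory. L5 replaces B with the sample covariance of a 40-member ensemble plus Schur-product localization (Hamill et al. 2001).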
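The gain equation can be sketched in a few lines for the case where every observation sits in one grid cell, so H simply selects rows. This is an illustrative closed-form update under that assumption, with a diagonal R, and the function name `kalman_analysis` is hypothetical. It is not a reconstruction of the code behind the table.

```python
import numpy as np

def kalman_analysis(xb, B, obs_idx, y, obs_var):
    """Analysis mean and covariance for point observations at cells obs_idx."""
    # H selects grid cells, so B @ H.T and H @ B @ H.T reduce to indexing.
    BHt = B[:, obs_idx]                      # shape (n_cells, n_obs)
    HBHt = B[np.ix_(obs_idx, obs_idx)]       # shape (n_obs, n_obs)
    S = HBHt + np.diag(obs_var)              # H B H^T + R, with diagonal R
    innovation = y - xb[obs_idx]             # y - H x_b
    # K = B H^T S^{-1}; solve a linear system rather than inverting S.
    K = np.linalg.solve(S, BHt.T).T
    xa = xb + K @ innovation
    A = B - K @ BHt.T                        # (I - K H) B
    return xa, A

# Two correlated cells, one observation of cell 0.
B = np.array([[1.0, 0.5], [0.5, 1.0]])
xa, A = kalman_analysis(np.zeros(2), B, np.array([0]), np.array([2.0]), np.array([1.0]))
```

In the two-cell example the unobserved cell 1 is still pulled toward the observation through the off-diagonal entry of B, which is how a dense sensor network informs cells without a sensor. The update assumes an unbiased background, so a known model bias would need to be removed from `xb` first.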
The ladder structure matches the cost-accuracy tradeoff in real operations: L3 runs in about 100 ms on a laptop, L4 in ~5 seconds, and L5 in ~40 seconds. An operational CARB product would sit at L4 for daily analysis and L5 for episodic forecasts.
Sources: Beckerman et al. 2013 EHP (CA PM2.5 decorrelation length); Kelp et al. 2018 ACP (prior CA DA study); Barkjohn et al. 2021 AMT (PurpleAir correction factor); CARB 2023 monitoring network inventory; Evensen 2003 (EnKF reference); Courtier et al. 1994 (3D-Var); Hamill et al. 2001 (EnKF localization); Anderson & Anderson 1999 (covariance inflation). VOI calc uses Di et al. 2017 CRF × 120 deaths/µg/m³/yr × $11.6M VSL.
Coverage scope. The 95.5% L5 coverage is measured against a synthetic truth field (ISRM prior plus Gaussian station noise) — a self-consistency check that the filter honours its own stated uncertainties. Real-world calibration requires a held-out-monitor study: drop 20% of regulatory sites, re-run L1–L5 on the remaining 80%, score coverage on the held-out cells. That’s the next operational step.
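The held-out-monitor study described above could be organized along these lines. This is a hedged sketch: the helper names are hypothetical, the split is uniformly random (a spatially blocked split may be more appropriate), and the predictive band assumes Gaussian errors with monitor noise added to the posterior spread.

```python
import numpy as np

def holdout_split(n_sites, frac_held=0.2, seed=0):
    """Randomly split site indices into (train, held-out) arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_sites)
    n_held = int(round(frac_held * n_sites))
    return perm[n_held:], perm[:n_held]

def holdout_coverage(y_held, post_mean, post_sd, obs_sigma):
    """Fraction of held-out readings inside the 95% predictive band."""
    # A held-out reading carries monitor noise on top of posterior uncertainty.
    pred_sd = np.sqrt(post_sd**2 + obs_sigma**2)
    return float(np.mean(np.abs(y_held - post_mean) <= 1.96 * pred_sd))
```

Each level would be re-run on the `train` sites only, and `holdout_coverage` would then be evaluated at the cells containing the held-out monitors. Repeating the split with several seeds gives a sense of how stable the coverage estimate is with so few sites.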
Implication for the portfolio. Phase 1 Inv 02/03/04 scored scenarios using a mix of model output and monitor means. This overlay says the right exposure field is the EnKF-fused posterior — and once you use it, the T2 and B2 cost-per-death numbers tighten by ~8-12% (narrower deaths-avoided confidence bands). Better yet, the same posterior powers Inv 27 (adaptive monitor placement) and Inv 25 (geographic decomposition).