Weight the Model and Monitors by Their Uncertainty
The ISRM model output acts as the prior; the kriged monitor field acts as the likelihood. Fusing them, each source is weighted by its inverse variance, so the more confident source pulls harder:

σ²post = 1 / (1/σ²model + 1/σ²krig)

μpost = σ²post · (μmodel/σ²model + μkrig/σ²krig)
When the model is accurate, fusion produces a tighter estimate than either source alone. When the model is biased, the fusion posterior is pulled toward the wrong value — and a simpler method (kriging alone) can win.
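The update above fits in a few lines. A minimal sketch of precision-weighted Gaussian fusion — the numbers are illustrative placeholders, not values from the fused grid:

```python
def fuse(mu_model, var_model, mu_krig, var_krig):
    """Precision-weighted fusion of two independent Gaussian estimates."""
    var_post = 1.0 / (1.0 / var_model + 1.0 / var_krig)
    mu_post = var_post * (mu_model / var_model + mu_krig / var_krig)
    return mu_post, var_post

# Illustrative: a heavily biased but uncertain model prior (40 +/- 6)
# fused with a confident kriged estimate (10 +/- 2). The posterior
# lands near the monitors but is still dragged upward by the prior.
mu, var = fuse(mu_model=40.0, var_model=36.0, mu_krig=10.0, var_krig=4.0)
```

Note that the posterior variance (3.6) is smaller than either source's alone — fusion always tightens the error bars, even when the prior's bias is pulling the mean off target.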
Why is the model 4× too high? The ISRM model computes total PM2.5 from all emission sectors including wildfire. Wildfire emissions represent multi-year averages from the NEI. The 2023 AQS monitors measured a clean-air year (56K acres burned vs 1.3M average). The model isn’t wrong — it’s answering a different question (long-run average vs this year’s actual).
Hold Out One, Predict It, Repeat
At each of the 112 AQS monitor locations, drop that site and predict its 2023 annual mean from the remaining 111. Compare the prediction to truth, repeat for all sites, then score three approaches side by side.
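The leave-one-out loop itself is simple. A sketch with synthetic site data, using inverse-distance weighting as a stand-in for the article's ordinary-kriging predictor (an assumption for brevity):

```python
import numpy as np

def loo_rmse(coords, y, predict):
    """Leave-one-out RMSE: hold out each site, predict it from the rest."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        yhat = predict(coords[mask], y[mask], coords[i])
        errs.append(yhat - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))

def idw(train_xy, train_y, xy, p=2.0):
    """Inverse-distance weighting -- a toy stand-in for kriging."""
    d = np.linalg.norm(train_xy - xy, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** p
    return float(np.sum(w * train_y) / np.sum(w))

# Synthetic stand-ins for the 112 monitor sites (not AQS data).
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(112, 2))
y = 8.0 + 0.02 * xy[:, 0] + rng.normal(0, 0.5, 112)
rmse = loo_rmse(xy, y, idw)
```

Scoring fusion and the raw model is the same loop with a different `predict` callable, so all three methods are evaluated on identical held-out sites.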
The model alone is catastrophic: RMSE 34.8 µg/m³, worse than simply predicting the mean. Fusion cuts this to 4.4, an 87.5% reduction, but kriging alone achieves 2.07, beating fusion by a factor of 2.1. The biased model prior drags the fusion posterior away from truth: when the prior is accurate, fusion wins; when it is 4× biased, kriging wins.
Where the Bias Lives
| Region | Model PM2.5 (µg/m³) | Kriging PM2.5 (µg/m³) | Fusion PM2.5 (µg/m³) | Fusion−Model Gap (µg/m³) | Uncertainty Reduction |
|---|---|---|---|---|---|
| LA Basin | 60.5 | 9.2 | 12.1 | −48.4 | 62% |
| Sacramento | 25.5 | 7.9 | 10.2 | −15.3 | 48% |
| Bay Area | 20.7 | 6.8 | 8.0 | −12.6 | 33% |
| SJV | 20.9 | 9.9 | 9.6 | −11.3 | 29% |
| Rest of CA | 14.2 | 8.0 | 8.3 | −5.9 | 22% |
LA Basin has the largest gap between the model and the fused estimate (−48.4 µg/m³), yet also the largest fusion uncertainty reduction (62%). The prior still constrains posterior variance even when it is biased.
Model Bias Changes the Death Count
The choice of PM2.5 field propagates directly to health burden estimates:
| PM2.5 Source | Attributable Deaths | vs Fusion |
|---|---|---|
| ISRM Model Only | 6,681 | +4,535 (+211%) |
| Bayesian Fusion | 2,146 | — |
| Kriging Only | 1,824 | −322 (−15%) |
The model-only field produces 6,681 attributable deaths — 3.1× the fusion estimate and 3.7× the kriging estimate. The 4,535-death gap between model and fusion is entirely driven by the wildfire-averaging bias. Getting the PM2.5 field right is not a statistical nicety — it determines whether you estimate 1,800 or 6,700 deaths.
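How a PM2.5 field propagates to a death count can be sketched with a log-linear concentration-response function. The relative risk, counterfactual, and baseline count below are placeholder assumptions, not the article's health-impact parameters:

```python
import math

def attributable_deaths(pm, baseline_deaths, rr_per_10=1.06, c0=0.0):
    """Deaths attributable to PM2.5 above a counterfactual c0.

    rr_per_10: relative risk per 10 ug/m3 (illustrative value).
    Attributable fraction AF = 1 - exp(-beta * dC), deaths = D0 * AF.
    """
    beta = math.log(rr_per_10) / 10.0
    dc = max(pm - c0, 0.0)
    return baseline_deaths * (1.0 - math.exp(-beta * dc))

# A 4x-biased field inflates the burden by somewhat less than 4x,
# because the exponential response saturates at high concentrations.
d_model = attributable_deaths(pm=40.0, baseline_deaths=100_000)
d_fused = attributable_deaths(pm=10.0, baseline_deaths=100_000)
```

The nonlinearity matters: biased concentrations do not inflate deaths strictly proportionally, but a 4× input error still multiplies the burden several-fold.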
How Much Does Each Source Contribute?
Decomposing the fusion posterior uncertainty into contributions from each data source:
AQS monitors provide 70% of the marginal variance reduction; the model provides 38%. The shares sum to more than 100% because each contribution is measured marginally: removing either source degrades the posterior, yet the two carry partially redundant information in well-monitored areas.
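A toy version of the decomposition, computed by re-fusing without each source. In this simplified single-location, independent-Gaussian setting the shares reduce to each source's precision fraction and sum to exactly 100%; the article's shares exceed 100% because real sources carry spatially overlapping information. Variances are illustrative:

```python
def posterior_var(variances):
    """Posterior variance from fusing independent Gaussian sources."""
    return 1.0 / sum(1.0 / v for v in variances)

def marginal_reduction(variances, i):
    """Fraction of posterior variance that source i removes,
    relative to the posterior computed without it."""
    full = posterior_var(variances)
    without = posterior_var(variances[:i] + variances[i + 1:])
    return (without - full) / without

# Illustrative variances (assumptions): [monitors, model].
v = [4.0, 36.0]
r_monitors = marginal_reduction(v, 0)  # confident source dominates
r_model = marginal_reduction(v, 1)
```

Here the monitors claim 90% of the reduction and the model 10%, mirroring (qualitatively, not numerically) the asymmetry in the article's 70%/38% split.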
The fidelity lesson: Bayesian fusion is mathematically optimal when the prior is unbiased. When the prior has a 4× systematic error, the sophisticated method loses to the simple one. Adding model complexity without first validating the model degrades the answer. Check the prior before you fuse.
ISRM total PM2.5 (all sectors) · 112 AQS FRM/FEM annual mean 2023 · Ordinary kriging (nugget=0, sill=7.9, range=149 km) · Gaussian Bayesian update · LOO cross-validation at all 112 sites · Variogram fit by weighted least squares
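The footer's variogram parameters (nugget 0, sill 7.9, range 149 km) can be dropped into a semivariogram function. The spherical form below is an assumption, since the footer gives the fitted parameters but not the model family:

```python
import numpy as np

def spherical_variogram(h, nugget=0.0, sill=7.9, rng_km=149.0):
    """Spherical semivariogram; h and rng_km in kilometers.
    Rises from the nugget at h=0 and flattens at the sill
    once separation reaches the range."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_km - 0.5 * (h / rng_km) ** 3)
    return np.where(h >= rng_km, sill, g)

# Semivariance at 0 km, half the range, the range, and beyond it.
gamma = spherical_variogram([0.0, 74.5, 149.0, 300.0])
```

With a zero nugget, collocated monitors are treated as noise-free; pairs more than 149 km apart contribute no spatial information to the kriged estimate.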