Weight the Model and Monitors by Their Uncertainty
The ISRM model output acts as the prior; the kriged monitor field acts as the likelihood. Fusing them, each source is weighted by its inverse variance, so the more confident source pulls harder:

σ²post = 1 / (1/σ²model + 1/σ²krig)

μpost = σ²post · (μmodel/σ²model + μkrig/σ²krig)
When the model is accurate, fusion produces a tighter estimate than either source alone. When the model is biased, the fusion posterior is pulled toward the wrong value — and a simpler method (kriging alone) can win.
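The update above fits in a few lines. A minimal sketch of precision-weighted Gaussian fusion — the numbers are illustrative placeholders, not values from the fused grid:

```python
def fuse(mu_model, var_model, mu_krig, var_krig):
    """Precision-weighted fusion of two independent Gaussian estimates."""
    var_post = 1.0 / (1.0 / var_model + 1.0 / var_krig)
    mu_post = var_post * (mu_model / var_model + mu_krig / var_krig)
    return mu_post, var_post

# Illustrative: a heavily biased but uncertain model prior (40 +/- 6)
# fused with a confident kriged estimate (10 +/- 2). The posterior
# lands near the monitors but is still dragged upward by the prior.
mu, var = fuse(mu_model=40.0, var_model=36.0, mu_krig=10.0, var_krig=4.0)
```

Note that the posterior variance (3.6) is smaller than either source's alone — fusion always tightens the error bars, even when the prior's bias is pulling the mean off target.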
Why is the model 4× too high? The ISRM model computes total PM2.5 from all emission sectors including wildfire. Wildfire emissions represent multi-year averages from the NEI. The 2023 AQS monitors measured a clean-air year (56K acres burned vs 1.3M average). The model isn’t wrong — it’s answering a different question (long-run average vs this year’s actual).
Hold Out One, Predict It, Repeat
At each of the 112 AQS monitor locations, drop that site and predict its 2023 annual mean from the remaining 111. Compare the prediction to truth, repeat for all sites, then score three approaches side by side.
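The leave-one-out loop itself is simple. A sketch with synthetic site data, using inverse-distance weighting as a stand-in for the article's ordinary-kriging predictor (an assumption for brevity):

```python
import numpy as np

def loo_rmse(coords, y, predict):
    """Leave-one-out RMSE: hold out each site, predict it from the rest."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        yhat = predict(coords[mask], y[mask], coords[i])
        errs.append(yhat - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))

def idw(train_xy, train_y, xy, p=2.0):
    """Inverse-distance weighting -- a toy stand-in for kriging."""
    d = np.linalg.norm(train_xy - xy, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** p
    return float(np.sum(w * train_y) / np.sum(w))

# Synthetic stand-ins for the 112 monitor sites (not AQS data).
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(112, 2))
y = 8.0 + 0.02 * xy[:, 0] + rng.normal(0, 0.5, 112)
rmse = loo_rmse(xy, y, idw)
```

Scoring fusion and the raw model is the same loop with a different `predict` callable, so all three methods are evaluated on identical held-out sites.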
The model alone is catastrophic: RMSE 34.8 µg/m³, worse than simply predicting the mean. Fusion cuts this to 4.4, an 87.5% reduction, but kriging alone achieves 2.07, beating fusion by a factor of 2.1. The biased model prior drags the fusion posterior away from truth: when the prior is accurate, fusion wins; when it is 4× biased, kriging wins.
Where the Bias Lives
| Region | Model PM2.5 (µg/m³) | Kriging PM2.5 (µg/m³) | Fusion PM2.5 (µg/m³) | Fusion−Model Gap (µg/m³) | Uncertainty Reduction |
|---|---|---|---|---|---|
| LA Basin | 60.5 | 9.2 | 12.1 | −48.4 | 62% |
| Sacramento | 25.5 | 7.9 | 10.2 | −15.3 | 48% |
| Bay Area | 20.7 | 6.8 | 8.0 | −12.6 | 33% |
| SJV | 20.9 | 9.9 | 9.6 | −11.3 | 29% |
| Rest of CA | 14.2 | 8.0 | 8.3 | −5.9 | 22% |
LA Basin has the largest gap between the model and the fused estimate (−48.4 µg/m³), yet also the largest fusion uncertainty reduction (62%). The prior still constrains posterior variance even when it is biased.
Model Bias Changes the Death Count
The choice of PM2.5 field propagates directly to health burden estimates:
| PM2.5 Source | Attributable Deaths | vs Fusion |
|---|---|---|
| ISRM Model Only | 6,681 | +4,535 (+211%) |
| Bayesian Fusion | 2,146 | — |
| Kriging Only | 1,824 | −322 (−15%) |
The model-only field produces 6,681 attributable deaths — 3.1× the fusion estimate and 3.7× the kriging estimate. The 4,535-death gap between model and fusion is entirely driven by the wildfire-averaging bias. Getting the PM2.5 field right is not a statistical nicety — it determines whether you estimate 1,800 or 6,700 deaths.
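How a PM2.5 field propagates to a death count can be sketched with a log-linear concentration-response function. The relative risk, counterfactual, and baseline count below are placeholder assumptions, not the article's health-impact parameters:

```python
import math

def attributable_deaths(pm, baseline_deaths, rr_per_10=1.06, c0=0.0):
    """Deaths attributable to PM2.5 above a counterfactual c0.

    rr_per_10: relative risk per 10 ug/m3 (illustrative value).
    Attributable fraction AF = 1 - exp(-beta * dC), deaths = D0 * AF.
    """
    beta = math.log(rr_per_10) / 10.0
    dc = max(pm - c0, 0.0)
    return baseline_deaths * (1.0 - math.exp(-beta * dc))

# A 4x-biased field inflates the burden by somewhat less than 4x,
# because the exponential response saturates at high concentrations.
d_model = attributable_deaths(pm=40.0, baseline_deaths=100_000)
d_fused = attributable_deaths(pm=10.0, baseline_deaths=100_000)
```

The nonlinearity matters: biased concentrations do not inflate deaths strictly proportionally, but a 4× input error still multiplies the burden several-fold.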
How Much Does Each Source Contribute?
Decomposing the fusion posterior uncertainty into contributions from each data source:
AQS monitors provide 70% of the marginal variance reduction; the model provides 38%. The shares sum to more than 100% because each contribution is measured marginally: removing either source degrades the posterior, yet the two carry partially redundant information in well-monitored areas.
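A toy version of the decomposition, computed by re-fusing without each source. In this simplified single-location, independent-Gaussian setting the shares reduce to each source's precision fraction and sum to exactly 100%; the article's shares exceed 100% because real sources carry spatially overlapping information. Variances are illustrative:

```python
def posterior_var(variances):
    """Posterior variance from fusing independent Gaussian sources."""
    return 1.0 / sum(1.0 / v for v in variances)

def marginal_reduction(variances, i):
    """Fraction of posterior variance that source i removes,
    relative to the posterior computed without it."""
    full = posterior_var(variances)
    without = posterior_var(variances[:i] + variances[i + 1:])
    return (without - full) / without

# Illustrative variances (assumptions): [monitors, model].
v = [4.0, 36.0]
r_monitors = marginal_reduction(v, 0)  # confident source dominates
r_model = marginal_reduction(v, 1)
```

Here the monitors claim 90% of the reduction and the model 10%, mirroring (qualitatively, not numerically) the asymmetry in the article's 70%/38% split.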
The fidelity lesson: Bayesian fusion is mathematically optimal when the prior is unbiased. When the prior has a 4× systematic error, the sophisticated method loses to the simple one. Adding model complexity without first validating the model degrades the answer. Check the prior before you fuse.
ISRM total PM2.5 (all sectors) · 112 AQS FRM/FEM annual mean 2023 · Ordinary kriging (nugget=0, sill=7.9, range=149 km) · Gaussian Bayesian update · LOO cross-validation at all 112 sites · Variogram fit by weighted least squares
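The footer's variogram parameters (nugget 0, sill 7.9, range 149 km) can be dropped into a semivariogram function. The spherical form below is an assumption, since the footer gives the fitted parameters but not the model family:

```python
import numpy as np

def spherical_variogram(h, nugget=0.0, sill=7.9, rng_km=149.0):
    """Spherical semivariogram; h and rng_km in kilometers.
    Rises from the nugget at h=0 and flattens at the sill
    once separation reaches the range."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / rng_km - 0.5 * (h / rng_km) ** 3)
    return np.where(h >= rng_km, sill, g)

# Semivariance at 0 km, half the range, the range, and beyond it.
gamma = spherical_variogram([0.0, 74.5, 149.0, 300.0])
```

With a zero nugget, collocated monitors are treated as noise-free; pairs more than 149 km apart contribute no spatial information to the kriged estimate.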