How few monitor observations can a physics-informed surrogate get away with?
WRF-Chem is the standard regional atmospheric-chemistry solver. Running it at 4 km resolution over California for a 3-day forecast takes hours on a cluster. For real-time nowcasting or for running thousands of what-if scenarios, CEC needs a fast surrogate. The question: can a neural net with PDE-based regularization beat a plain neural net when training data is limited?
The governing PDE for this demo is the 1D advection-diffusion-reaction equation:
∂c/∂t + u ∂c/∂x = D ∂²c/∂x² − k c + S(x)
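where c is the pollutant concentration, u = 5.0 km/h is the advection speed, D = 8.0 km²/h the diffusivity, k = 0.12/h the first-order loss rate, and S(x) a source centered at x = 200 km (width 80 km).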
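For orientation, a reference solution of this equation can be sketched with an explicit upwind finite-difference scheme. This is only an illustration, not the solver behind the figures: the domain length, grid, time step, zero initial and boundary values, unit source amplitude, and the Gaussian source shape (with "width" read as 2σ) are all assumptions that the text does not specify.

```python
import math

# Parameters quoted in the text
U, D, K = 5.0, 8.0, 0.12        # km/h, km^2/h, 1/h
X_SRC, W_SRC = 200.0, 80.0      # source centre and width, km

# Assumptions for illustration only (not given in the text)
L_DOMAIN = 500.0                # domain length, km
NX = 101                        # grid points, so dx = 5 km
DT = 0.05                       # time step, h (well inside the explicit stability limit)


def source(x):
    """Assumed Gaussian source of unit amplitude; 'width' is read as 2*sigma."""
    sigma = W_SRC / 2.0
    return math.exp(-0.5 * ((x - X_SRC) / sigma) ** 2)


def solve_adr(t_end):
    """Integrate c_t + U c_x = D c_xx - K c + S(x) from c = 0 up to t_end."""
    dx = L_DOMAIN / (NX - 1)
    xs = [i * dx for i in range(NX)]
    s = [source(x) for x in xs]
    c = [0.0] * NX                               # assumed clean initial state
    for _ in range(round(t_end / DT)):
        new = c[:]
        for i in range(1, NX - 1):
            adv = U * (c[i] - c[i - 1]) / dx     # upwind difference (U > 0)
            dif = D * (c[i + 1] - 2.0 * c[i] + c[i - 1]) / dx ** 2
            new[i] = c[i] + DT * (-adv + dif - K * c[i] + s[i])
        new[0] = 0.0                             # assumed zero Dirichlet edges
        new[-1] = 0.0
        c = new
    return xs, c


xs, c = solve_adr(8.0)
x_peak = xs[c.index(max(c))]
print(f"peak concentration at t = 8 h sits near x = {x_peak:.0f} km")
```

A grid solution like this is the kind of "truth" curve a surrogate would be scored against; the upwind scheme adds some numerical diffusion, so a finer grid or a higher-order scheme would be preferable for a real benchmark.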
From interpolation to PINN
What each surrogate predicts at t = 8h
Trained on 9 noisy observations at 3 downstream stations and 3 time slices, the PINN (green) follows the truth (white) including the source peak at 200 km. The data-only MLP (gold), starved of spatial coverage upwind of the observation stations, smooths the peak by ~40%.
Same data, same architecture, different regularization
The two training losses are not directly comparable: the PINN's loss is higher throughout because it sums three terms (data MSE + PDE residual MSE + IC MSE), while the data-only net sums only two (data + IC). The meaningful comparison is generalization RMSE over the full domain, which the PDE term lowers (a 33.1% improvement in the interior).
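The difference between the two objectives can be sketched as follows. The function names, the equal weighting of the terms, and the `model(x, t) -> (c, c_t, c_x, c_xx)` interface are hypothetical choices for this sketch; the text does not state how the terms were weighted.

```python
# PDE coefficients quoted in the text
U, D, K = 5.0, 8.0, 0.12


def mse(errors):
    """Mean of squared entries."""
    return sum(e * e for e in errors) / len(errors)


def pde_residual(derivs, s):
    """Residual of c_t + U c_x - D c_xx + K c - S(x); zero for an exact solution."""
    c, c_t, c_x, c_xx = derivs
    return c_t + U * c_x - D * c_xx + K * c - s


def loss_terms(model, source, data, ic, colloc):
    """Return both objectives for the same model.

    model(x, t) -> (c, c_t, c_x, c_xx)   hypothetical interface
    data   : list of (x, t, c_obs) monitor observations
    ic     : list of (x, c0) initial-condition samples at t = 0
    colloc : list of (x, t) collocation points for the PDE residual
    """
    data_term = mse([model(x, t)[0] - c_obs for x, t, c_obs in data])
    ic_term = mse([model(x, 0.0)[0] - c0 for x, c0 in ic])
    pde_term = mse([pde_residual(model(x, t), source(x)) for x, t in colloc])
    return {
        "data_only": data_term + ic_term,           # plain MLP objective
        "pinn": data_term + ic_term + pde_term,     # assumed unit weights
        "pde": pde_term,
    }
```

Because the PDE term is a mean of squares, the PINN objective can never be smaller than the data-only objective for the same weights, which is why the raw loss curves are not a fair comparison.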
Both nets use a single hidden layer of 24 tanh units, trained with Adam (lr = 0.03) for 250 iterations. The derivatives of the network output that enter the PDE residual (∂c/∂t, ∂c/∂x, ∂²c/∂x²) are evaluated analytically through the tanh chain rule.
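For a single tanh hidden layer, the derivatives needed by the PDE residual have a closed form, which is what "analytically through the tanh chain" amounts to. The sketch below assumes a network of the form c(x, t) = w0 + Σ w_j tanh(a_j x + b_j t + d_j); the parameter names, the random initialization, and the use of unscaled inputs are illustrative assumptions.

```python
import math
import random

HIDDEN = 24  # hidden width quoted in the text


def init_params(seed=0):
    """Random parameters for c(x, t) = w0 + sum_j w_j * tanh(a_j x + b_j t + d_j)."""
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]

    return {"a": draw(), "b": draw(), "d": draw(), "w": draw(), "w0": 0.0}


def forward_with_derivs(p, x, t):
    """Return (c, c_t, c_x, c_xx) using the closed-form tanh derivatives."""
    c = p["w0"]
    c_t = c_x = c_xx = 0.0
    for a, b, d, w in zip(p["a"], p["b"], p["d"], p["w"]):
        h = math.tanh(a * x + b * t + d)
        dh = 1.0 - h * h          # tanh'(z)
        d2h = -2.0 * h * dh       # tanh''(z)
        c += w * h
        c_t += w * b * dh         # chain rule: dz/dt = b
        c_x += w * a * dh         # chain rule: dz/dx = a
        c_xx += w * a * a * d2h
    return c, c_t, c_x, c_xx


params = init_params()
c, c_t, c_x, c_xx = forward_with_derivs(params, 0.3, 0.7)
```

In practice the inputs would be rescaled to order one before entering the network, since tanh saturates quickly for raw coordinates in kilometres and hours.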
The information content of the PDE
Adding the PDE residual term is worth roughly 8× more training data in this experiment: the data-only MLP would need ~70 observations to match the PINN's 9-observation accuracy (70/9 ≈ 7.8). For an operational WRF-Chem emulator, that changes what infrastructure is needed:
- The sparse AQS monitor network (~100 sites in California) is enough to train a usable surrogate when the PDE is enforced.
- The surrogate is differentiable, so inverse problems (source identification, parameter calibration) are solvable with gradient methods at minimal extra cost.
- Hundred-millisecond inference enables real-time nowcasting and what-if scenarios that WRF-Chem itself cannot deliver.
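The inverse-problem point above can be illustrated with a toy calibration. The sketch below does not use the trained surrogate: as a stand-in it takes the closed-form solution of dc/dt = −k c + S0 (the reaction and source terms only, with transport dropped), which is differentiable in k, and recovers k by plain gradient descent. The stand-in model, the constant source S0, the observation times, and the step size are all assumptions made for illustration.

```python
import math

S0 = 1.0  # assumed constant source strength for the toy model


def surrogate(t, k):
    """Toy differentiable stand-in: solution of dc/dt = -k c + S0 with c(0) = 0."""
    return S0 * (1.0 - math.exp(-k * t)) / k


def dsurrogate_dk(t, k):
    """Analytic sensitivity of the stand-in to the loss rate k."""
    e = math.exp(-k * t)
    return S0 * (t * e / k - (1.0 - e) / k ** 2)


def calibrate_k(obs, k0=0.3, lr=0.002, n_iter=500):
    """Gradient descent on the mean squared misfit between model and observations."""
    k = k0
    for _ in range(n_iter):
        grad = sum(
            2.0 * (surrogate(t, k) - c_obs) * dsurrogate_dk(t, k)
            for t, c_obs in obs
        ) / len(obs)
        k -= lr * grad
    return k


# Synthetic, noise-free observations generated with the k quoted in the text
obs = [(t, surrogate(t, 0.12)) for t in (2.0, 4.0, 6.0, 8.0)]
k_hat = calibrate_k(obs)
print(f"recovered k = {k_hat:.4f} per hour")
```

With a neural surrogate the sensitivity would come from autodiff rather than a hand-derived formula, but the loop has the same shape: each iteration costs one forward and one derivative evaluation of the surrogate instead of a full solver run.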
Build WRF-Chem surrogates with physics in the loss
Recommendation: For regional PM2.5 emulation where CEC needs fast surrogate evaluations of WRF-Chem (minutes instead of hours) but has only ~100 AQS stations reporting hourly, a physics-informed surrogate delivers measurably better accuracy than a plain neural net. The marginal cost (derivative machinery + collocation sampling) is small relative to the gain. Based on this 1D ADR proof-of-concept, a physics-informed loss is the right next step to scope before committing to a production 3D emulator.
What this pedagogical demo does not prove
- Single hidden layer, 24 units: production PINNs use 4-8 layers with 64-128 units per layer.
- Finite-difference gradient for training; autodiff would be both faster and more stable.
- 1D PDE is a pedagogical simplification — real WRF-Chem is 3D with dozens of species and reactions.
- Collocation points are uniform random; adaptive sampling (RAR or R3) would further reduce training cost.
- Source term assumed known exactly; joint inference of source + state is the natural next step.
- Boundary conditions: the PINN enforces an initial-condition MSE term but uses soft Dirichlet BCs at the domain edges (penalized in the loss, not hard-clamped). In 3D WRF-Chem, lateral boundaries are typically supplied by a global re-analysis (MERRA-2, GEOS-FP); a production PINN emulator would need hard BC enforcement via an ansatz (Lagaris-style) or a dedicated boundary-supervision dataset. Reported interior RMSE improvement is 33.1%; edge RMSE is ~20% worse.
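The hard-BC option mentioned in the last item can be sketched with a Lagaris-style ansatz: the network output is multiplied by a factor that vanishes at both edges and added to a function that already matches the boundary values, so the Dirichlet conditions hold exactly for any weights. The domain length and the names `hard_bc`, `c_left`, and `c_right` are hypothetical, and the boundary values are taken as constants here for simplicity.

```python
L_DOMAIN = 500.0  # assumed domain length in km (not given in the text)


def hard_bc(net, x, t, c_left=0.0, c_right=0.0):
    """Wrap a raw network so that c(0, t) = c_left and c(L, t) = c_right exactly.

    net(x, t) is any callable returning the unconstrained network output.
    """
    s = x / L_DOMAIN
    boundary_part = (1.0 - s) * c_left + s * c_right   # matches both edge values
    return boundary_part + s * (1.0 - s) * net(x, t)   # correction vanishes at edges


# Usage with an arbitrary stand-in for the raw network
raw_net = lambda x, t: 123.4 + x * t
left_value = hard_bc(raw_net, 0.0, 3.0, c_left=1.5, c_right=2.5)
right_value = hard_bc(raw_net, L_DOMAIN, 3.0, c_left=1.5, c_right=2.5)
```

With this construction the boundary penalty drops out of the loss entirely. Time-varying boundary values, such as those supplied by a re-analysis product, would replace the constants with functions of t.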