How Much Is Better Site Characterization Worth?
A $500K Kd-narrowing campaign moves the P50 arrival time from 53 to 59 years and changes the P95 remediation cost by only $0.2M. The value of information (VOI) is effectively zero. Sometimes the honest answer is: don’t spend the money, because the decision is already made.
What Is Kd (Distribution Coefficient)?
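Kd measures how much PFAS sticks to soil versus staying dissolved in water: it is the equilibrium ratio of sorbed concentration (per kg of soil) to dissolved concentration (per liter of water), hence its units of L/kg. A higher Kd means more contaminant sorbs to the solid phase. The plume moves more slowly, but cleanup takes longer because you have to flush more pore volumes to desorb the chemical.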
For PFOS, published Kd values range from 0.5 to 20 L/kg depending on soil organic carbon content, mineralogy, and pH: a 40x range that drives much of the spread in cleanup-time predictions. At Kd = 0.5, the plume still moves at a sizable fraction of the groundwater velocity. At Kd = 20, it creeps. The retardation factor R = 1 + (ρb/n) · Kd, where ρb is the soil’s dry bulk density and n its porosity, controls everything downstream: arrival time, pump duration, remediation cost. The plume front travels at roughly the groundwater velocity divided by R.
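To see how far the published Kd range stretches R, here is a minimal sketch. The bulk density (1.6 kg/L) and porosity (0.3) are assumed values typical of sandy aquifers, not site data.

```python
"""Retardation factor R = 1 + (rho_b / n) * Kd across the published PFOS Kd range.

RHO_B and POROSITY are assumed, typical sandy-aquifer values, not site data.
"""

RHO_B = 1.6     # dry bulk density, kg/L (assumed)
POROSITY = 0.3  # effective porosity, dimensionless (assumed)


def retardation_factor(kd: float, rho_b: float = RHO_B, n: float = POROSITY) -> float:
    """Return R for linear, equilibrium sorption with distribution coefficient kd (L/kg)."""
    return 1.0 + (rho_b / n) * kd


for kd in (0.5, 1.5, 20.0):
    r = retardation_factor(kd)
    # The plume front moves at (groundwater velocity) / R.
    print(f"Kd = {kd:5.1f} L/kg -> R = {r:6.1f}, plume speed = {1 / r:.3f} x groundwater")
```

With these assumed values, R runs from about 3.7 at Kd = 0.5 to about 108 at Kd = 20, so the plume front moves roughly 29 times more slowly at the top of the range than at the bottom.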
The problem: most site investigations measure Kd from only 3–5 soil samples. That’s not enough to constrain a parameter with a 40x range, and uncertainty in Kd can propagate into millions of dollars of uncertainty in cleanup cost.
Sharper Kd Doesn’t Move the Decision
We compared two scenarios on a 50-realization Monte Carlo ensemble: wide Kd uncertainty (a typical 3–5 sample investigation, log_std = 0.8) versus a targeted sorption campaign that narrows the prior (~$500K, log_std = 0.3). We then recomputed the arrival-time distribution and the P95 pump-and-treat (P&T) cost under each.
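The two priors can be sketched as lognormal distributions. This assumes that log_std is the standard deviation of ln(Kd) and uses a hypothetical prior median of 1.5 L/kg; the source states neither. The sketch draws 50 realizations per prior, matching the study’s ensemble size, and prints the analytic P95/P5 ratio for each.

```python
import numpy as np

MEDIAN_KD = 1.5  # L/kg; hypothetical prior median, not stated in the source
N_REAL = 50      # ensemble size used in the study


def sample_kd(log_std: float, n: int = N_REAL, median: float = MEDIAN_KD, seed: int = 42) -> np.ndarray:
    """Draw n Kd values (L/kg) from a lognormal prior.

    Assumes log_std is the standard deviation of ln(Kd). The fixed seed means both
    priors reuse the same underlying normal draws, only scaled differently.
    """
    rng = np.random.default_rng(seed)
    return rng.lognormal(mean=np.log(median), sigma=log_std, size=n)


Z95 = 1.6449  # standard normal 95th percentile

for label, log_std in (("wide", 0.8), ("narrow", 0.3)):
    kd = sample_kd(log_std)
    p5, p50, p95 = np.percentile(kd, [5, 50, 95])
    analytic_ratio = np.exp(2 * Z95 * log_std)  # P95/P5 of a lognormal
    print(f"{label:6s}: sampled P5={p5:.2f}, P50={p50:.2f}, P95={p95:.2f} L/kg; "
          f"analytic P95/P5 = {analytic_ratio:.1f}x")
```

Under these assumptions the wide prior spans roughly a 14x range between P5 and P95, while the narrow prior spans less than 3x. Each Kd draw would then enter the transport model through R.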
The upper tails of the two CDFs barely separate. Wide Kd: P5 = 23 yr, P50 = 53 yr, P95 = 76.6 yr. Narrow Kd: P5 = 36.5 yr, P50 = 59 yr, P95 = 77.5 yr. Both distributions show ~40% of realizations in which the contaminant never reaches the well within the simulation horizon. Narrowing Kd shifts the median by 6 years and pulls in the lower tail (P5 moves from 23 to 36.5 years), but the P95, the number that drives design, moves by less than 1 year.
The cost calculation tells the same story. P95 remediation NPV under wide Kd: $89.8M. Under narrow Kd: $90.0M. The remediation cost barely moves (+$0.2M); add the $0.5M campaign and you’re $0.7M worse off for the same decision.
50-realization Monte Carlo, 80-year horizon. Wide: Kd log_std = 0.8. Narrow: Kd log_std = 0.3. Cost basis: $30M P&T capex + NPV of $2M/yr opex over 100 years at 3%.
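The cost basis, and the net VOI implied by the reported P95 costs, can be checked in a few lines. This is a sketch under the simplest reading of the cost basis (capex spent at time zero, opex paid at the end of each year for all 100 years), which gives about $93.2M. That is a little above the reported P95 figures, so the study’s ensemble costing presumably handles timing in a way this sketch does not. The VOI function follows the study’s definition (project NPV under narrow Kd minus project NPV under wide Kd, minus the campaign cost) and assumes project NPV is entered as a negative cost.

```python
def annuity_factor(rate: float, years: int) -> float:
    """Present value of $1 paid at the end of each year for `years` years."""
    return (1.0 - (1.0 + rate) ** -years) / rate


def pt_cost_npv(capex: float = 30.0, opex: float = 2.0, years: int = 100, rate: float = 0.03) -> float:
    """P&T cost in $M: capex at time zero plus discounted annual opex (assumed timing)."""
    return capex + opex * annuity_factor(rate, years)


def net_voi(npv_narrow: float, npv_wide: float, campaign_cost: float) -> float:
    """Net VOI in $M: NPV(narrow) - NPV(wide) - campaign cost, with NPVs as negative costs."""
    return npv_narrow - npv_wide - campaign_cost


print(f"Annuity factor (3%, 100 yr): {annuity_factor(0.03, 100):.2f}")
print(f"P&T cost NPV: ${pt_cost_npv():.1f}M")
# P95 remediation costs reported in the text, entered as negative NPVs.
print(f"Net VOI: ${net_voi(npv_narrow=-90.0, npv_wide=-89.8, campaign_cost=0.5):+.1f}M")
```

The net VOI comes out at −0.7 ($M), matching the “$0.7M worse off” figure in the text: the campaign costs more than it changes.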
Why Kd Doesn’t Move It
Decomposing the Monte Carlo variance shows where the spread actually comes from:
K (hydraulic conductivity) alone accounts for 51% of arrival-time variance. Kd contributes 30%, hydraulic gradient 18%, and porosity a negligible 2% (shares are rounded). Kd matters, but it’s not the dominant lever. And under containment with three wells, even cutting the Kd log-spread by more than half (log_std 0.8 → 0.3) doesn’t change the capture-zone design or the 100-year operating profile, because every realization in the ensemble is captured.
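One common way to apportion variance like this is with squared standardized regression coefficients (SRC²) on log-transformed inputs, which approximate each input’s share when the inputs are independent and the log-model is close to linear. The source doesn’t say which decomposition it used, so treat this as an illustration of the technique, not a reproduction. The transport model (arrival time = distance × R / seepage velocity), the 200 m distance, and all priors are hypothetical, and the printed shares depend entirely on them.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20_000
DIST_M = 200.0  # hypothetical source-to-well distance, m
RHO_B = 1.6     # dry bulk density, kg/L (assumed)

# Hypothetical, independent priors (NOT the study's distributions).
K = rng.lognormal(np.log(10.0), 0.9, N)      # hydraulic conductivity, m/day
kd = rng.lognormal(np.log(1.5), 0.8, N)      # distribution coefficient, L/kg
grad = rng.lognormal(np.log(0.002), 0.5, N)  # hydraulic gradient, dimensionless
n = rng.uniform(0.25, 0.35, N)               # porosity, dimensionless

v = K * grad / n                   # seepage velocity, m/day
R = 1.0 + (RHO_B / n) * kd         # retardation factor
t_years = DIST_M * R / v / 365.25  # advective arrival time, years

# Standardize log-inputs and log-output, then fit a linear model by least squares.
X = np.column_stack([np.log(K), np.log(kd), np.log(grad), np.log(n)])
y = np.log(t_years)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()
beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

shares = beta**2  # SRC^2: approximate variance fraction per input
for name, share in zip(["K", "Kd", "gradient", "porosity"], shares):
    print(f"{name:9s} {share:6.1%}")
print(f"sum of shares (close to the regression R^2): {shares.sum():.3f}")
```

In this toy model porosity appears in both the seepage velocity and R, and when (ρb/n)·Kd is large the two effects nearly cancel. That is one way a parameter can contribute almost nothing to arrival-time variance.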
For a different site, the decision-relevant question might be whether tighter K would reshape the design. For this site, capture-zone capacity is set by the wells, not by the uncertain aquifer parameters, so even the dominant variance contributor doesn’t flip the choice. That’s the whole VOI story.
The Honest Answer: Don’t Spend the $500K
VOI ≈ $0M for this site and this decision. Net of the campaign it is about −$0.7M ($89.8M − $90.0M − $0.5M), negligible against a ~$90M project. The remediation choice (three extraction wells, run for the planning horizon) is robust to Kd across the full prior. Spending $500K on a sharper Kd would buy a tighter histogram and the same build. That’s the difference between information and insight.
This is a value-of-information analysis with a counterintuitive result. The question isn’t “what’s the answer?” but “how much would a better answer be worth?” For a site already committed to long-horizon containment, sharper Kd is worth roughly nothing: it refines a number without changing what gets built. Only a probabilistic framework like this Monte Carlo ensemble can make this call; deterministic models can’t quantify the value of reducing uncertainty because they don’t represent it.
VOI flips on different sites. If the choice were monitored natural attenuation vs. active remediation, Kd would matter enormously — it controls whether the plume retreats on its own. If the well were closer (decade-scale arrival rather than half-century), Kd would shift the design horizon. Always ask: which decision does this measurement actually change? If the answer is “none,” the right move is to skip the campaign and put the money toward operating the wells.
50-realization Monte Carlo, 80-year horizon. Wide: Kd log_std = 0.8. Narrow: Kd log_std = 0.3. Reported VOI is project NPV under narrow minus project NPV under wide, minus the $0.5M campaign cost.
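The question “which decision does this measurement actually change?” has a standard decision-analysis form: the expected value of perfect information (EVPI), the expected cost of deciding now minus the expected cost if you could learn the true state first and then decide. EVPI is an upper bound on what any measurement is worth. This sketch is the textbook calculation, not the study’s VOI method, and every cost and probability in it is hypothetical.

```python
import numpy as np


def evpi(cost: np.ndarray, prob: np.ndarray) -> float:
    """Expected value of perfect information for a cost-minimizing choice.

    cost[a, s] is the cost of action a if the true state is s; prob[s] are state probabilities.
    """
    best_without_info = np.min(cost @ prob)           # commit to one action now
    best_with_info = np.sum(prob * cost.min(axis=0))  # pick the best action per state
    return float(best_without_info - best_with_info)


prob = np.array([0.25, 0.5, 0.25])  # hypothetical probabilities of three Kd scenarios

# Decision 1 (hypothetical $M): 3-well vs 5-well containment; 3 wells wins in every scenario.
containment = np.array([
    [90.0, 90.0, 90.0],     # 3 wells
    [110.0, 110.0, 110.0],  # 5 wells
])

# Decision 2 (hypothetical $M): MNA vs active remediation; the best action depends on Kd.
mna_vs_active = np.array([
    [10.0, 40.0, 150.0],  # monitored natural attenuation
    [90.0, 90.0, 90.0],   # active remediation
])

print(f"EVPI, containment design: ${evpi(containment, prob):.1f}M")
print(f"EVPI, MNA vs. active:     ${evpi(mna_vs_active, prob):.1f}M")
```

With these made-up numbers the containment choice has an EVPI of $0.0M, because the three-well option wins in every scenario, while the MNA-versus-active choice has an EVPI of $15.0M, because learning Kd would change which option you pick.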
Can Machine Learning Replace Measurement?
If Kd is a major uncertainty (30% of arrival-time variance here), and site characterization costs $500K, an obvious question arises: can we predict Kd from cheaper-to-measure soil properties instead? Soil organic carbon content, pH, clay percentage, and grain size are routinely measured during most environmental site assessments. If these properties could reliably predict Kd, you could skip the expensive sorption testing entirely.
We tested this idea using the largest available dataset of PFAS sorption measurements: 1,227 laboratory experiments compiled from 47 published studies, covering 47 PFAS compounds across 451 different soil types (Kühne et al. 2025, Environ. Sci. Technol.). Each experiment measured how much PFAS sorbed to a soil sample under controlled conditions, along with the soil’s organic carbon content, pH, clay/silt/sand fractions, and cation exchange capacity.
We built three models of increasing sophistication, mirroring the fidelity progression used throughout this study.
Three Models, One Test
Organic Carbon Partitioning
The standard textbook model: Kd = Koc × foc, where Koc is how strongly the chemical partitions to organic carbon (estimated from its molecular structure, here its CF2 chain length) and foc is the fraction of organic carbon in the soil (measurable for ~$50/sample). We added published corrections for pH and clay content. No data fitting: pure chemistry.
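The physics backbone is simple enough to write out. The data note gives it as log Kd = log Koc(CF2) + log foc − 0.08(pH − 6) + 0.005(clay%), with Koc taken from a chain-length relationship that isn’t spelled out, so this sketch takes log Koc as an input instead. The value log Koc = 3.1 was back-solved here so that Cape Cod’s soil (0.3% organic carbon, pH 6.0, 5% clay) reproduces the 4.00 L/kg physics prediction in the Cape Cod results; it is not a figure from the source.

```python
import math


def physics_kd(log_koc: float, foc: float, ph: float, clay_pct: float) -> float:
    """Kd (L/kg) from organic-carbon partitioning with pH and clay corrections.

    log Kd = log Koc + log foc - 0.08 (pH - 6) + 0.005 (clay %)
    """
    log_kd = log_koc + math.log10(foc) - 0.08 * (ph - 6.0) + 0.005 * clay_pct
    return 10.0 ** log_kd


# Cape Cod soil from the data note: Corg = 0.3% (foc = 0.003), pH = 6.0, clay = 5%.
# log_koc = 3.1 is a back-solved illustrative value, not taken from the source.
kd = physics_kd(log_koc=3.1, foc=0.003, ph=6.0, clay_pct=5.0)
print(f"Physics-model PFOS Kd at Cape Cod: {kd:.2f} L/kg")
```

Note how directly foc drives the result: halving organic carbon halves the predicted Kd, while the pH and clay terms only nudge it.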
Random Forest
A random forest is an ensemble of decision trees (200 here), each trained on a bootstrap resample of the data, and its prediction is the average over all trees. It learns patterns from ALL available features (molecular weight, fluorine count, chain length, pH, organic carbon, clay content, and cation exchange capacity) without any physics constraints. The model finds whatever statistical relationships maximize prediction accuracy.
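The random-forest setup, and the 10-fold cross-validation used to score it, can be sketched with scikit-learn. The estimator size (200 trees) and the shuffled 10-fold split with seed 42 follow the data note; everything else is hypothetical. The Kühne et al. dataset is not bundled here, so the rows are synthetic stand-ins with a made-up log Kd relationship, and the printed R² says nothing about the real models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
n = 1227  # same row count as the compiled dataset; the rows themselves are synthetic

# Synthetic stand-ins for a few features (NOT the real data).
chain_length = rng.integers(4, 13, n)            # perfluorinated carbons, 4-12
log_foc = np.log10(rng.uniform(0.001, 0.05, n))  # log10 organic-carbon fraction
ph = rng.uniform(4.5, 8.5, n)
clay = rng.uniform(2.0, 50.0, n)                 # percent
X = np.column_stack([chain_length, log_foc, ph, clay])

# Made-up target: log10 Kd rising with chain length and organic carbon, plus noise.
y = 0.5 * chain_length + log_foc - 1.0 - 0.08 * (ph - 6.0) + 0.005 * clay + rng.normal(0.0, 0.3, n)

model = RandomForestRegressor(n_estimators=200, random_state=42)
cv = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")  # one R^2 per held-out fold
print(f"10-fold CV R^2: {scores.mean():.2f} (fold range {scores.min():.2f}-{scores.max():.2f})")
```

Every held-out fold here is drawn from the same synthetic distribution as the training folds, which is exactly why a high cross-validated R² says little about soils outside that distribution.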
Physics Backbone + Learned Correction
The physics model provides the baseline prediction. A second machine learning model (gradient boosting) is trained not on Kd directly, but on the error in the physics prediction — learning to correct the systematic biases that pure chemistry misses. The final prediction is: physics estimate + learned correction.
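Residual learning of this kind can be sketched with scikit-learn’s GradientBoostingRegressor, using the hyperparameters in the data note (200 trees, learning rate 0.05, max depth 4). Everything else is hypothetical: the soils are synthetic, the “true” log Kd adds a made-up clay-dependent bias that the backbone misses, log Koc is fixed at an illustrative 3.1, and a simple 80/20 split stands in for the study’s cross-validation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 1000
LOG_KOC = 3.1  # illustrative single-compound log10 Koc (hypothetical)

# Synthetic soils (NOT the real dataset): columns are log10 foc, pH, clay %.
X = np.column_stack([
    np.log10(rng.uniform(0.001, 0.05, n)),
    rng.uniform(4.5, 8.5, n),
    rng.uniform(2.0, 50.0, n),
])


def physics_log_kd(X: np.ndarray) -> np.ndarray:
    """Physics backbone in log10 space: log Koc + log foc - 0.08 (pH - 6) + 0.005 clay%."""
    return LOG_KOC + X[:, 0] - 0.08 * (X[:, 1] - 6.0) + 0.005 * X[:, 2]


# Synthetic "truth": backbone plus a clay-dependent bias it misses, plus noise.
y = physics_log_kd(X) + 0.3 * np.tanh((X[:, 2] - 25.0) / 10.0) + rng.normal(0.0, 0.1, n)

train, test = slice(0, 800), slice(800, None)  # rows are already random, so a simple split

# Fit the booster to the physics residual, not to log Kd itself.
corrector = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05, max_depth=4, random_state=42)
corrector.fit(X[train], y[train] - physics_log_kd(X[train]))

pred_physics = physics_log_kd(X[test])
pred_hybrid = pred_physics + corrector.predict(X[test])  # physics estimate + learned correction


def rmse(pred: np.ndarray) -> float:
    """Root-mean-square error against the held-out synthetic log Kd."""
    return float(np.sqrt(np.mean((y[test] - pred) ** 2)))


print(f"held-out RMSE (log10 Kd): physics {rmse(pred_physics):.3f}, "
      f"physics + correction {rmse(pred_hybrid):.3f}")
```

The design choice is that the booster only has to learn a small correction, so when it has no data to go on, the prediction falls back toward the physics estimate rather than toward an arbitrary average.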
Reading the cards: R² measures prediction accuracy on held-out data; 1.0 is perfect, and 0.0 is no better than always predicting the mean. We used 10-fold cross-validation: train on 90% of the data, test on the remaining 10%, rotate 10 times, and average the scores. The ML models score 0.83–0.84, which looks strong. But the held-out folds come from the same distribution as the training data. The real question is what happens outside it.
The Cape Cod Test
Joint Base Cape Cod is our validation site. USGS measured PFOS concentrations at 62 monitoring wells in 2020. From the observed plume extent (2,700 meters in 55 years), we back-calculated the effective field Kd: 0.39 L/kg — far below the literature default of 1.5 L/kg (Anderson et al. 2019). Cape Cod’s glacial outwash sand has very little organic carbon (0.3%) and almost no clay (5%) — an extreme soil that sits at the far edge of the training data.
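A back-calculation like this can be sketched by assuming the plume front advanced at the retarded velocity v/R, with no dispersion and steady flow, so that R = v / v_plume and Kd = (R − 1)·n/ρb. The source does not give the seepage velocity, porosity, or bulk density it used; the values below are illustrative assumptions, so the output shows the method and its sensitivity, not the study’s 0.39 L/kg.

```python
def effective_kd(extent_m: float, years: float, seepage_m_per_day: float,
                 porosity: float, bulk_density: float) -> float:
    """Back-calculate an effective Kd (L/kg) from plume-front travel.

    Assumes the front advanced at the retarded velocity v / R (no dispersion,
    steady flow), so R = v / v_plume and Kd = (R - 1) * n / rho_b.
    """
    v_plume = extent_m / years              # m/yr
    v_gw = seepage_m_per_day * 365.25       # m/yr
    retardation = v_gw / v_plume
    if retardation < 1.0:
        raise ValueError("plume front outran the groundwater; check the inputs")
    return (retardation - 1.0) * porosity / bulk_density


# 2,700 m in 55 years is from the text; velocity, porosity and bulk density are assumptions.
for v in (0.3, 0.4, 0.5):
    kd = effective_kd(2700.0, 55.0, v, porosity=0.39, bulk_density=1.6)
    print(f"seepage velocity {v:.1f} m/d -> effective Kd = {kd:.2f} L/kg")
```

Across this modest velocity range the inferred Kd more than doubles (about 0.30 to 0.66 L/kg), which is one reason a field-derived Kd is best read as an effective, model-dependent quantity.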
We gave each model Cape Cod’s soil properties and asked: what is the PFOS Kd?
| Estimate | PFOS Kd | Error vs. field Kd |
|---|---|---|
| Literature default (Anderson et al. 2019) | 1.50 L/kg | 285% |
| Pure physics (Koc × foc) | 4.00 L/kg | 926% |
| Physics-informed ML | 7.02 L/kg | 1,700% |
| Pure ML (Random Forest) | 12.67 L/kg | 3,148% |
| Actual (field, USGS 2020) | 0.39 L/kg | — |
Each of the three models overshoots by at least an order of magnitude. The physics-informed model does beat pure ML, cutting the error roughly in half, but “half of terrible” is still terrible. The simple literature default of 1.5 L/kg, despite being “wrong,” outperforms all three models.
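The error column is the signed relative error, (predicted − field) / field. Recomputing it from the rounded Kd values in the table reproduces the column to within rounding; the random-forest row comes out at about 3,149% rather than 3,148%, presumably because the tabulated Kd values are themselves rounded. The sketch also prints how many times too high each estimate is.

```python
FIELD_KD = 0.39  # L/kg, back-calculated field value from the text

predictions = {
    "Literature default": 1.50,
    "Pure physics": 4.00,
    "Physics-informed ML": 7.02,
    "Pure ML (random forest)": 12.67,
}

for name, kd in predictions.items():
    rel_error = (kd - FIELD_KD) / FIELD_KD  # signed relative error; positive = overprediction
    factor = kd / FIELD_KD                  # how many times too high
    print(f"{name:24s} {rel_error:+8.0%}  ({factor:4.1f}x field Kd)")
```

The multipliers make the comparison plain: the physics model is about 10x too high, the physics-informed model about 18x, and the random forest about 32x.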
Why R² = 0.84 Can Be 1,700% Wrong
This is the most important lesson in the entire experiment. A cross-validated R² of 0.84 means the model explains 84% of the variance in held-out samples drawn from the same dataset it was trained on. That data is overwhelmingly from moderate soils: the median organic carbon in the dataset is 1.3%, and the median clay content is 24%. Within that range, the model interpolates well.
Cape Cod sits far outside that range: 0.3% organic carbon, 5% clay. The model has almost no training examples from soils this extreme. When it extrapolates — predicting outside the range of data it learned from — it fails catastrophically. This is the fundamental limitation of data-driven models: they learn patterns in the data they’ve seen, and those patterns don’t necessarily hold in new territory.
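A toy example shows why tree ensembles in particular cannot extrapolate: a random forest’s prediction is an average of training targets, so it can never go below the smallest Kd it saw in training. The data, the foc range (0.5% to 5%), and the log Koc of 3.1 below are hypothetical; the point is the mechanism, not the numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
LOG_KOC = 3.1  # hypothetical log10 Koc

# Synthetic training soils with foc between 0.5% and 5% only (NOT the real dataset).
log_foc_train = np.log10(rng.uniform(0.005, 0.05, 500)).reshape(-1, 1)
log_kd_train = LOG_KOC + log_foc_train.ravel() + rng.normal(0.0, 0.1, 500)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(log_foc_train, log_kd_train)

# A low-carbon soil outside the training range: foc = 0.1%.
x_new = np.log10([[0.001]])
true_log_kd = LOG_KOC + x_new[0, 0]
rf_log_kd = rf.predict(x_new)[0]

print(f"true Kd: {10 ** true_log_kd:.2f} L/kg, random-forest Kd: {10 ** rf_log_kd:.2f} L/kg")
print(f"smallest Kd in training data: {10 ** log_kd_train.min():.2f} L/kg")
```

The forest predicts roughly what it learned at the low edge of its training range instead of following the trend down to 0.1% organic carbon, so it overpredicts Kd: the same direction of error seen at Cape Cod.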
The Lab-to-Field Gap
Even if the training data included more sandy, low-carbon soils, a deeper problem remains. The 1,227 measurements are laboratory batch sorption experiments — a researcher takes a soil sample, crushes and sieves it, shakes it with PFAS-contaminated water, and measures how much PFAS sticks to the soil particles. This is a controlled, small-scale measurement.
In the field, PFAS transport through intact geological structure involves processes that crushed-soil experiments cannot capture:
- Preferential flow — water (and contaminant) channels through high-permeability paths, bypassing most of the soil matrix
- Air-water interface sorption — in unsaturated sands, PFAS accumulates at air-water boundaries that don’t exist in a shaken flask (Brusseau 2018)
- Scale effects — a 10-gram lab sample cannot represent the heterogeneity of a 2,700-meter plume
- Non-equilibrium transport — batch experiments assume the PFAS reaches sorption equilibrium; in flowing groundwater, it may not
The field Kd of 0.39 L/kg at Cape Cod is an effective value that integrates all of these real-world mechanisms. It is not the same quantity that lab experiments measure, even though both are called “Kd.”
For practitioners: if a vendor offers an “AI-powered Kd prediction” trained on published sorption data, ask how it performs on out-of-distribution soils. Models that scored R² = 0.83–0.84 on their own test sets overpredicted Cape Cod’s Kd by 18x to 32x. When a decision does hinge on Kd, the money you spend on field characterization is not just buying a better number; it’s buying a number that actually describes your aquifer.
Data: Kühne et al. (2025), “Modeling PFAS Sorption in Soils Using Machine Learning,” Environ. Sci. Technol., doi:10.1021/acs.est.4c13284. Dataset: 1,227 Kd entries, 47 PFAS, 451 soils, 47 source studies (supplementary file es4c13284_si_002.xlsx). Models: scikit-learn RandomForestRegressor (200 trees), GradientBoostingRegressor (200 trees, lr=0.05, max_depth=4). Physics backbone: log Kd = log Koc(CF2) + log foc − 0.08(pH − 6) + 0.005(clay%). 10-fold cross-validation, shuffle, seed=42. Cape Cod soil: Sand=85%, Silt=10%, Clay=5%, Corg=0.3%, pH=6.0, CEC=2.0 cmol+/kg (Walter et al. 2018, USGS SIR 2018-5139). Field Kd: 0.39 L/kg, back-calculated from observed plume extent at 62 USGS monitoring wells (Water Quality Portal, 2020 sampling campaign, 49 PFOS detections, 1.3–610 ng/L).