The Question
Investigation 13 shows that mortality can be predicted from health data alone. But people aren’t just their diagnoses — they’re also their bank accounts, their education, their marital status. The social determinants of health (SDOH) literature is clear: income predicts longevity. The question is whether adding income, education, and marital status to a model that already knows your measured biomarkers — BP, lab cholesterol, HbA1c, CRP, hemoglobin, serum creatinine — actually improves prediction. NHANES is a stronger mediation test than HRS because it has laboratory measurements, not just self-reported conditions.
ADM Prediction (Made Before Running Models)
Predicted winner: Health+SDOH ML, but with a modest gain. Chetty et al. (2016) found a life expectancy gap of roughly 10–15 years between the richest 1% and the poorest 1% of Americans (about 14.6 years for men and 10.1 years for women). But most of this gap works through health behaviors and conditions that measured biomarkers already capture. The interesting question isn’t whether wealth matters; it’s whether wealth adds information beyond what measured health captures.
Prediction confirmed, and then some. Income, education, and marital status add only +0.002 AUC (0.908 → 0.910), with overlapping CIs. Formal mediation: the SDOH-only model reaches AUC 0.650, which is 0.150 above the 0.5 chance level; the health-only model reaches 0.908. Of that 0.150 total SDOH effect, 98.3% is mediated through measured health. Wealth predicts mortality almost entirely because poor people are sicker, in ways you can measure in a lab.
Results
[Figures: ROC Curves (Three Models); Feature Importance (Top 8, Health+Bio+SDOH); Income Quintile → Mortality Rate; Mediation: Where Does Wealth’s Effect Go?]
The NHANES Advantage
The HRS version of this investigation could only rule out self-reported conditions as the mediator. NHANES is the stronger test because it has measured biomarkers: up to four averaged blood pressure readings, lab-verified cholesterol, HbA1c, CRP, hemoglobin, serum creatinine, albumin. The mediation finding here means: even when the model knows the lab values that income ought to correlate with, income adds almost nothing (+0.002 AUC). The income-mortality pathway runs through biology that’s already in the blood panel.
The gradient itself is stark — the poorest income quintile has 28.0% ten-year mortality, the highest quintile 12.3% (relative risk 2.28×, 15.7 percentage-point spread). This isn’t a null result about poverty — it’s a finding about the causal pathway. Poverty kills people by making them sicker, and that sickness is measurable.
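The two summary figures quoted for the gradient follow directly from the quintile mortality rates. As a minimal arithmetic check (the function name `gradient_summary` is illustrative, not part of the original analysis):

```python
def gradient_summary(rate_bottom, rate_top):
    """Relative risk and percentage-point spread between two mortality rates."""
    relative_risk = rate_bottom / rate_top
    spread_pp = (rate_bottom - rate_top) * 100  # percentage points
    return relative_risk, spread_pp

# Ten-year mortality rates reported above: poorest vs. highest income quintile.
rr, spread = gradient_summary(0.280, 0.123)
print(f"{rr:.2f}x relative risk, {spread:.1f} percentage-point spread")
```

With the reported rates this gives the 2.28× relative risk and 15.7-point spread stated in the text.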
The ADM Insight
This is the investigation where more data doesn’t help. Income, education, and marital status together add +0.002 AUC to a model that already has measured biomarkers, a gain statistically indistinguishable from zero given the overlapping CIs. About 98% of the SDOH signal is mediated through health variables that medicine already measures. The decision-relevant answer: if you have a blood panel, you don’t need a wealth survey. The lesson isn’t that wealth doesn’t matter; it’s that by the time mortality is close enough to predict, the biological damage poverty causes has already shown up in the labs.
Data source: CDC NHANES 1999–2018 (ten two-year cycles pooled) linked to NCHS Linked Mortality Files (NDI match through December 2019). Mortality-eligible respondents only (ELIGSTAT=1).
Cohort: Adults aged 25–90 with at least 10 years of mortality follow-up or an observed death within 10 years. Final analytic sample: 28,636 adults, 6,626 deaths (23.1%).
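The inclusion rule can be sketched as a small labeling function. This is one reading of the cohort definition, not the original pipeline. It assumes follow-up is counted in months from interview (as in the public-use linked mortality variable `PERMTH_INT`), that `MORTSTAT` equals 1 for decedents, and that a death at exactly 120 months counts as "within 10 years". The function and argument names are hypothetical.

```python
FOLLOWUP_MONTHS = 120  # 10 years of follow-up

def ten_year_outcome(age, eligstat, mortstat, followup_months):
    """Return 1 (died within 10 years), 0 (survived 10 years), or None (excluded)."""
    if eligstat != 1:            # not eligible for mortality linkage
        return None
    if not (25 <= age <= 90):    # outside the analytic age range
        return None
    if mortstat == 1 and followup_months <= FOLLOWUP_MONTHS:
        return 1                 # observed death within the window
    if followup_months >= FOLLOWUP_MONTHS:
        return 0                 # alive at 10 years (later deaths count as survivors)
    return None                  # censored before 10 years: outcome unknown

# Illustrative records: (age, ELIGSTAT, MORTSTAT, follow-up months)
records = [(60, 1, 1, 48), (45, 1, 0, 150), (70, 1, 1, 130), (30, 1, 0, 36), (50, 2, 0, 200)]
labels = [ten_year_outcome(*r) for r in records]
print(labels)  # [1, 0, 0, None, None]
```

Excluding respondents censored before ten years is what makes the outcome a clean binary label. It also means the most recent survey cycles contribute mainly through early deaths.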
Three models compared: (1) Domain: Charlson-style log-relative-risk score with Gompertz age component — conditions, smoking, BMI only. (2) Health + Biomarkers ML: GradientBoosting on chronic conditions, smoking, BMI plus measured BP, lipids, HbA1c, CRP, hemoglobin, serum creatinine, albumin. (3) Health + Bio + SDOH ML: same architecture plus poverty income ratio quintile (INDFMPIR), education (DMDEDUC2), and marital status (DMDMARTL).
Mediation test: Fit an SDOH-only model to establish the total effect (AUC 0.650, i.e. 0.150 above the 0.5 chance level). Then compare Health-only vs. Health+SDOH to isolate the direct effect (+0.002 AUC). Indirect (mediated) effect = total − direct.
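The decomposition is simple enough to write out. The sketch below uses the rounded AUCs reported in this write-up; the function name `mediation_share` is illustrative. Because the inputs are rounded to three decimals, the result lands near 98.7% rather than the reported 98.3%, which was presumably computed from unrounded values.

```python
CHANCE_AUC = 0.5

def mediation_share(auc_sdoh_only, auc_health, auc_health_plus_sdoh):
    """Split the SDOH signal into direct and indirect (mediated) parts, in AUC units."""
    total = auc_sdoh_only - CHANCE_AUC            # SDOH signal on its own
    direct = auc_health_plus_sdoh - auc_health    # what SDOH adds beyond health
    indirect = total - direct                     # portion carried by health variables
    return total, direct, indirect, indirect / total

total, direct, indirect, share = mediation_share(0.650, 0.908, 0.910)
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f} share={share:.1%}")
```

This is an AUC-difference decomposition, not a counterfactual mediation analysis. It describes how much predictive signal overlaps, which is why the result is a statement about information rather than causation.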
Evaluation: 5-fold stratified cross-validation. Bootstrap 95% CIs from 1,000 resamples. Survey weights applied for nationally representative gradient estimates (NHANES 2-year weights divided by number of pooled cycles), but equal weighting for ML training.
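A percentile bootstrap for an AUC confidence interval can be sketched in plain Python. This is an assumed implementation, not the original code: it computes AUC from the Mann-Whitney rank statistic and resamples respondents with replacement. The names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_auc_ci(labels, scores, n_boot=1000, seed=0):
    """Percentile 95% CI for AUC from n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if sum(y) in (0, n):  # AUC is undefined without both classes; redraw
            continue
        stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Overlapping per-model intervals are a conservative check. A paired bootstrap of the AUC difference on the same resamples would test the +0.002 gain more directly.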
Cross-sectional SDOH measurement. Poverty income ratio is measured at one interview. Lifetime income trajectory (childhood poverty, income volatility, retirement drop) may matter more than a snapshot. The mediation finding applies to current income → current biology.
SDOH panel is narrow. Only PIR, education, and marital status. Adding neighborhood deprivation, occupation, or discrimination exposure could change the picture — those variables are known to have effects not fully captured by individual income.
Pooled cycles span two decades. Policy changes over 1999–2018 (ACA, SNAP expansions, etc.) may have shifted the SDOH-health relationship. The pooled estimate smooths across a period when the income-mortality gap widened.
Mediation is a decomposition, not a causal claim. The 98% mediated result means the SDOH signal can be explained by measured health. It does not prove poverty causes the health problems — reverse causation (illness causing poverty) is partially baked in.
Reverse causation. Serious illness causes medical bankruptcy, job loss, and divorce. A cross-sectional design cannot separate "poverty made you sick" from "sickness made you poor."