
Does This Person Have Heart Disease?

38,033 adults across seven NHANES cycles (2005–2018). Framingham Risk Score vs GradientBoosting on the same cohort plus measured biomarkers: a clear ML win, with non-overlapping 95% confidence intervals and a permutation p-value below 0.001 (2,000 permutations).

The Question

Since the Framingham Heart Study, cardiovascular disease has been the benchmark case for clinical risk scoring. We implement the 2008 Framingham Risk Score (D’Agostino et al., Circulation) as the domain baseline on NHANES 2005–2018 and ask whether GradientBoosting on the same cohort, given measured biomarkers such as CRP, HbA1c, multi-reading averaged blood pressure, triglycerides, and waist circumference, adds predictive value beyond the classic Framingham inputs. Outcome: self-reported cardiovascular disease (congestive heart failure, coronary heart disease, angina, heart attack, or stroke) from the NHANES Medical Conditions Questionnaire.

NHANES Adults (2005–2018): 38,033
ML AUC: 0.843 [0.838–0.849]
Framingham AUC: 0.786 [0.779–0.793]

ADM Prediction (Made Before Running Models)

Predicted winner: ML. CVD has well-established population-level risk factors (Framingham), but the Framingham Risk Score is limited to its seven inputs. NHANES provides measured biomarkers that Framingham cannot use directly — CRP for inflammation, HbA1c for glycemic dysregulation, waist circumference for central adiposity, multi-reading averaged blood pressure for reliable BP estimates. If these add real signal, ML should pick it up.

Expected: an advantage of 5–10 AUC points (0.05–0.10 AUC). Actual: +5.7 AUC points (0.843 vs 0.786, permutation p < 0.001). Prediction confirmed, with non-overlapping 95% CIs.

Results

[Figures: ROC Curves; Feature Importance (Top 8); Confidence Intervals; Net Reclassification]

Where Does the ML Advantage Come From?

The Framingham Risk Score uses seven inputs (age, sex, total cholesterol, HDL, systolic BP, smoking, diabetes). NHANES provides eight additional features that Framingham does not score: diastolic BP, LDL, triglycerides, HbA1c, CRP, BMI, waist circumference, and the total-to-HDL cholesterol ratio (derived from two Framingham inputs rather than measured separately).

Extra biomarkers account for 24.1% of the GradientBoosting feature importance. The remaining 75.9% sits on the Framingham variables, but used with flexible, nonlinear weighting. The ML gain therefore appears to combine two things: NHANES’s measured biomarkers adding real signal, and GradientBoosting capturing the nonlinear way those inputs combine.
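The 24.1% figure is a share of total feature importance. A minimal sketch of that bookkeeping follows, assuming the importances are available as name/value pairs (for example, read off a fitted scikit-learn model's `feature_importances_`). The function name, feature names, and numbers are illustrative placeholders, not the study's values.

```python
def importance_share(importances, group):
    """Fraction of total importance carried by the features in `group`."""
    total = sum(importances.values())
    in_group = sum(v for name, v in importances.items() if name in group)
    return in_group / total

# Placeholder importances for illustration only (NOT the study's numbers).
example_importances = {"age": 0.5, "systolic_bp": 0.2, "hba1c": 0.2, "crp": 0.1}
extra_biomarkers = {"hba1c", "crp"}

extra_share = importance_share(example_importances, extra_biomarkers)
print(f"extra biomarkers: {extra_share:.1%}")
```

Impurity-based importances can split credit between correlated features, so a share computed this way is descriptive rather than a causal attribution.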

Does Adding the Framingham Score to ML Help?

A hybrid model that feeds the Framingham Risk Score as an additional feature alongside all biomarkers tests whether published epidemiology adds information the ML model misses on its own.

Hybrid AUC: 0.843 [0.837–0.849], statistically indistinguishable from ML alone (a difference of −0.001, far smaller than the width of either confidence interval). GradientBoosting on the raw NHANES biomarkers already captures what the Framingham score encodes for this outcome. The published score is useful as an interpretable baseline, not as an additional signal.

The ADM Insight

The Framingham Risk Score is a strong domain baseline: a long-standing clinical standard built from decades of cardiology research. On NHANES it scores AUC 0.786 [0.779–0.793], which is solid. ML still wins by +5.7 AUC points (0.843 vs 0.786, non-overlapping CIs, 2,000-permutation p < 0.001). The advantage is not an artifact of a weak baseline: it comes from biomarkers Framingham cannot use (CRP, HbA1c, waist circumference) and from letting a nonlinear model combine them.

What doesn’t move the needle: adding the Framingham score to ML as an extra feature yields AUC 0.843, matching ML alone to three decimals. Once the biomarkers are in the model, the published score is redundant. Published epidemiology encodes less than the raw measurements do. That’s the ADM reading: at this data fidelity, clinical scores are useful as interpretable baselines, not as signal sources.

Data: NHANES 2005–2018, seven two-year cycles pooled (survey weights divided by K=7 for prevalence; model evaluation is unweighted). 38,033 adults aged 20+; 4,193 with self-reported CVD (11.0% prevalence). Breakdown: heart attack 1,632, CHD 1,542, stroke 1,491, CHF 1,265, angina 982 (categories overlap).
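The weight rescaling and the prevalence arithmetic above can be sketched as follows. The crude ratio of the two reported counts reproduces the 11.0% figure. The helper names and the toy records are hypothetical and are not taken from the study's code.

```python
def pooled_weights(two_year_weights, n_cycles=7):
    """Rescale 2-year survey weights for a pooled analysis of n_cycles cycles."""
    return [w / n_cycles for w in two_year_weights]

def weighted_prevalence(outcomes, weights):
    """Weighted mean of a 0/1 outcome."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Crude prevalence from the counts reported in the text.
crude_prevalence = 4193 / 38033
print(f"crude prevalence: {crude_prevalence:.1%}")

# Toy records (hypothetical values) to show the weighted calculation.
toy_outcomes = [1, 0, 0, 1, 0]
toy_weights = [12000.0, 30000.0, 8000.0, 5000.0, 45000.0]
toy_prev = weighted_prevalence(toy_outcomes, pooled_weights(toy_weights))
```

Dividing every weight by the same constant leaves a weighted proportion unchanged. The rescaling matters for population totals, which would otherwise be counted once per cycle.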

The domain score is the 2008 Framingham Risk Score (D’Agostino et al., Circulation) as a continuous log-relative-risk using its seven inputs: age, sex, total cholesterol, HDL, systolic blood pressure, current smoking, diabetes.
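A continuous log-relative-risk of this kind is a weighted sum of the inputs. The sketch below assumes a sex-specific linear predictor with log-transformed continuous inputs and 0/1 indicators for smoking and diabetes. The coefficient values are round-number PLACEHOLDERS, not the published 2008 coefficients, and the function and key names are hypothetical. The sketch also leaves out the conversion to absolute 10-year risk.

```python
import math

# PLACEHOLDER coefficients for illustration only; substitute the published
# sex-specific values from D'Agostino et al. (2008) before any real use.
PLACEHOLDER_COEFS = {
    "male":   {"ln_age": 3.0, "ln_total_chol": 1.0, "ln_hdl": -1.0,
               "ln_sbp": 2.0, "smoker": 0.5, "diabetes": 0.5},
    "female": {"ln_age": 2.0, "ln_total_chol": 1.0, "ln_hdl": -1.0,
               "ln_sbp": 3.0, "smoker": 0.5, "diabetes": 0.5},
}

def linear_predictor(person, coefs=PLACEHOLDER_COEFS):
    """Log-relative-risk: sum of coefficient times (log-)transformed input."""
    c = coefs[person["sex"]]
    return (
        c["ln_age"] * math.log(person["age"])
        + c["ln_total_chol"] * math.log(person["total_chol"])
        + c["ln_hdl"] * math.log(person["hdl"])
        + c["ln_sbp"] * math.log(person["sbp"])
        + c["smoker"] * int(person["smoker"])
        + c["diabetes"] * int(person["diabetes"])
    )

example = {"sex": "male", "age": 50, "total_chol": 200, "hdl": 50,
           "sbp": 120, "smoker": True, "diabetes": False}
score = linear_predictor(example)
```

Because AUC depends only on how the score ranks people, a continuous score like this can be compared against model probabilities without converting it to a risk percentage first.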

The GradientBoosting model trains on the full NHANES panel: Framingham variables plus measured biomarkers (DBP, LDL, triglycerides, HbA1c, CRP, BMI, waist circumference, chol/HDL ratio) and several behavioral features.

Both models are evaluated on identical held-out folds via 5-fold stratified cross-validation. Bootstrap 95% CIs use 1,000 resamples. A 2,000-permutation test for the AUC difference yields p < 0.001; with 2,000 permutations the smallest reportable p-value is about 0.0005, so it cannot be exactly zero. The non-overlapping CIs point the same way.
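The text does not spell out how the intervals and the permutation test were computed, so the following is one common variant rather than the study's code: a percentile bootstrap over subjects, and a paired permutation test that randomly swaps the two models' rank-transformed scores within each subject. All function names are hypothetical, and the inputs are assumed to be pooled out-of-fold predictions. The sketch uses only the standard library for clarity; on a cohort of this size a vectorized implementation would be far faster.

```python
import random

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def auc(y_true, scores):
    """Rank-based AUC (Mann-Whitney U statistic scaled to [0, 1])."""
    ranks = average_ranks(scores)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def paired_permutation_p(y_true, scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided p-value for AUC(a) - AUC(b), swapping model labels per subject."""
    rng = random.Random(seed)
    # Rank-transform so scores on different scales can be exchanged.
    a, b = average_ranks(scores_a), average_ranks(scores_b)
    observed = abs(auc(y_true, a) - auc(y_true, b))
    hits = 0
    for _ in range(n_perm):
        perm_a, perm_b = [], []
        for x, y in zip(a, b):
            if rng.random() < 0.5:
                x, y = y, x
            perm_a.append(x)
            perm_b.append(y)
        if abs(auc(y_true, perm_a) - auc(y_true, perm_b)) >= observed:
            hits += 1
    # The +1 correction counts the observed labelling, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)
```

With `n_perm=2000` the smallest value this function can return is 1/2001, which is why a result printed as 0.000 is better reported as p < 0.001.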

Cross-sectional, not prospective: NHANES is cross-sectional, so the “outcome” is current CVD status at interview, not future onset. A truly prospective CVD risk model requires a longitudinal cohort (ARIC, MESA, Framingham Offspring) with measured incident events.

Self-reported CVD: The outcome is self-reported heart attack / CHD / CHF / angina / stroke from the Medical Conditions Questionnaire, not adjudicated clinical events. Under-reporting (missed silent MI, undiagnosed CHF) will misclassify some true cases as non-cases.

Pooled Cohort Equations not implemented: The 2013 ACC/AHA Pooled Cohort Equations are the current US clinical standard. We use the earlier 2008 Framingham continuous score instead, which shares most inputs. A head-to-head comparison against PCE is a reasonable next step.

Hybrid adds nothing: Feeding the Framingham score as an extra feature to GradientBoosting yields AUC 0.843 — no gain over ML alone. The clinical score does not encode information beyond what the raw biomarkers already contain.

Biomarker-rich panel: The ML advantage likely narrows on datasets without CRP, HbA1c, or multi-reading BP. Replication on HRS (self-reported risk factors, no labs) would test this — and would probably show a smaller gap.