The analysis drives the model. Never the other way around.
Here is the discipline I aim at Alzheimer’s and ALS — and at every other problem on this site. ADM starts with the question a decision actually needs answered. It figures out what to model, at what level of detail, and why. The rest follows from that.
By Michael Key · ORCID
Why most modeling projects actually go wrong
There’s a persistent myth: projects fail because the technology isn’t ready, or the data is insufficient, or the team lacks talent. Sometimes those things are true. But I’ve watched the same failure pattern repeat dozens of times — across defense and aerospace work spanning two decades, and now in the neurodegeneration research I’ve turned to since. The root cause is almost never technical.
The real failure is poor problem framing.
Teams build models before defining what question those models are supposed to answer. They choose methods because they’re trendy, not because they fit the problem. They measure model performance against the wrong metrics. They over-build or under-build because they never established a principled basis for how detailed the model needs to be.
The result: months of effort producing a system that’s technically impressive and practically useless. The simulation runs beautifully but never informs the decision it was built to support. The predictive model scores well on test data but misses the dynamics that actually drive the downstream call.
Analysis Driven Modeling (ADM) exists to guard against this. It has a structure (three steps and a feedback loop), but it is a discipline, not a framework: the structure serves the thinking, not the other way around. The alternative, picking a modeling approach before defining the question, wastes time and money.
Let the question set the bar
Every question has a natural fidelity. “Is this system safe?” demands a different level of precision than “which design is cheapest?” The question tells you what kind of model you need and how precise the answer has to be. That’s the fidelity bar — the target that everything else follows from. Before writing a single line of code, four things need to be clear:
- What decision will this model inform? Not “what will it simulate,” but what decision will a human or system make differently because this model exists?
- What fidelity does that decision require? A billion-dollar infrastructure investment requires different model resolution than a preliminary screening decision. The decision sets the fidelity, not the toolchain.
- What physics and dynamics matter? Every real-world system is more complex than any model of it. The art is knowing which dynamics are load-bearing for your specific question and which can be safely abstracted.
- What can be abstracted, simplified, or ignored? The most dangerous assumption in modeling is that more detail is always better. Every unnecessary detail costs compute and validation effort, and introduces error.
The same first move shows up across very different problems. Two from the track record:
When MITRE needed to assess the effectiveness of a proposed kill chain (the sequence from sensor to weapon to effect), the first question wasn’t “which simulation tool?” It was: “What decision does this analysis need to support, and what level of fidelity does that decision require?” The answer — a force-level trade study (a simulation of how an entire engagement plays out), not an engagement-level prediction — reduced the required model complexity by an order of magnitude and delivered results three months faster.
The same discipline applies to risk assessment. When the question is “given a pandemic, what else should we worry about?”, the answer isn’t a Monte Carlo simulation of disease spread — that answers a different question (“what will happen next?”). The right model is a Bayesian causal network: nodes for uncertain events, edges for causal relationships, evidence injection to update beliefs. The right architecture is determined by the question, not by what tool you already know.
The same rigor applies everywhere. When the question is “how detailed does this model need to be?” — whether for an energy grid, a groundwater plume, or a prognosis forecaster — the answer depends entirely on what decision the model is supporting and what conditions it will face. A screening-level adequacy check has radically different fidelity requirements than a stochastic investment optimization. If you don’t define the question first, you’ll build the wrong model.
Build to the bar — not past it
The decision dictates the model. Now you build to it — not past it. This sounds simple. In practice, it’s the step where most projects go wrong.
The fidelity spectrum is wider than most people realize. At one end: back-of-envelope calculations, lookup tables, statistical surrogates. At the other: full physics-based simulations with millions of interacting elements. Both ends have their place. The hard part is knowing where on this spectrum your specific question lives — and resisting the pull toward either extreme.
More fidelity isn’t always better. A model that’s too detailed for its training data will overfit. A simulation that’s too complex for its validation basis will produce results that look precise but aren’t accurate. Every layer of detail beyond what the question requires is cost without value. The discipline is knowing where to stop.
As Chief Analyst at MITRE, I watched this pattern constantly. Modelers would add detail to a simulation just because the framework allowed it — not because the analysis required it. The extra detail didn’t just cost time. It introduced code that could cause unintended interactions, and it multiplied the number of components that needed their fidelity assessed, their accuracy verified, and their logic understood by the next analyst who inherited the model.
In Bayesian risk networks, the same principle holds. A 36-node causal network — one where each event’s odds are a transparent function of its known causes (a noisy-OR structure) — reveals cross-domain cascades that are invisible to simpler models, and every inference pathway stays traceable. Scale to 200 nodes and you might capture more correlations, but the decision-maker can no longer trace why a given risk shifted. The extra fidelity makes the model less useful, not more. The right fidelity is determined by the decision the model needs to support.
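The noisy-OR structure mentioned above is easy to make concrete. The following is a minimal sketch, not the 36-node network itself: the event names, link strengths, and leak probability are hypothetical placeholders chosen only to show how each event's probability stays a transparent function of its active causes.

```python
from math import prod

def noisy_or(active_link_probs, leak=0.0):
    """P(effect) under noisy-OR: the effect is absent only if the leak term
    and every active cause independently fail to trigger it."""
    return 1.0 - (1.0 - leak) * prod(1.0 - p for p in active_link_probs)

# Hypothetical link strengths: P(supply disruption | that cause alone is active).
links = {"pandemic": 0.30, "port_closure": 0.50}
LEAK = 0.02  # hypothetical background probability with no modeled cause active

baseline = noisy_or([], leak=LEAK)
with_pandemic = noisy_or([links["pandemic"]], leak=LEAK)
with_both = noisy_or(list(links.values()), leak=LEAK)

print(f"no causes active:       {baseline:.3f}")
print(f"pandemic observed:      {with_pandemic:.3f}")
print(f"pandemic + port closed: {with_both:.3f}")
```

Because each node is just this one formula over its parents, a shift in any probability can be traced back to the specific cause that moved it, which is the traceability argument made above.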
The fidelity ladder and spiraling fidelity
You rarely know the exact right fidelity before you build. So don’t try to guess upfront. Instead, use a fidelity ladder: start with the simplest model that could plausibly answer the question, validate whether it’s sufficient, and escalate selectively only where needed.
The process looks like this:
Start simple. Begin with a screening-level model: back-of-envelope calculations, analytical solutions, homogeneous assumptions, low-dimensional representations. Build it fast. It will reveal what actually matters and what doesn’t.
Run sensitivity analysis. Which parameters, variables, or model components actually move your answer? Which are decision-irrelevant? This is the critical step. A sensitivity sweep tells you where complexity is wasted and where it’s essential.
Spiral fidelity selectively. Don’t escalate uniformly across the whole model. Add detail only to the components sensitivity analysis identified as decision-critical. Keep the rest simple. A heterogeneous permeability field might matter (escalate that component), but the source term might not (keep it abstracted). A detailed student misconception model might change instruction decisions, but a simpler mastery estimate might be enough for another question.
Validate and repeat. Does this mixed-fidelity model give you the precision, confidence, or granularity the decision requires? If yes, you’re done. If no, identify the next-most-sensitive component and escalate there.
This spiral approach prevents two common failures: over-building (months of effort on components that don’t move the answer) and under-building (discovering too late that a key driver was oversimplified).
Groundwater contamination provides a clear example. A screening model (analytical, homogeneous) answers “does the plume reach the well?” quickly. Sensitivity analysis shows that spatial variation in permeability matters for remediation decisions, but the source term doesn’t. So you escalate only the permeability field (the K-field) to 2D heterogeneous transport, not the whole model. If uncertainty quantification matters for investment decisions, you escalate again to Monte Carlo, but only over the parameters that sensitivity analysis flagged as important. The discipline is knowing where on the ladder to stop and which components deserve high fidelity.
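The sensitivity step of the ladder can be sketched as a one-at-a-time sweep over a screening-level model. Everything numeric here is an assumption for illustration: the 500 m distance, the baseline values, and the parameter ranges are hypothetical, and the toy model is plain advective travel time rather than any model from the studies described on this page.

```python
def travel_time_years(K, gradient, porosity, source_conc):
    """Screening-level advective travel time to a well 500 m downgradient.
    K is hydraulic conductivity in m/day. source_conc is accepted but unused:
    in this toy model the source strength does not change arrival time."""
    distance_m = 500.0
    velocity_m_per_day = K * gradient / porosity  # seepage velocity
    return distance_m / velocity_m_per_day / 365.0

def one_at_a_time(model, baseline, ranges):
    """Swing in the output when each input moves low -> high, others at baseline."""
    swings = {}
    for name, (low, high) in ranges.items():
        low_out = model(**{**baseline, name: low})
        high_out = model(**{**baseline, name: high})
        swings[name] = abs(high_out - low_out)
    return swings

# Hypothetical baseline and plausible ranges for each input.
baseline = {"K": 1.0, "gradient": 0.005, "porosity": 0.30, "source_conc": 100.0}
ranges = {
    "K": (0.1, 10.0),
    "gradient": (0.003, 0.007),
    "porosity": (0.25, 0.35),
    "source_conc": (50.0, 200.0),
}

swings = one_at_a_time(travel_time_years, baseline, ranges)
ranking = sorted(swings.items(), key=lambda item: item[1], reverse=True)
for name, swing in ranking:
    print(f"{name:12s} swing = {swing:8.1f} years")
```

In this sketch the conductivity range dominates and the source term contributes nothing, so only the former would earn more fidelity. A one-at-a-time sweep ignores interactions between inputs; a variance-based method would be the next rung if interactions are suspected.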
Uncertainty quantification
Every model has uncertainty. The question is whether you quantify it or ignore it. ADM requires making uncertainty explicit: What are the model’s assumptions? Where are the boundaries of its validity? What happens to the decision when those assumptions are wrong? Monte Carlo analysis, sensitivity studies, and ensemble methods aren’t optional additions. They’re part of the model.
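As a minimal sketch of what making uncertainty explicit can look like, the snippet below pushes one uncertain input through a toy screening model with Monte Carlo sampling and reports an interval plus the probability of crossing a decision threshold. The lognormal spread on conductivity, the 30-year horizon, and the model itself are assumptions for illustration, not values from any study described here.

```python
import random
import statistics

def travel_time_years(K, gradient=0.005, porosity=0.30, distance_m=500.0):
    """Toy advective travel time (years); K is hydraulic conductivity in m/day."""
    return distance_m * porosity / (K * gradient) / 365.0

rng = random.Random(0)  # fixed seed so the run is reproducible
N = 10_000

# Assumed input uncertainty: K is lognormal with median 1 m/day and log-sd 1.0.
samples = sorted(travel_time_years(rng.lognormvariate(0.0, 1.0)) for _ in range(N))

p05 = samples[int(0.05 * N)]
median = statistics.median(samples)
p95 = samples[int(0.95 * N)]

HORIZON_YEARS = 30.0  # hypothetical decision threshold
p_arrive = sum(t <= HORIZON_YEARS for t in samples) / N

print(f"travel time: median {median:.0f} y, 90% interval [{p05:.0f}, {p95:.0f}] y")
print(f"P(arrival within {HORIZON_YEARS:.0f} y) = {p_arrive:.2f}")
```

The point estimate alone would say the plume arrives decades past the horizon. The sampled distribution shows how much probability still sits inside it, and that is the number the decision actually turns on.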
Method selection
The method should follow from the question and the fidelity bar, not precede them. A physics-based simulation is the right choice when temporal dynamics and system interactions drive the answer. Monte Carlo analysis fits when the decision depends on tail risk and the uncertainty range matters more than the point estimate. Gradient boosting or logistic regression may be the right call when you have rich observational data and the question is about prediction. Sometimes the answer is a spreadsheet. Each method has a sweet spot defined by the structure of the problem, not its trendiness.
Validate what matters
The model doesn’t need to be perfect — it needs to be good enough for the decision it serves. Sensitivity analysis tells you which inputs actually move the answer. Uncertainty quantification tells you how much to trust it. If the model is more precise than the decision requires, you’ve overbuilt.
The defense Modeling and Simulation community has spent decades developing rigorous Verification, Validation, and Accreditation (VV&A) practices. These practices exist because the consequences of trusting an unvalidated model in defense are measured in lives. But the underlying principle applies everywhere: a model without validation is a hypothesis, not a tool.
Verification asks: “Did we build the model right?” Does the code implement the intended equations? Are the algorithms numerically stable? Do the outputs make dimensional and physical sense?
Validation asks: “Did we build the right model?” Does the simulation’s behavior match reality within acceptable tolerances for the intended use? Note the phrase for the intended use — this is where ADM connects to V&V. A model validated for one analysis question may be completely inappropriate for another, even if it uses the same underlying physics.
The V&V blind spot
AI work has a common blind spot: teams validate model accuracy on held-out test sets but never ask whether the model is actually useful for the decision it was built to support. A predictive model can have excellent test-set accuracy and still be useless if it fails to capture the dynamics that matter for the downstream decision. ADM insists on end-to-end validation: does the chain hold all the way from data through the model to the actual decision? If it breaks at any link, the model hasn’t been validated for its intended purpose.
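A small synthetic example shows how test-set accuracy and decision value can point in opposite directions. The labels, the two stand-in models, and the cost figures are all hypothetical; the only claim is the arithmetic.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def decision_cost(y_true, y_pred, miss_cost=100.0, false_alarm_cost=1.0):
    """Total cost of acting on the predictions (costs are assumed values)."""
    misses = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    false_alarms = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return misses * miss_cost + false_alarms * false_alarm_cost

# Synthetic test set: 50 rare events among 1000 cases.
y_true = [1] * 50 + [0] * 950

# Model A never flags anything. Model B catches 45 of 50 events
# at the price of 155 false alarms.
always_negative = [0] * 1000
cautious = [1] * 45 + [0] * 5 + [1] * 155 + [0] * 795

for name, pred in [("always_negative", always_negative), ("cautious", cautious)]:
    print(f"{name:16s} accuracy = {accuracy(y_true, pred):.2f}  "
          f"decision cost = {decision_cost(y_true, pred):.0f}")
```

The model with the higher accuracy is the one that would be far more expensive to act on. Validation against the decision means scoring with the second function, not only the first.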
Each answer unlocks the next question
Hard problems are rarely answered by a single question and a single model. More often, they’re answered by the right sequence of questions — where each answer produces something that didn’t exist before, and that something makes the next question askable for the first time.
Often, the first question in the sequence is the one that needs real modeling and simulation. Not because simulation is the goal, but because the problem is too tangled for intuition or arithmetic alone. A grid with 180 GW of generation, a dozen fuel types, hourly demand swings, and weather-dependent renewables doesn’t simplify to a formula. You likely need a simulation to find the breaking point — and that breaking point is a number nobody had before the model produced it.
But that number usually does more than answer the first question. It tends to make the second question precise. “What will this cost consumers?” is a vague worry until you have specific generation requirements, investment timelines, and capacity shortfalls from a simulation. Then it becomes tractable — important analysis, but analysis that only works because the simulation gave it a foundation to stand on.
The second answer often sharpens a third question. The third can sharpen a fourth. Each step in the chain builds on everything below it. Some steps may need simulation. Some may need economic analysis. Some may need nothing more than careful reasoning with the numbers the earlier steps produced. The discipline isn’t applying simulation everywhere — it’s recognizing which question in the chain is the one where modeling and simulation is likely the only way to get a credible answer, and building exactly what that question requires.
When it works well, this kind of chain naturally builds toward the question the decision-maker actually cares about. An engineer might care most about reliability. A regulator might care about consumer rates. A CEO might care about whether to build their own power plant. The simulation at the base of the chain doesn’t need to be redesigned for each audience — it produced the quantitative foundation that every downstream question depends on, and each audience can read the chain at the level where their decision lives.
Consider one study from the track record, on data center load growth in PJM territory. A dispatch simulation might answer “at what load does the grid start to struggle?” Those results — breaking points, investment requirements, generation costs — could feed the next question: “what happens to consumer electricity rates as data center demand grows?” And the rate projections could feed the next: “at what point does it become cheaper for data centers to generate their own power, and what would that exodus mean for everyone still on the grid?” Each question builds on the previous answer. The simulation runs once, at the base of the chain. Everything above it is analysis that the simulation made possible.
The neurodegeneration mission is built as the same kind of chain: from “does this mutation drive aggregation?” on the molecular track toward the question a family actually faces — “how will function change, and when?” — each step earning the next only once its own answer is in hand and honestly calibrated.
From question to decision
The ideas above describe how I think about every modeling problem. Here’s what it looks like in practice — from first question to final answer.
The process is rooted in the OSD Mission Engineering Guide 2.0, the same framework that guides systems-level analysis across the Department of Defense. I’ve used it across dozens of M&S programs. It maps to any domain because the underlying discipline is identical: start with the question, build what the question requires, and make sure the answer holds up against reality.
Define the problem
What is the decision this study needs to inform, and what changes when the model exists? Every study starts here — with the question, the decision it serves, and the constraints. If the question can’t be articulated cleanly, nothing is ready to be built.
Assess the current state
Before designing a model, I map what’s already known — what data exists, what’s been tried, where the real gaps are versus the perceived ones. A structured look at the landscape before a line of code is written.
Architect the solution
With the problem defined and the current state understood, I choose the approach. What method fits the structure of this specific problem: reinforcement learning, Bayesian optimization, a physics-informed model, a statistical surrogate? What level of fidelity does the decision actually require? Every design choice traces back to the question defined in the first step.
Build and validate
Build at the right fidelity, not the maximum fidelity. Validate against the decision the model is supposed to support, not just test-set accuracy. Monte Carlo analysis, sensitivity studies, and progressive testing are part of the model — not afterthoughts bolted on at the end.
Communicate the answer
The output isn’t a model — it’s a defensible, reproducible answer to the question, open for anyone to check. What confidence does it warrant? Where does it break down? The answer includes what to watch going forward: the conditions under which the model’s assumptions no longer hold.
AI coding agents speed up much of this work — running sensitivity sweeps, exploring design spaces, testing models across many conditions. But they don’t change the process. The methodology is what tells them what to build and when to stop.
The questions ADM answers
Every problem starts from a different place, but the questions underneath tend to be the same.
What should I build, and how detailed does it need to be?
There’s a research question that needs a model behind it, but there are competing approaches and a wide range of possible fidelity levels. Full physics-based simulation or a statistical surrogate? Reinforcement learning or Bayesian optimization? ADM works backward from the decision the model needs to support. The method and the fidelity level follow from the question, not the other way around.
Is what I built actually good enough?
A simulation or trained model runs. Maybe it performs well on test data. But is it actually fit for the decision it’s supposed to support? ADM reframes the question: it’s not about whether the model is accurate in the abstract. It’s about whether the gaps between model and reality are the kind that would change the answer.
Where do I start?
Some problems have an obvious model. Others have no clear starting point. ADM answers the starting-point question first: given the data and the question at hand, what’s the highest-impact thing to build right now? The output is a specific model choice, not a roadmap.
The OSD connection
ADM follows the same principles as the OSD Mission Engineering Guide — the framework DoD uses for modeling and simulation across every service.
“From the beginning, it’s important to have a clear understanding of what goal or decision will be informed as this will drive subsequent choices throughout the process. [...] These decisions guide the specific questions for the activity as well as the degree of fidelity and level of analytic rigor needed from the results, findings, and conclusions.”
— Mission Engineering Guide 2.0, OUSD(R&E)
This is the same principle ADM is built on: the question sets the fidelity bar, which in turn dictates the model and method, and the answer is validated against the decision it needs to support. Anyone reading the work should be able to trace from the decision, through the analysis, to the model assumptions that support it.
The difference is one of emphasis. The Mission Engineering Guide tells you what to do; ADM is how to do it, grounded in OUSD(R&E) program work through MITRE, an aerospace prime, and direct program experience.
The same discipline, applied to restricted clinical data
Everything above is how I scope and validate a model. The neurodegeneration work asks more of that discipline: the data is sensitive, it belongs to the repositories that steward it, and a number that comes out of it may one day be read by a family or a clinician. Rigor elsewhere earns no shortcut here — so the commitments below stand on their own, stated as commitments rather than credentials. The full, itemized account is on Data Stewardship; this is the short version.
What the data is, and what the model is for
The clinical tracks would draw on de-identified research cohorts: speech recordings from DementiaBank and the Speech Accessibility Project, longitudinal ALSFRS (ALS Functional Rating Scale) data from PRO-ACT, and the data for the Alzheimer’s clinical track from ADNI. The aim is a well-calibrated prognosis forecaster, one whose uncertainty bounds are honest enough to put in front of a family making a real decision. Calibration is the gate, not a given. The fidelity bar is set the same way as everywhere else on this page: by the decision a family or clinician actually faces, not by how elaborate the model could be. To be plain about where this stands: the molecular study used only public data; the clinical tracks begin only once data access is granted, which is not guaranteed on any timeline.
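One simple form a calibration gate could take is an interval-coverage check: if a forecaster claims 80% intervals, about 80% of observed outcomes should land inside them. The sketch below is an assumption about how such a check might look, not the project's actual pipeline, and it runs on purely synthetic numbers generated in the script itself.

```python
import random
from statistics import NormalDist

def interval_coverage(lowers, uppers, observed):
    """Fraction of observed outcomes that fall inside their forecast interval."""
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lowers, uppers, observed))
    return hits / len(observed)

rng = random.Random(0)                # fixed seed for reproducibility
z80 = NormalDist().inv_cdf(0.90)      # half-width multiplier for a central 80% interval
TRUE_SD = 4.0                         # spread of the synthetic outcomes

# Synthetic forecast centers and outcomes; no real cohort data is involved.
centers = [rng.uniform(10.0, 40.0) for _ in range(2000)]
observed = [c + rng.gauss(0.0, TRUE_SD) for c in centers]

def coverage_for(claimed_sd):
    """Coverage of nominal 80% intervals built from a claimed standard deviation."""
    lowers = [c - z80 * claimed_sd for c in centers]
    uppers = [c + z80 * claimed_sd for c in centers]
    return interval_coverage(lowers, uppers, observed)

honest = coverage_for(4.0)         # claimed spread matches the true spread
overconfident = coverage_for(2.0)  # claimed spread is half the true spread

print(f"nominal 0.80 | honest {honest:.2f} | overconfident {overconfident:.2f}")
```

A forecaster whose intervals cover far less than they claim fails the gate regardless of how accurate its point forecasts look.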
The data’s terms govern — not mine
Access to each restricted cohort is applied for, never assumed. For any cohort granted, it would be used strictly within its Data Use Agreement or license, with its access, redistribution, and retention terms honored as written — the itemized terms are on Data Stewardship. These cohorts are released de-identified; I would keep them that way and make no attempt to re-identify anyone. If a repository’s terms ever conflict with anything stated here, the repository’s terms govern.
Patient data never reaches a third-party AI
This is a hard operating rule about what crosses the boundary, not a vendor retention promise. Any restricted patient-level records are processed only by code running locally, on a single machine, encrypted at rest, with the data kept out of any cloud-sync path. AI coding assistants — Claude Code among them — help me write the analysis code and reason about method, but they are pointed only at code and aggregate results, never at the protected-data directories, and no individual record or row is ever pasted into a chat or prompt to debug. Only aggregate outputs computed across many records — summary statistics, calibration curves, and the like, never any output that resolves to or can be tied back to an individual — cross that local boundary. The model never receives the records.
This is already how the public molecular data is handled; the same rule governs the restricted clinical tracks the moment any of that data arrives — itemized as a standing commitment in Data Stewardship.
One caveat I’ll state plainly: I’m a solo researcher with no institutional IRB of record. Where a repository requires a review, determination, or documented exemption as a condition of access, I defer to that process before using the data rather than working around it. The full account (storage, retention and destruction, human-subjects posture, and the SHA-256-locked provenance behind every published number) is on Data Stewardship.
I hold the neurodegeneration work to the same standard as the defense modeling.
Start from the decision. Build to it. Validate what matters — so the answer is one I’d be willing to publish.