Learning Roadmap
How to Become an AI Causal Inference Analyst
A step-by-step, phase-based learning path from beginner to job-ready AI causal inference analyst. Estimated completion: about 7 months (28 weeks) across 4 phases.
Phase 1: Statistical Foundations & Causal Thinking
Duration: 6 weeks
Goals
- Master probability theory, statistical inference, and linear regression
- Understand the fundamental problem of causal inference and counterfactual reasoning
- Learn to distinguish correlation from causation using Simpson's Paradox and collider bias examples
- Build fluency in Python or R for statistical analysis
Resources
- Causal Inference: The Mixtape by Scott Cunningham (free online)
- Brady Neal's Causal Inference course (YouTube)
- An Introduction to Statistical Learning (ISLR) - Chapters 1-4
- Think Stats by Allen Downey (free online)
Milestone: You can articulate the fundamental problem of causal inference, draw basic DAGs, and identify confounders, colliders, and mediators in observational datasets.
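A short simulation makes the difference between confounders and colliders concrete. This is a minimal sketch with invented data-generating processes (every coefficient is an arbitrary assumption) and a hypothetical `ols` helper built on NumPy least squares rather than any causal library. Adjusting for a confounder removes bias; adjusting for a collider creates it.

```python
import numpy as np

def ols(y, *cols):
    """Hypothetical helper: least-squares coefficients for y ~ 1 + cols -> [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(0)
n = 200_000
true_effect = 1.0

# Confounding: Z -> T, Z -> Y, and T -> Y.
z = rng.normal(size=n)
t = (z + rng.normal(size=n) > 0).astype(float)
y = true_effect * t + 2.0 * z + rng.normal(size=n)

naive = ols(y, t)[1]         # ignores Z, so it is biased upward
adjusted = ols(y, t, z)[1]   # adjusting for the confounder recovers roughly 1.0

# Collider: T2 and Y2 are independent, but both cause C (T2 -> C <- Y2).
t2 = rng.normal(size=n)
y2 = rng.normal(size=n)      # the true effect of t2 on y2 is zero
c = t2 + y2 + rng.normal(size=n)
unconditioned = ols(y2, t2)[1]    # close to zero, as it should be
conditioned = ols(y2, t2, c)[1]   # "controlling for" the collider induces a spurious negative slope

print(f"confounder: naive={naive:.2f}, adjusted={adjusted:.2f}")
print(f"collider: unadjusted={unconditioned:.2f}, adjusted={conditioned:.2f}")
```

Drawing each DAG before running the regressions is the point of the exercise: the graph, not the data, tells you which variable belongs in the adjustment set.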
Phase 2: Core Causal Methods
Duration: 8 weeks
Goals
- Master matching, weighting, and stratification using propensity scores
- Learn difference-in-differences with staggered adoption extensions
- Implement regression discontinuity designs and check how sensitive estimates are to the choice of bandwidth
- Understand instrumental variables and exclusion restriction assumptions
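Instrumental variables are easiest to grasp with one binary instrument, where two-stage least squares reduces to the Wald estimator: the effect of the instrument on the outcome divided by its effect on treatment uptake. The sketch below assumes a simulated encouragement design (all coefficients invented) in which the instrument affects the outcome only through treatment, so the exclusion restriction holds by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
true_effect = 2.0

u = rng.normal(size=n)                       # unobserved confounder of treatment and outcome
z = rng.binomial(1, 0.5, size=n)             # instrument: randomly assigned encouragement
d = (1.5 * z + u + rng.normal(size=n) > 0.5).astype(float)  # uptake depends on z and u
y = true_effect * d + 3.0 * u + rng.normal(size=n)          # z enters y only through d

# OLS slope of y on d is biased because d is correlated with u.
ols_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

# Wald estimator = reduced form / first stage (2SLS with one binary instrument, no covariates).
reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
iv_estimate = reduced_form / first_stage

print(f"OLS={ols_slope:.2f}, IV={iv_estimate:.2f}, first stage={first_stage:.2f}")
```

With a constant treatment effect, as simulated here, the IV estimate targets that effect; with heterogeneous effects it targets the local average treatment effect for compliers, which is why the monotonicity assumption matters alongside exclusion.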
Resources
- Causal Inference: The Mixtape - Chapters 3-7
- The Effect by Nick Huntington-Klein (free online)
- DoWhy Python library tutorials and documentation
- Scott Cunningham's causal inference video lecture series
Milestone: You can independently design and execute a causal study using at least three different identification strategies and defend your assumptions.
Phase 3: Advanced Methods & Machine Learning Integration
Duration: 8 weeks
Goals
- Learn doubly robust estimators, TMLE, and causal forests for heterogeneous treatment effects
- Explore synthetic control methods and generalized synthetic controls
- Integrate ML models into causal pipelines (e.g., LASSO for covariate selection, causal forests for CATE estimation)
- Study mediation analysis and natural experiments
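Augmented inverse probability weighting (AIPW) is one widely used doubly robust estimator: it combines an outcome model and a propensity model and stays consistent if either one is correctly specified. The sketch below is a NumPy-only illustration under stated assumptions: logistic regression fitted by Newton-Raphson, linear outcome models, simulated data, and no cross-fitting. Production tools such as EconML's DML estimators instead use flexible ML models with cross-fitting. The helpers `fit_logistic` and `aipw_ate` are hypothetical names.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Logistic regression via Newton-Raphson (IRLS); X must already include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def aipw_ate(X, t, y):
    """Augmented IPW (doubly robust) estimate of the average treatment effect."""
    Xc = np.column_stack([np.ones(len(y)), X])
    e = 1.0 / (1.0 + np.exp(-Xc @ fit_logistic(Xc, t)))  # estimated propensity scores
    e = np.clip(e, 0.01, 0.99)                             # guard against extreme weights
    # Separate linear outcome models for treated and control units.
    b1 = np.linalg.lstsq(Xc[t == 1], y[t == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(Xc[t == 0], y[t == 0], rcond=None)[0]
    mu1, mu0 = Xc @ b1, Xc @ b0
    # Outcome-model prediction plus an inverse-probability-weighted residual correction.
    psi = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return psi.mean()

rng = np.random.default_rng(2)
n = 50_000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))).astype(float)
y = 1.5 * t + X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=n)  # true ATE is 1.5

ate = aipw_ate(X, t, y)
print(f"AIPW ATE estimate: {ate:.3f}")
```

A useful exercise is to misspecify one model on purpose (for example, drop `X[:, 1]` from the outcome model only) and confirm that the estimate stays close to 1.5 while a plain regression estimate does not.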
Resources
- EconML library documentation and Microsoft Research tutorials
- Susan Athey and Stefan Wager's papers on causal forests
- Targeted Learning by Mark van der Laan and Sherri Rose
- CausalML library documentation and examples
Milestone: You can estimate heterogeneous treatment effects using ML-augmented causal methods and apply sensitivity analyses to quantify the robustness of your findings.
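One simple, widely reported sensitivity analysis is the E-value of VanderWeele and Ding (2017): the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio. The sketch below implements the closed-form formula for a point estimate and for a confidence limit; it is one option among several (Rosenbaum bounds and omitted-variable-bias approaches are others), and the function names are hypothetical.

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio: RR + sqrt(RR * (RR - 1)), after inverting protective RRs."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects are inverted so the formula applies
    return rr + math.sqrt(rr * (rr - 1.0))

def e_value_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null; 1.0 if the interval contains 1."""
    if lower <= 1.0 <= upper:
        return 1.0
    return e_value(lower if lower > 1.0 else upper)

print(round(e_value(2.0), 2))  # 3.41
```

Read an E-value of 3.41 as: a confounder associated with both treatment and outcome by a risk ratio of at least 3.41 each could explain away the estimate, but weaker confounding could not.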
Phase 4: Production & Professional Skills
Duration: 6 weeks
Goals
- Build reproducible causal analysis pipelines using dbt, Python, and SQL
- Create executive-level dashboards and causal insight reports
- Learn experiment management platforms and A/B testing infrastructure
- Develop consulting-style communication for non-technical stakeholders
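Whatever platform runs an experiment, you should be able to reproduce its headline comparison by hand. The sketch below computes the difference in conversion rates between two variants with a Wald 95% confidence interval and a pooled two-sided z-test, using only the standard library. The counts in the usage line are hypothetical, and commercial platforms may report different statistics (for example sequential or variance-reduced tests), so treat this as a baseline check rather than any platform's method.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int, z_crit: float = 1.96):
    """Difference in conversion rates (B minus A), Wald CI, and pooled two-sided z-test p-value."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Pooled standard error under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se0
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value

# Hypothetical counts: 500 of 10,000 users convert in A, 560 of 10,000 in B.
diff, ci, p = ab_test(500, 10_000, 560, 10_000)
print(f"lift={diff:.4f}, 95% CI=({ci[0]:.4f}, {ci[1]:.4f}), p={p:.3f}")
```

For stakeholders, the confidence interval on the absolute lift is usually more useful than the p-value alone, because it shows the range of business impact the data are consistent with.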
Resources
- Designing and Analyzing Experiments by Alex Hadjinicolaou
- Storytelling with Data by Cole Nussbaumer Knaflic
- Experimentation platforms: Statsig, LaunchDarkly, Optimizely documentation
- GitHub portfolio of causal analysis projects
Milestone: You can deliver end-to-end causal studies from problem framing through production-grade analysis to stakeholder-ready recommendations.
Practice Projects
Apply your skills with hands-on projects. Each project is labeled with its difficulty level.
Advertising Campaign Causal Impact Analysis
Beginner: Use the CausalImpact R package or a Python equivalent to estimate the causal effect of a simulated advertising campaign on sales using Bayesian structural time series. Create synthetic data with known ground truth to validate your estimates.
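The "known ground truth" check can be sketched before installing CausalImpact. The code below is a deliberately simplified stand-in, assuming a single control series unaffected by the campaign: it fits an ordinary least-squares regression of the target on the control over the pre-period and forecasts the post-period counterfactual, instead of a Bayesian structural time-series model, so it produces no posterior intervals. Both series and the injected lift are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pre, n_post = 100, 30
true_lift = 5.0  # ground truth injected into the post-period

# Hypothetical control series (e.g., sales in an unexposed market) and a target that tracks it.
x = 100 + np.cumsum(rng.normal(size=n_pre + n_post))
y = 10 + 1.2 * x + rng.normal(scale=1.0, size=n_pre + n_post)
y[n_pre:] += true_lift  # the simulated campaign starts at index n_pre

# Fit y ~ 1 + x on the pre-period only, then forecast the post-period counterfactual.
X = np.column_stack([np.ones_like(x), x])
coef = np.linalg.lstsq(X[:n_pre], y[:n_pre], rcond=None)[0]
counterfactual = X[n_pre:] @ coef

pointwise = y[n_pre:] - counterfactual  # per-period effect estimates
est_lift = pointwise.mean()
cumulative = pointwise.sum()
print(f"estimated average lift={est_lift:.2f} (truth {true_lift}), cumulative={cumulative:.1f}")
```

Once this baseline recovers the injected lift, swap in CausalImpact on the same synthetic data and compare its posterior estimate and interval against the known truth.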
Propensity Score Matching Study on Job Training Effectiveness
Intermediate: Replicate LaLonde's (1986) analysis of the National Supported Work (NSW) job training data. Compare naive regression, propensity score matching, and IPW estimates against the experimental benchmark. Assess covariate balance and sensitivity to specification choices.
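Before working with the real NSW files, the two core mechanics (matching on the propensity score and checking balance with standardized mean differences) can be rehearsed on simulated data. This sketch assumes a single covariate and treats the true propensity score as known, which you would not have in the real project; the helpers `smd` and `match_att` are hypothetical.

```python
import numpy as np

def smd(x, t, w=None):
    """Standardized mean difference of covariate x: treated (t == 1) minus control (t == 0)."""
    w = np.ones_like(x) if w is None else w
    m1 = np.average(x[t == 1], weights=w[t == 1])
    m0 = np.average(x[t == 0], weights=w[t == 0])
    # Pooled SD from the unweighted groups, so before/after values share one scale.
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return (m1 - m0) / pooled_sd

def match_att(ps, t, y):
    """ATT from 1:1 nearest-neighbor propensity score matching with replacement.

    Also returns how many times each control unit was used as a match.
    """
    treated = np.flatnonzero(t == 1)
    controls = np.flatnonzero(t == 0)
    # Full distance matrix (n_treated x n_controls); fine for a few thousand units.
    dist = np.abs(ps[treated][:, None] - ps[controls][None, :])
    nearest = dist.argmin(axis=1)
    att = (y[treated] - y[controls[nearest]]).mean()
    return att, np.bincount(nearest, minlength=len(controls))

rng = np.random.default_rng(4)
n = 4000
x = rng.normal(size=n)
ps = 1.0 / (1.0 + np.exp(-x))               # assumption: true score known; estimate it in practice
t = rng.binomial(1, ps)
y = 2.0 * t + 3.0 * x + rng.normal(size=n)  # constant effect, so the true ATT is 2.0

naive = y[t == 1].mean() - y[t == 0].mean()
att, match_counts = match_att(ps, t, y)
w = np.ones(n)
w[t == 0] = match_counts                     # controls weighted by how often they were matched
before, after = smd(x, t), smd(x, t, w)
print(f"naive={naive:.2f}, matched ATT={att:.2f}, SMD before={before:.2f}, after={after:.3f}")
```

A common rule of thumb treats absolute SMDs below about 0.1 as acceptable balance; on the NSW data you would compute this for every covariate, not just one.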
Regression Discontinuity Analysis of Scholarship Eligibility
Intermediate: Analyze a simulated or real dataset where a scholarship is awarded based on a test score cutoff. Implement a sharp RDD with local polynomial regression, bandwidth selection (IK or CCT), and density tests for manipulation of the running variable.
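The heart of a sharp RDD is a pair of weighted local linear fits that meet at the cutoff. The sketch below assumes simulated scores and outcomes and uses fixed, hand-picked bandwidths with a triangular kernel; it does not implement IK or CCT bandwidth selection or a density test, for which packages such as rdrobust exist. The helper `rdd_local_linear` is hypothetical.

```python
import numpy as np

def rdd_local_linear(score, y, cutoff, bandwidth):
    """Sharp RDD jump at the cutoff from local linear fits with a triangular kernel (fixed bandwidth)."""
    r = score - cutoff
    keep = np.abs(r) < bandwidth
    r, y = r[keep], y[keep]
    k = 1 - np.abs(r) / bandwidth          # triangular kernel weights
    above = (r >= 0).astype(float)
    # Separate intercepts and slopes on each side: y ~ a + tau*above + b*r + c*(above*r).
    X = np.column_stack([np.ones_like(r), above, r, above * r])
    sw = np.sqrt(k)                         # weighted least squares via sqrt-weight rescaling
    coef = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return coef[1]                          # tau: estimated jump at the cutoff

rng = np.random.default_rng(5)
n = 20_000
score = rng.uniform(0, 100, size=n)         # hypothetical test scores
cutoff = 70.0
award = (score >= cutoff).astype(float)     # sharp design: scholarship for scores at or above 70
outcome = 0.05 * score + 0.002 * (score - cutoff) ** 2 + 3.0 * award + rng.normal(size=n)

# Bandwidth sensitivity: the estimate should be stable across reasonable choices.
estimates = {h: rdd_local_linear(score, outcome, cutoff, h) for h in (5.0, 10.0, 20.0)}
for h, tau in estimates.items():
    print(f"bandwidth={h:>4}: jump={tau:.2f}")
```

Reporting the estimate across several bandwidths, as in the loop above, is a simple way to show readers that the result does not hinge on one tuning choice.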
Difference-in-Differences Analysis of Minimum Wage Policy
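Intermediate: Replicate the Card and Krueger (1994) New Jersey and Pennsylvania minimum wage study as a canonical two-group, two-period DiD, then extend the analysis to state minimum wage increases adopted at different times using modern staggered-adoption estimators (e.g., Callaway-Sant'Anna). Construct event study plots and conduct robustness checks.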
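Card and Krueger's comparison is the canonical 2x2 case that staggered-adoption estimators generalize, so it is worth computing both ways. The sketch below uses a simulated balanced panel (not their data) in which parallel trends hold by construction, and shows that the difference of cell means equals the interaction coefficient from the saturated regression. The helper `did_2x2` is hypothetical.

```python
import numpy as np

def did_2x2(y, treated, post):
    """Canonical 2x2 DiD: change in the treated group minus change in the comparison group."""
    def cell_mean(g, p):
        return y[(treated == g) & (post == p)].mean()
    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

rng = np.random.default_rng(6)
n_units = 400
treated = np.repeat(rng.binomial(1, 0.5, size=n_units), 2)  # unit-level group, observed in 2 periods
post = np.tile([0, 1], n_units)                             # 0 = before, 1 = after the policy
unit_effect = np.repeat(rng.normal(size=n_units), 2)         # persistent unit differences
true_att = 1.0
y = (5.0 + 2.0 * treated + 0.7 * post + true_att * treated * post
     + unit_effect + rng.normal(scale=0.5, size=2 * n_units))

tau_means = did_2x2(y, treated, post)

# Same estimate from the saturated regression y ~ 1 + treated + post + treated:post.
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
tau_reg = np.linalg.lstsq(X, y, rcond=None)[0][3]
print(f"DiD from means={tau_means:.3f}, from regression={tau_reg:.3f}")
```

The regression form matters because it is what you extend with covariates, clustered standard errors, and event-time indicators; with staggered timing, though, a single two-way fixed effects coefficient can be misleading, which is the problem the Callaway-Sant'Anna approach addresses.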
Heterogeneous Treatment Effects with Causal Forests
Advanced: Use EconML's CausalForestDML or the grf R package to estimate treatment effect heterogeneity in a clinical trial or marketing dataset. Identify subgroups with high or low treatment effects and validate them on held-out data.
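Individual treatment effects are never observed, so out-of-sample validation usually means checking whether units ranked as high-effect actually show larger effects. The sketch below assumes a randomized trial (simulated) and, so that it runs with NumPy alone, uses a linear T-learner as a hypothetical stand-in for a causal forest; in the real project, CausalForestDML or grf would supply the predicted CATEs. It then sorts held-out units into quartiles of predicted effect and compares treated-minus-control means within each quartile.

```python
import numpy as np

def t_learner_linear(X_train, t_train, y_train):
    """Hypothetical stand-in for a causal forest: separate OLS fits for treated and control units."""
    Xc = np.column_stack([np.ones(len(X_train)), X_train])
    b1 = np.linalg.lstsq(Xc[t_train == 1], y_train[t_train == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(Xc[t_train == 0], y_train[t_train == 0], rcond=None)[0]

    def predict_cate(X):
        return np.column_stack([np.ones(len(X)), X]) @ (b1 - b0)
    return predict_cate

def group_effects(cate_pred, t, y, n_groups=4):
    """Difference in means within equal-size groups of predicted CATE (valid when t is randomized)."""
    ranks = cate_pred.argsort().argsort()
    groups = ranks * n_groups // len(cate_pred)
    return np.array([
        y[(groups == g) & (t == 1)].mean() - y[(groups == g) & (t == 0)].mean()
        for g in range(n_groups)
    ])

rng = np.random.default_rng(7)
n = 20_000
X = rng.normal(size=(n, 3))
t = rng.binomial(1, 0.5, size=n)           # randomized assignment, as in a trial
tau = 1.0 + 2.0 * X[:, 0]                  # true effect varies with the first covariate
y = X @ np.array([1.0, -1.0, 0.5]) + tau * t + rng.normal(size=n)

train, test = slice(0, n // 2), slice(n // 2, n)
predict_cate = t_learner_linear(X[train], t[train], y[train])
effects_by_group = group_effects(predict_cate(X[test]), t[test], y[test])
print("held-out effect by predicted-CATE quartile:", np.round(effects_by_group, 2))
```

If the model has found real heterogeneity, the held-out group effects should rise from the lowest to the highest predicted quartile; a flat pattern suggests the apparent subgroups are noise.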
End-to-End Causal Pipeline with DoWhy and Refutation Suite
Advanced: Build a complete causal analysis using DoWhy, from graph construction through identification, estimation, and refutation. Apply it to a benchmark dataset with known ground truth, such as IHDP or Twins (both are semi-synthetic: real covariates combined with simulated components). Document every assumption and test robustness with placebo treatments, random common causes, and data subset validation.
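In DoWhy these checks run through `CausalModel.refute_estimate`, but it helps to know what each refuter does. The sketch below hand-rolls NumPy analogues of the three checks named above around a linear backdoor estimate on simulated data; it illustrates the logic only and is not DoWhy's implementation. All helper names are hypothetical.

```python
import numpy as np

def backdoor_ols(y, t, W):
    """Effect of t on y from OLS adjusting for the common causes in W (linear backdoor adjustment)."""
    X = np.column_stack([np.ones(len(y)), t, W])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

def placebo_refute(y, t, W, n_sims=20, seed=0):
    """Re-estimate with randomly permuted treatment; a sound pipeline should give effects near zero."""
    rng = np.random.default_rng(seed)
    return np.array([backdoor_ols(y, rng.permutation(t), W) for _ in range(n_sims)])

def random_common_cause_refute(y, t, W, seed=0):
    """Add an independent random covariate; the estimate should barely move."""
    rng = np.random.default_rng(seed)
    extra = rng.normal(size=(len(y), 1))
    return backdoor_ols(y, t, np.column_stack([W, extra]))

def subset_refute(y, t, W, frac=0.8, n_sims=20, seed=0):
    """Re-estimate on random subsets; estimates should stay close to the full-sample value."""
    rng = np.random.default_rng(seed)
    k = int(frac * len(y))
    out = []
    for _ in range(n_sims):
        idx = rng.choice(len(y), size=k, replace=False)
        out.append(backdoor_ols(y[idx], t[idx], W[idx]))
    return np.array(out)

rng = np.random.default_rng(8)
n = 10_000
W = rng.normal(size=(n, 2))                                   # observed common causes
t = (W @ np.array([1.0, 0.5]) + rng.normal(size=n) > 0).astype(float)
y = 2.0 * t + W @ np.array([1.5, -1.0]) + rng.normal(size=n)  # true effect is 2.0

estimate = backdoor_ols(y, t, W)
placebo = placebo_refute(y, t, W)
new_estimate = random_common_cause_refute(y, t, W)
subsets = subset_refute(y, t, W)
print(f"estimate={estimate:.3f}, max |placebo|={np.abs(placebo).max():.3f}, "
      f"with random cause={new_estimate:.3f}, subset range={subsets.min():.3f}..{subsets.max():.3f}")
```

Passing these checks does not prove the causal graph is right; they catch bugs and fragile estimates, while unmeasured confounding still needs a sensitivity analysis.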
Mediation Analysis of Algorithm Impact on User Behavior
Advanced: Estimate the direct and indirect effects of a recommendation algorithm on user satisfaction, where engagement behavior mediates the relationship. Use the potential outcomes framework for mediation and compare with Baron-Kenny regression approaches.
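Under strong assumptions (linear models, no treatment-mediator interaction, randomized treatment, and no unmeasured mediator-outcome confounding), the potential-outcomes natural direct and indirect effects coincide with Baron-Kenny's regression coefficients: the direct effect is the treatment coefficient holding the mediator fixed, and the indirect effect is the product of the two path coefficients. The sketch below checks this on simulated data; all path coefficients are invented, and the `ols` helper is hypothetical.

```python
import numpy as np

def ols(y, *cols):
    """Hypothetical helper: least-squares coefficients for y ~ 1 + cols."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(9)
n = 100_000
a_true, b_true, c_true = 0.8, 1.5, 0.4  # T -> M, M -> Y, and direct T -> Y paths (hypothetical)

t = rng.binomial(1, 0.5, size=n).astype(float)   # randomized exposure to the new algorithm
m = a_true * t + rng.normal(size=n)              # mediator, e.g. an engagement score
y = c_true * t + b_true * m + rng.normal(size=n) # outcome, e.g. a satisfaction score

total = ols(y, t)[1]             # total effect of T on Y
a_hat = ols(m, t)[1]             # effect of T on M
c_hat, b_hat = ols(y, t, m)[1:]  # direct effect of T and effect of M, each holding the other fixed
indirect = a_hat * b_hat         # product-of-coefficients indirect effect

print(f"total={total:.3f}, direct={c_hat:.3f}, indirect={indirect:.3f}")
```

In OLS the decomposition total = direct + indirect holds exactly in the sample; the potential-outcomes framework becomes essential once you add a treatment-mediator interaction or a nonlinear outcome, where the simple product no longer equals the natural indirect effect.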
Synthetic Control Study of a State Policy Intervention
Intermediate: Implement the synthetic control method to evaluate a real policy intervention (e.g., California's Proposition 99 tobacco tax). Construct donor pools, validate with placebo tests, and visualize gaps between the treated unit and its synthetic control.
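The core optimization in synthetic control finds nonnegative donor weights summing to one that best reproduce the treated unit's pre-intervention path. The sketch below solves that problem with projected gradient descent onto the probability simplex using only NumPy, on simulated factor-model data with an injected effect. It matches on pre-period outcomes only and omits the covariate-weighting (V matrix) step of the full method; `project_to_simplex` and `synth_weights` are hypothetical helpers.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1) / j > 0)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0)

def synth_weights(Y0_pre, y1_pre, n_iter=20_000):
    """Simplex-constrained weights minimizing 0.5 * ||Y0_pre @ w - y1_pre||^2.

    Y0_pre: (T_pre, J) donor outcomes; y1_pre: (T_pre,) treated-unit outcomes.
    """
    J = Y0_pre.shape[1]
    w = np.full(J, 1.0 / J)
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y1_pre)
        w = project_to_simplex(w - step * grad)
    return w

rng = np.random.default_rng(10)
T_pre, T_post, J = 30, 10, 8
T = T_pre + T_post
factors = np.cumsum(rng.normal(size=(T, 2)), axis=0)  # two common time-varying factors
loadings = rng.uniform(0.5, 2.0, size=(J, 2))         # donor-specific exposure to the factors
donors = 10 + factors @ loadings.T + rng.normal(scale=0.2, size=(T, J))
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
treated = donors @ true_w + rng.normal(scale=0.2, size=T)  # convex combination of donors plus noise
true_effect = -5.0
treated[T_pre:] += true_effect

w = synth_weights(donors[:T_pre], treated[:T_pre])
gap = treated - donors @ w            # treated minus synthetic control, every period
post_effect = gap[T_pre:].mean()
pre_rmse = np.sqrt(np.mean(gap[:T_pre] ** 2))
print(f"weights={np.round(w, 2)}, pre-RMSE={pre_rmse:.2f}, post effect={post_effect:.2f}")
```

For the placebo tests, rerun `synth_weights` treating each donor in turn as if it were treated and compare its post-period gap with the real one; a real effect should stand out from that placebo distribution.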