
Learning Roadmap

How to Become an AI Causal Inference Analyst

A step-by-step, phase-based learning path from beginner to job-ready AI Causal Inference Analyst. Estimated completion: 7 months across 4 phases.

4 Phases
28 Weeks Total
High Entry Barrier
Advanced Difficulty

  1. Statistical Foundations & Causal Thinking

    6 weeks
    • Master probability theory, statistical inference, and linear regression
    • Understand the fundamental problem of causal inference and counterfactual reasoning
    • Learn to distinguish correlation from causation using Simpson's Paradox and collider bias examples
    • Build fluency in Python or R for statistical analysis
    • Causal Inference: The Mixtape by Scott Cunningham (free online)
    • Brady Neal's Causal Inference course (YouTube)
    • An Introduction to Statistical Learning (ISLR), Chapters 1-4
    • Think Stats by Allen Downey (free online)
    Milestone

    You can articulate the causal inference problem, draw basic DAGs, and identify confounders, colliders, and mediators in observational datasets.
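
As a rough illustration of the Simpson's Paradox objective in this phase, the sketch below uses counts patterned on the textbook kidney-stone example: treatment A looks worse when the data are pooled, yet better within every stratum of the confounder (stone size). The function names and data layout are illustrative assumptions, not taken from any of the listed resources.

```python
# Successes and totals per treatment arm, split by the confounder (stone size).
# Counts follow the widely used kidney-stone teaching example.
strata = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def naive_difference(strata):
    """Pooled success rate of A minus pooled success rate of B (ignores the confounder)."""
    rate = {}
    for arm in ("A", "B"):
        successes = sum(s[arm][0] for s in strata.values())
        total = sum(s[arm][1] for s in strata.values())
        rate[arm] = successes / total
    return rate["A"] - rate["B"]


def adjusted_difference(strata):
    """Backdoor adjustment: stratum-level differences weighted by each stratum's share."""
    n_all = sum(s["A"][1] + s["B"][1] for s in strata.values())
    diff = 0.0
    for s in strata.values():
        weight = (s["A"][1] + s["B"][1]) / n_all
        diff += weight * (s["A"][0] / s["A"][1] - s["B"][0] / s["B"][1])
    return diff


naive = naive_difference(strata)
adjusted = adjusted_difference(strata)
print(f"naive A-B: {naive:+.3f}, adjusted A-B: {adjusted:+.3f}")
```

The sign flip between the two numbers is the paradox: stone size affects both which treatment is chosen and the chance of success, so the pooled comparison is confounded.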

  2. Core Causal Methods

    8 weeks
    • Master matching, weighting, and stratification using propensity scores
    • Learn difference-in-differences with staggered adoption extensions
    • Implement regression discontinuity designs and validate bandwidth sensitivity
    • Understand instrumental variables and exclusion restriction assumptions
    • Causal Inference: The Mixtape - Chapters 3-7
    • The Effect by Nick Huntington-Klein (free online)
    • DoWhy Python library tutorials and documentation
    • Scott Cunningham's causal inference video lecture series
    Milestone

    You can independently design and execute a causal study using at least three different identification strategies and defend your assumptions.

  3. Advanced Methods & Machine Learning Integration

    8 weeks
    • Learn doubly robust estimators, TMLE, and causal forests for heterogeneous treatment effects
    • Explore synthetic control methods and generalized synthetic controls
    • Integrate ML models into causal pipelines (e.g., LASSO for covariate selection, causal forests for CATE estimation)
    • Study mediation analysis and natural experiments
    • EconML library documentation and Microsoft Research tutorials
    • Susan Athey and Stefan Wager's papers on causal forests
    • Targeted Learning by Mark van der Laan and Sherri Rose
    • CausalML library documentation and examples
    Milestone

    You can estimate heterogeneous treatment effects using ML-augmented causal methods and apply sensitivity analyses to quantify robustness of findings.

  4. Production & Professional Skills

    6 weeks
    • Build reproducible causal analysis pipelines using dbt, Python, and SQL
    • Create executive-level dashboards and causal insight reports
    • Learn experiment management platforms and A/B testing infrastructure
    • Develop consulting-style communication for non-technical stakeholders
    • Designing and Analyzing Experiments by Alex Hadjinicolaou
    • Storytelling with Data by Cole Nussbaumer Knaflic
    • Experimentation platforms: Statsig, LaunchDarkly, Optimizely documentation
    • GitHub portfolio of causal analysis projects
    Milestone

    You can deliver end-to-end causal studies from problem framing through production-grade analysis to stakeholder-ready recommendations.

Practice Projects

Apply your skills with hands-on projects. Each project is labeled with its difficulty level.

Advertising Campaign Causal Impact Analysis

Beginner

Use the CausalImpact R package or Python equivalent to estimate the causal effect of a simulated advertising campaign on sales using Bayesian structural time series. Create synthetic data with known ground truth to validate your estimates.

~15h
Bayesian time series, Synthetic control reasoning, Sensitivity analysis

Propensity Score Matching Study on Job Training Effectiveness

Intermediate

Replicate the LaLonde (1986) dataset analysis. Compare naive regression, propensity score matching, and IPW estimates against experimental benchmarks. Assess covariate balance and sensitivity to specification choices.

~25h
Propensity score estimation, Covariate balance assessment, IPW weighting
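
Before working with the LaLonde data, it may help to see inverse probability weighting recover a known effect on simulated data. The sketch below is a minimal illustration under strong assumptions: a single binary confounder, a true effect of 2.0, and propensities estimated as simple stratum frequencies rather than with a logistic model. All names and parameter values are hypothetical.

```python
import random


def simulate(n, seed=0):
    """Simulate (x, t, y) rows where x confounds treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x else 0.2          # treatment is more likely when x = 1
        t = 1 if rng.random() < p_treat else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0, 1)  # true treatment effect is 2.0
        rows.append((x, t, y))
    return rows


def naive_estimate(rows):
    """Difference in mean outcomes between treated and control, with no adjustment."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_estimate(rows):
    """Normalized (Hajek) IPW estimate of the average treatment effect."""
    # Estimate the propensity score as the treated share within each level of x.
    propensity = {}
    for level in (0, 1):
        group = [t for x, t, _ in rows if x == level]
        propensity[level] = sum(group) / len(group)
    w1 = wy1 = w0 = wy0 = 0.0
    for x, t, y in rows:
        if t == 1:
            w = 1.0 / propensity[x]
            w1 += w
            wy1 += w * y
        else:
            w = 1.0 / (1.0 - propensity[x])
            w0 += w
            wy0 += w * y
    return wy1 / w1 - wy0 / w0


rows = simulate(20000)
naive = naive_estimate(rows)
ipw = ipw_estimate(rows)
print(f"naive: {naive:.2f}, IPW: {ipw:.2f}, truth: 2.00")
```

With real data the propensity model has several covariates and is itself a specification choice, which is why the project asks for balance checks and sensitivity to specification.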

Regression Discontinuity Analysis of Scholarship Eligibility

Intermediate

Analyze a simulated or real dataset where a scholarship is awarded based on a test score cutoff. Implement sharp RDD with local polynomial regression, bandwidth selection (IK or CCT), and density tests for manipulation.

~20h
RDD design, Bandwidth selection, Density tests
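
The core of a sharp RDD estimate can be sketched in a few lines: fit a separate line on each side of the cutoff within a bandwidth and take the gap between the two intercepts at the cutoff. The example below is a simplified sketch on simulated scores, assuming a uniform kernel, a hypothetical cutoff of 60, and a true jump of 5.0. It does not implement IK or CCT bandwidth selection or a density test.

```python
import random


def simulate_scores(n, cutoff=60.0, effect=5.0, seed=1):
    """Simulate (score, outcome) pairs with a jump of `effect` at the cutoff."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        score = rng.uniform(0, 100)
        jump = effect if score >= cutoff else 0.0
        outcome = 20.0 + 0.3 * score + jump + rng.gauss(0, 2)
        data.append((score, outcome))
    return data


def fit_line(points):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sharp_rdd(data, cutoff, bandwidth):
    """Local linear estimate of the outcome discontinuity at the cutoff."""
    # Center the running variable so each intercept is the fitted value at the cutoff.
    left = [(s - cutoff, y) for s, y in data if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, y) for s, y in data if cutoff <= s <= cutoff + bandwidth]
    intercept_left, _ = fit_line(left)
    intercept_right, _ = fit_line(right)
    return intercept_right - intercept_left


data = simulate_scores(20000)
# Re-estimate under several bandwidths as a basic sensitivity check.
estimates = {h: sharp_rdd(data, 60.0, h) for h in (5.0, 10.0, 20.0)}
for h, est in estimates.items():
    print(f"bandwidth {h:>4}: estimated jump {est:.2f}")
```

Because the simulated relationship is exactly linear, all bandwidths should agree here. With real data, estimates that move sharply as the bandwidth changes are a warning sign.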

Difference-in-Differences Analysis of Minimum Wage Policy

Intermediate

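Replicate the Card and Krueger (1994) minimum wage study with a canonical two-group, two-period DiD. Because that study has a single treatment date, also apply staggered adoption estimators (Callaway-Sant'Anna) to a setting where treatment timing varies across units. Construct event study plots and conduct robustness checks.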

~25h
Two-way fixed effects, Event study design, Staggered DiD methods
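
The canonical two-group, two-period DiD estimate is just a difference of four cell means. The sketch below illustrates this on simulated data, assuming parallel trends hold by construction and a true effect of 1.5. The data-generating values and function names are hypothetical, and no staggered-adoption estimator is implemented here.

```python
import random
from statistics import mean


def simulate_panel(n_units, effect=1.5, seed=2):
    """Simulate (treated, post, outcome) rows for units observed before and after."""
    rng = random.Random(seed)
    rows = []
    for unit in range(n_units):
        treated = unit % 2
        # Treated units start at a higher level, so a raw post-period comparison is biased.
        base = 10.0 + 4.0 * treated + rng.gauss(0, 1)
        for post in (0, 1):
            # Both groups share the same time trend (+2.0): the parallel trends assumption.
            y = base + 2.0 * post + effect * treated * post + rng.gauss(0, 1)
            rows.append((treated, post, y))
    return rows


def did_estimate(rows):
    """(Treated post - treated pre) minus (control post - control pre)."""
    cell = {
        (g, p): mean(y for t, q, y in rows if t == g and q == p)
        for g in (0, 1)
        for p in (0, 1)
    }
    return (cell[(1, 1)] - cell[(1, 0)]) - (cell[(0, 1)] - cell[(0, 0)])


rows = simulate_panel(4000)
estimate = did_estimate(rows)
post_only_gap = mean(y for t, p, y in rows if t == 1 and p == 1) - mean(
    y for t, p, y in rows if t == 0 and p == 1
)
print(f"DiD: {estimate:.2f}, post-period gap only: {post_only_gap:.2f}, truth: 1.50")
```

The same number is the coefficient on the interaction term in a regression of the outcome on treated, post, and treated × post, which is the form that extends to two-way fixed effects.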

Heterogeneous Treatment Effects with Causal Forests

Advanced

Use EconML's CausalForestDML or the grf R package to estimate treatment effect heterogeneity in a clinical trial or marketing dataset. Identify subgroups with high/low treatment effects and validate with out-of-sample predictions.

~30h
Causal forests, CATE estimation, Heterogeneity analysis

End-to-End Causal Pipeline with DoWhy and Refutation Suite

Advanced

Build a complete causal analysis using DoWhy from graph construction through identification, estimation, and refutation. Apply it to a benchmark dataset (e.g., IHDP or Twins) that is commonly used because its treatment effects are known by construction. Document every assumption and test robustness with placebo treatments, random common causes, and data subset validation.

~35h
DoWhy pipeline, Causal graph construction, Automated refutation
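
To build intuition for what a placebo-treatment refutation checks, the sketch below hand-rolls the idea without DoWhy: estimate a backdoor-adjusted effect, then shuffle the treatment column and re-estimate. If the estimator is behaving, the placebo effects should sit near zero. This is an illustrative stand-in under simple assumptions (one discrete confounder, true effect 1.0), not DoWhy's implementation, and every name in it is hypothetical.

```python
import random


def simulate(n, seed=3):
    """Simulate (x, t, y) rows with a three-level confounder x and true effect 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 2)
        t = 1 if rng.random() < 0.2 + 0.3 * x else 0
        y = 1.0 * t + 2.0 * x + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows


def stratified_effect(rows):
    """Backdoor-adjusted effect: per-stratum mean differences weighted by stratum size."""
    total = 0.0
    for level in sorted({x for x, _, _ in rows}):
        stratum = [(t, y) for x, t, y in rows if x == level]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        diff = sum(treated) / len(treated) - sum(control) / len(control)
        total += diff * len(stratum) / len(rows)
    return total


def placebo_effects(rows, n_reps=50, seed=4):
    """Re-estimate the effect after randomly permuting the treatment labels."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_reps):
        shuffled = [t for _, t, _ in rows]
        rng.shuffle(shuffled)
        placebo_rows = [(x, t, y) for (x, _, y), t in zip(rows, shuffled)]
        effects.append(stratified_effect(placebo_rows))
    return effects


rows = simulate(6000)
estimate = stratified_effect(rows)
placebos = placebo_effects(rows)
mean_placebo = sum(placebos) / len(placebos)
print(f"estimated effect: {estimate:.2f}, mean placebo effect: {mean_placebo:.3f}")
```

A placebo effect far from zero would suggest the estimate reflects something other than the treatment, such as a flaw in the adjustment set or the estimator.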

Mediation Analysis of Algorithm Impact on User Behavior

Advanced

Estimate the direct and indirect effects of a recommendation algorithm on user satisfaction, where engagement behavior mediates the relationship. Use the potential outcomes framework for mediation and compare with Baron-Kenny regression approaches.

~30h
Mediation analysis, Direct/indirect effect decomposition, Natural experiments

Synthetic Control Study of a State Policy Intervention

Intermediate

Implement the synthetic control method to evaluate a real policy intervention (e.g., California's Proposition 99 tobacco tax). Construct donor pools, validate with placebo tests, and visualize gaps between treated and synthetic control.

~20h
Synthetic control construction, Donor pool selection, Placebo testing
