
Skill Guide

Potential outcomes framework (Rubin Causal Model) and SUTVA assumptions

The Potential Outcomes Framework (Rubin Causal Model) defines a causal effect as the difference between the outcome a unit experiences under treatment and the outcome that same unit *would have* experienced under control. SUTVA (the Stable Unit Treatment Value Assumption) is an assumption, not a guarantee: it requires that one unit's treatment does not affect another unit's outcome (no interference) and that there is a single, consistent version of each treatment.

This framework is foundational for credible causal inference in data science, enabling organizations to move beyond mere correlation to measure the true impact of interventions on business metrics. Mastery directly supports rigorous A/B testing, policy evaluation, and ROI estimation, reducing costly misallocations of resources.

How to Learn Potential outcomes framework (Rubin Causal Model) and SUTVA assumptions

1. Grasp the core counterfactual notation, Yi(1) and Yi(0), and the 'fundamental problem of causal inference': only one of the two potential outcomes is ever observed for a given unit.
2. Understand the two components of SUTVA: no interference between units and no hidden variations of treatment. Both presuppose clearly defined units and treatments.
3. Study how randomization makes treatment and control groups comparable on average, so that the difference in group means estimates the Average Treatment Effect (ATE).

1. Move to non-randomized observational studies. Learn propensity score matching (PSM) and inverse probability weighting (IPW), which aim to mimic a randomized comparison by balancing observed covariates between groups.
2. Recognize the 'ignorability' (unconfoundedness) assumption these methods require: conditional on the observed covariates, treatment assignment is independent of the potential outcomes.
3. Avoid a common mistake: confusing selection bias (a violation of ignorability) with interference (a violation of SUTVA) and applying the wrong correction method.

1. Master sensitivity analyses (e.g., Rosenbaum bounds) to quantify how robust a finding is to unmeasured confounders (hidden bias).
2. Explore methods for handling interference (a SUTVA violation), such as two-stage randomized designs for spillover effects in networks or models of peer influence.
3. Architect causal inference pipelines that state assumptions transparently, test for SUTVA violations, and choose estimators accordingly, and mentor teams on these choices.
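
The first steps of this learning path can be illustrated with a small simulation. The sketch below is hypothetical throughout: the `engagement` confounder, the true effect of 3, and the sample size are invented for illustration. Because it is a simulation, both potential outcomes are generated for every unit, which is never possible with real data. Each estimate then uses only the one outcome that matches a unit's assignment.

```python
import random
from statistics import mean

random.seed(0)

def simulate_units(n):
    """Generate hypothetical units, each with BOTH potential outcomes."""
    units = []
    for _ in range(n):
        engagement = random.gauss(0, 1)                # drives the outcome and self-selection
        y0 = 10 + 2 * engagement + random.gauss(0, 1)  # Y_i(0): outcome under control
        y1 = y0 + 3                                    # Y_i(1): outcome under treatment
        units.append((engagement, y0, y1))
    return units

def difference_in_means(units, treated_flags):
    """Use only the observed outcome: Y_i(1) if treated, Y_i(0) otherwise."""
    treated = [y1 for (_, _, y1), t in zip(units, treated_flags) if t]
    control = [y0 for (_, y0, _), t in zip(units, treated_flags) if not t]
    return mean(treated) - mean(control)

units = simulate_units(20000)

# Knowable only inside a simulation: the average of the unit-level effects.
true_ate = mean(y1 - y0 for _, y0, y1 in units)

# Randomized assignment: a coin flip, independent of the potential outcomes.
randomized = [random.random() < 0.5 for _ in units]

# Self-selection: highly engaged units opt in, which violates ignorability.
self_selected = [engagement > 0 for engagement, _, _ in units]

est_randomized = difference_in_means(units, randomized)
est_self_selected = difference_in_means(units, self_selected)

print(f"true ATE:            {true_ate:.2f}")
print(f"randomized estimate: {est_randomized:.2f}")
print(f"self-selected:       {est_self_selected:.2f}")
```

Under these assumptions the randomized estimate should land close to the true ATE, while the self-selected comparison overstates it because engaged units would have had higher outcomes even without treatment.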

Practice Projects

Beginner
Case Study/Exercise

Evaluating a New Onboarding Email's Effect on User Activation

Scenario

A product team claims a new onboarding email sequence increases 7-day user activation rates. You have data from an A/B test they ran. The test randomly assigned new users to receive either the new email (Treatment) or the old one (Control). Activation rate (Y) is the outcome.

How to Execute
1. Define the potential outcomes: Yi(1) = activation if the user received the new email, Yi(0) = activation if the user received the old email.
2. Confirm random assignment was clean (no crossover between arms).
3. Estimate the ATE as the difference in mean activation between arms: mean(Y|T=1) - mean(Y|T=0).
4. Explicitly check SUTVA: Did users in the treatment group share the new email content with control users (interference)? Were all 'new emails' identical (consistent treatment)?
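
The estimation step can be sketched in a few lines. This is a minimal illustration, not the product team's analysis: the activation counts are made up, the function name `ate_difference_in_proportions` is hypothetical, and the interval uses a normal approximation that assumes large, independently assigned samples.

```python
from math import sqrt

def ate_difference_in_proportions(treated_outcomes, control_outcomes, z=1.96):
    """Difference in activation rates with a normal-approximation 95% interval."""
    n1, n0 = len(treated_outcomes), len(control_outcomes)
    p1 = sum(treated_outcomes) / n1   # mean(Y | T=1)
    p0 = sum(control_outcomes) / n0   # mean(Y | T=0)
    ate = p1 - p0
    # Standard error of a difference between two independent proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return ate, (ate - z * se, ate + z * se)

# Hypothetical data: 1 = activated within 7 days, 0 = did not.
treated = [1] * 260 + [0] * 740   # new-email arm
control = [1] * 220 + [0] * 780   # old-email arm

ate, (ci_low, ci_high) = ate_difference_in_proportions(treated, control)
print(f"ATE estimate: {ate:.3f}, 95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

The interval is only interpretable as a causal effect if the SUTVA checks in step 4 hold. If treated users forwarded the email to control users, the control arm is partly treated and the difference is likely understated.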
Intermediate
Case Study/Exercise

Estimating the Impact of a Premium Feature on Customer LTV Using Observational Data

Scenario

You need to estimate the causal effect of adopting a premium feature (Treatment) on Customer Lifetime Value (Outcome) using historical log data. Users self-selected into adopting the feature, creating likely confounding (e.g., power users are both more likely to adopt and to have high LTV).

How to Execute
1. Identify potential confounders (e.g., prior usage, account age, support tickets).
2. Implement propensity score matching: model the probability of adopting the feature given the confounders, then match treated users to control users with similar scores.
3. Estimate the treatment effect on the matched sample. Matching each adopter to similar non-adopters typically targets the effect among adopters (the ATT) rather than the population-wide ATE, so state which estimand you report.
4. Perform a sensitivity analysis (e.g., using the `sensitivitymv` or `rbounds` package in R) to assess how much hidden bias would be needed to nullify the result.
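
To show how adjusting for a confounder changes the estimate, here is a simplified sketch. It swaps in inverse probability weighting rather than matching, and it estimates the propensity score as the adoption rate within each level of a single binary confounder instead of fitting a model. The data, the `power_user` flag, and the true effect of 20 are all simulated assumptions.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical logs: power users adopt more often AND have higher LTV.
rows = []
for _ in range(40000):
    power_user = random.random() < 0.3
    adopted = random.random() < (0.7 if power_user else 0.2)
    ltv = 100 + 80 * power_user + 20 * adopted + random.gauss(0, 10)
    rows.append((power_user, adopted, ltv))

# Naive comparison: confounded by power-user status.
naive = (mean(y for _, t, y in rows if t)
         - mean(y for _, t, y in rows if not t))

# Propensity score e(x) = P(adopted | power_user = x), estimated per stratum.
propensity = {}
for x in (True, False):
    flags = [t for px, t, _ in rows if px == x]
    propensity[x] = sum(flags) / len(flags)

# Normalized (Hajek) IPW: weight treated by 1/e(x), controls by 1/(1 - e(x)).
treated_w = [(1 / propensity[x], y) for x, t, y in rows if t]
control_w = [(1 / (1 - propensity[x]), y) for x, t, y in rows if not t]
mu1 = sum(w * y for w, y in treated_w) / sum(w for w, _ in treated_w)
mu0 = sum(w * y for w, y in control_w) / sum(w for w, _ in control_w)
ipw_ate = mu1 - mu0

print(f"naive difference: {naive:.1f}")
print(f"IPW estimate:     {ipw_ate:.1f}")
```

The weighted estimate recovers the simulated effect only because the single confounder is observed. With real log data, any unmeasured driver of both adoption and LTV would still bias the result, which is what the sensitivity analysis in step 4 is meant to probe.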
Advanced
Case Study/Exercise

Designing a Referral Program Experiment with Potential Spillover

Scenario

A company launches a referral bonus program where users (Referrers) get a reward if their friends (Referrals) sign up. You suspect the program might change the referrer's own engagement beyond the reward, affecting outcomes for their other connected friends, violating SUTVA via network interference.

How to Execute
1. Design a cluster-randomized trial: randomize entire social clusters (e.g., a user and all of their friends) to treatment or control so that interference is contained within clusters.
2. Alternatively, use a 'two-stage' randomization: first randomize clusters to different treatment saturations (the share of potential referrers who receive the offer), then randomize individuals within each cluster at that rate. Comparing the outcomes of untreated network connections across saturations measures spillover.
3. Analyze using models that account for peer effects (e.g., linear-in-means models) or define estimands like the total effect on a treated node's network.
4. Report results with explicit acknowledgement of SUTVA limitations and the strategies used to mitigate them.
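
The cluster-randomized option in step 1 can be sketched as follows. Everything here is an illustrative assumption: the 200 friend clusters, the simulated lift of 1.5, and the helper names `assign_clusters` and `cluster_level_effect`. The key design choice is that the cluster, not the user, is both the unit of assignment and the unit of analysis.

```python
import random
from statistics import mean

random.seed(2)

def assign_clusters(cluster_ids, treated_share=0.5):
    """Randomize whole clusters so every member shares one condition."""
    shuffled = list(cluster_ids)
    random.shuffle(shuffled)
    n_treated = int(len(shuffled) * treated_share)
    return {cid: (i < n_treated) for i, cid in enumerate(shuffled)}

def cluster_level_effect(outcomes_by_cluster, assignment):
    """Compare cluster means, treating each cluster as one observation."""
    treated = [mean(ys) for cid, ys in outcomes_by_cluster.items() if assignment[cid]]
    control = [mean(ys) for cid, ys in outcomes_by_cluster.items() if not assignment[cid]]
    return mean(treated) - mean(control)

# Hypothetical simulation: 200 friend clusters of 10 users each.
cluster_ids = range(200)
assignment = assign_clusters(cluster_ids)

outcomes_by_cluster = {}
for cid in cluster_ids:
    cluster_shock = random.gauss(0, 1)       # variation shared within a cluster
    lift = 1.5 if assignment[cid] else 0.0   # direct effect plus within-cluster spillover
    outcomes_by_cluster[cid] = [
        5 + cluster_shock + lift + random.gauss(0, 1) for _ in range(10)
    ]

effect = cluster_level_effect(outcomes_by_cluster, assignment)
print(f"cluster-level effect estimate: {effect:.2f}")
```

Analyzing cluster means avoids overstating precision, since users in the same cluster are correlated. The estimate combines direct and within-cluster spillover effects, and it stays valid only if interference does not cross cluster boundaries.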

Tools & Frameworks

Statistical & Computational Tools

R (packages: `MatchIt`, `WeightIt`, `sensemakr`, `rbounds`)
Python (libraries: `CausalInference`, `DoWhy`, `EconML`, `causalml`)
Stata (`teffects`, `psmatch2`, `sensemakr`)

Use these for implementing matching, weighting, sensitivity analysis, and advanced estimators. `DoWhy` structures an analysis end to end (model the assumptions, identify the estimand, estimate, then refute), and `EconML` supplies machine-learning estimators for heterogeneous treatment effects. Both help move from conceptual understanding to executable analysis.

Mental Models & Methodological Frameworks

Potential Outcomes (Rubin) Causal Model
Directed Acyclic Graphs (DAGs) / Causal Diagrams
IDEA Framework for A/B Test Design (Identify, Design, Execute, Analyze)

The Rubin model provides the fundamental counterfactual language. DAGs are used to visually map assumptions about confounders and mechanisms, helping to choose the right identification strategy (e.g., matching vs. instrumental variables). The IDEA framework operationalizes SUTVA and randomization in live experiments.

Interview Questions

Answer Strategy

The interviewer is testing understanding of SUTVA (specifically, the assumption of no interference and consistent treatment) and the concept of confounding in cluster randomization. The concern is a potential violation of the 'no interference' component of SUTVA if treatments spill across city borders, or a confounding of the treatment effect with a city-level trend. Strategy: Acknowledge the concern as a potential SUTVA violation/cluster confounder. Propose checking for spillover (e.g., are control cities near treatment cities?) and using a hierarchical/multilevel model to account for city-level random effects and any measured city-level covariates.

Answer Strategy

This is a behavioral question testing the practical application of causal inference thinking. The core competency is the ability to make and justify assumptions under real-world constraints. Sample Response: 'In a project evaluating the impact of a new sales tool, we used observational data. The key assumption was unconfoundedness: conditional on rep tenure, region, and past performance, tool adoption was as good as random. We checked covariate balance after propensity score matching and ran a sensitivity analysis to gauge how much unmeasured confounding the result could tolerate. We also checked SUTVA by confirming that reps didn't collaborate on the same accounts, which could cause interference.'
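
The balance check mentioned in the sample response is often done with a standardized mean difference (SMD) per covariate. The sketch below uses made-up tenure values and a hypothetical helper name. The 0.1 cutoff in the comments is a common rule of thumb, not a fixed standard.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treated_values, control_values):
    """Difference in covariate means, scaled by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated_values) + variance(control_values)) / 2)
    return (mean(treated_values) - mean(control_values)) / pooled_sd

# Hypothetical rep tenure in years.
tenure_treated = [2.0, 3.5, 4.0, 5.5, 6.0]
tenure_control_before = [0.5, 1.0, 1.5, 2.0, 3.0]    # all non-adopters
tenure_control_matched = [2.5, 3.0, 4.5, 5.0, 6.5]   # matched non-adopters

smd_before = standardized_mean_difference(tenure_treated, tenure_control_before)
smd_after = standardized_mean_difference(tenure_treated, tenure_control_matched)

# An absolute SMD below roughly 0.1 is commonly read as adequate balance.
print(f"SMD before matching: {smd_before:.2f}")
print(f"SMD after matching:  {smd_after:.2f}")
```

Good balance on measured covariates supports, but does not prove, unconfoundedness, since it says nothing about covariates that were never recorded.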
