Skill Guide

Statistical power analysis and sample size estimation

The process of determining the required sample size for a study to detect a statistically significant effect of a given magnitude, if that effect exists, while controlling for the risk of false positives (Type I errors) and false negatives (Type II errors).

This skill is critical for ensuring research and experimental initiatives are resource-efficient, ethically sound, and produce actionable results. It directly impacts business outcomes by preventing wasted investment in underpowered studies that yield inconclusive data and enabling the confident validation of product changes, marketing strategies, and operational improvements.

How to Learn Statistical power analysis and sample size estimation

1. Understand the core quartet: Null/Alternative Hypothesis, Significance Level (α), Power (1-β), and Minimum Detectable Effect (MDE).
2. Grasp the concept of the sampling distribution of the test statistic.
3. Learn the formula for a simple two-sample z-test or t-test for means as a foundational model.
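As a rough illustration of that foundational model, the sketch below computes the per-group sample size for a two-sided, two-sample z-test on means using the standard normal-approximation formula n = 2σ²(z₁₋α/₂ + z₁₋β)² / Δ². The function name and the example inputs are hypothetical, and the formula assumes equal group sizes and a known, common standard deviation.

```python
import math
from scipy.stats import norm


def n_per_group_z(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample z-test on means.

    Assumes equal group sizes and a known, common standard deviation.
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = norm.ppf(power)           # quantile matching the target power (1 - beta)
    n = 2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2
    return math.ceil(n)                # round up: a fractional subject is not possible


# Hypothetical example: detect a half-standard-deviation shift.
print(n_per_group_z(delta=0.5, sigma=1.0))
```

The t-test version needs slightly more observations because the standard deviation is estimated rather than known, which is one reason software is preferred over the closed-form formula in practice.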

1. Move from calculators to software (R, Python) for power analysis.
2. Apply power analysis to common business scenarios: A/B testing for conversion rates (χ² or proportion tests), comparing means for continuous metrics (t-tests, ANOVA).
3. Recognize that MDE is a business decision (the smallest effect worth detecting) not a statistical one, and learn to discuss it with product managers.

1. Design multi-factor experiments (factorial designs) and adjust for multiple comparisons.
2. Use simulation-based power analysis for complex models (e.g., hierarchical models, time-to-event analysis).
3. Develop organizational playbooks and pre-registration templates that mandate power analysis for all major initiatives, and mentor analysts on communicating uncertainty and trade-offs to stakeholders.
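To make the cost of adjusting for multiple comparisons concrete, the sketch below recomputes a two-sample t-test sample size with a Bonferroni-adjusted significance level (α divided by the number of comparisons). Bonferroni is only one of several possible corrections and is chosen here for simplicity; the function name, the effect size of 0.2, and the comparison counts are all hypothetical.

```python
import math
from statsmodels.stats.power import TTestIndPower


def n_per_group_bonferroni(effect_size, n_comparisons, alpha=0.05, power=0.80):
    """Per-group n for a two-sample t-test at a Bonferroni-adjusted alpha."""
    adjusted_alpha = alpha / n_comparisons  # Bonferroni: split alpha evenly across tests
    n = TTestIndPower().solve_power(
        effect_size=effect_size,  # Cohen's d
        alpha=adjusted_alpha,
        power=power,
        ratio=1.0,                # equal group sizes
        alternative="two-sided",
    )
    return math.ceil(n)


# Hypothetical example: the same effect tested in 1, 3, or 6 comparisons.
for m in (1, 3, 6):
    print(m, n_per_group_bonferroni(effect_size=0.2, n_comparisons=m))
```

The required sample size grows with every added comparison, which is worth raising when a factorial design is proposed.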

Practice Projects

Beginner

Project

A/B Test Sample Size Calculator for Website CTA

Scenario

Your product team wants to test two different versions of a 'Sign Up' button (control vs. variant) to see which has a higher click-through rate. The current baseline rate is 5%. They want to detect a relative increase of at least 20% (to 6%) with 80% power and 95% confidence.

How to Execute

1. Identify parameters: Baseline Rate (p1=0.05), MDE (relative 20% -> absolute 0.01, so p2=0.06), α=0.05 (the complement of the 95% confidence level), Power=0.8.
2. Use an online calculator or, in Python, a z-test power solver for two proportions: compute Cohen's h with statsmodels.stats.proportion.proportion_effectsize and pass it to statsmodels.stats.power.NormalIndPower().solve_power to get the required sample size per group.
3. Report the total sample size needed (both groups combined), emphasizing the assumptions made.
4. Reflect: How would the required sample size change if we only wanted to detect a 10% relative increase?
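One way to run this calculation is sketched below, using a normal-approximation z-test on Cohen's h (the arcsine-transformed difference between the two rates). The helper name is hypothetical, and other calculators that use a pooled-variance formula may return a slightly different number.

```python
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize


def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided z-test comparing two proportions."""
    h = proportion_effectsize(p2, p1)  # Cohen's h for the two rates
    n = NormalIndPower().solve_power(
        effect_size=h,
        alpha=alpha,
        power=power,
        ratio=1.0,                     # equal traffic split between control and variant
        alternative="two-sided",
    )
    return math.ceil(n)


n_20 = n_per_group_proportions(0.05, 0.06)   # 20% relative lift from the scenario
n_10 = n_per_group_proportions(0.05, 0.055)  # 10% relative lift for the reflection step
print("per group (20% lift):", n_20, "total:", 2 * n_20)
print("per group (10% lift):", n_10, "total:", 2 * n_10)
```

Because the required sample size scales with the inverse square of the effect size, halving the detectable lift roughly quadruples the sample.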

Intermediate

Project

Power Analysis for a Marketing Campaign Experiment

Scenario

The marketing team plans to run a randomized controlled trial to measure the impact of a new email campaign on customer lifetime value (LTV), a continuous, skewed metric. They believe a $10 increase in LTV is meaningful. Historical LTV mean is $100, with a standard deviation of $50.

How to Execute

1. Select the appropriate test: two-sample t-test (or a non-parametric alternative if normality is dubious).
2. Set parameters: Δ=$10, σ=$50, α=0.05, Power=0.8. The standardized effect size is d = Δ/σ = 0.2.
3. Use software (e.g., R's pwr.t.test, which takes d rather than Δ and σ separately) to calculate n per group.
4. Critically, discuss with the team the feasibility of the required sample size given their email list size and expected recruitment rate. Propose adjustments (e.g., accepting lower power or a larger MDE) if the requirement is prohibitive.
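A Python counterpart to the calculation is sketched below using statsmodels' TTestIndPower. It also shows the reverse question for the feasibility discussion: what power is achievable at a fixed group size. The figure of 200 customers per group is a hypothetical constraint, not something given in the scenario, and the t-test here assumes roughly normal sampling behavior despite the skew in LTV.

```python
import math
from statsmodels.stats.power import TTestIndPower

delta, sigma = 10.0, 50.0  # meaningful LTV lift and historical standard deviation
d = delta / sigma          # standardized effect size (Cohen's d)

solver = TTestIndPower()

# Required sample size per group for 80% power at alpha = 0.05.
n_required = math.ceil(
    solver.solve_power(effect_size=d, alpha=0.05, power=0.80,
                       ratio=1.0, alternative="two-sided")
)

# Feasibility check: achieved power if only a limited list is available.
available_per_group = 200  # hypothetical constraint for illustration
achieved_power = solver.solve_power(effect_size=d, nobs1=available_per_group,
                                    alpha=0.05, power=None,
                                    ratio=1.0, alternative="two-sided")

print("required per group:", n_required)
print("power at", available_per_group, "per group:", round(achieved_power, 2))
```

Presenting both numbers side by side tends to make the trade-off between list size, power, and MDE easier to discuss with the marketing team.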

Advanced

Project

Simulation-Based Power Analysis for a Cluster-Randomized Trial

Scenario

You are A/B testing a new onboarding flow in which the unit of randomization is the 'account' (a company) but the outcome is measured at the individual user level within that account. Users within an account are correlated (intraclass correlation, ICC), which violates the independence assumption behind standard sample size formulas.

How to Execute

1. Model the data-generating process: specify the number of clusters (k), the average cluster size (m), the ICC, the effect size, and the residual variance.
2. Write a simulation script (e.g., in Python using numpy and scipy) that generates fake data under both the null and alternative hypotheses for thousands of iterations.
3. For each iteration, run the planned analysis (e.g., a mixed-effects model with a random intercept for account) and record the p-value.
4. The estimated power is the proportion of iterations under the alternative hypothesis that reject the null. Use this to iteratively refine k and m.
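The steps above can be sketched as follows. To keep the example fast and dependent only on numpy and scipy, it swaps the mixed-effects model for a simpler analysis, a t-test on per-account means, which is a valid approach when cluster sizes are equal. All names and parameter values are hypothetical, and the outcome is assumed to be continuous with normal account-level and user-level noise.

```python
import numpy as np
from scipy import stats


def simulate_power(k_per_arm, m, icc, effect, total_sd=1.0,
                   alpha=0.05, n_sims=2000, seed=0):
    """Estimate power for a cluster-randomized design by simulation.

    k_per_arm: accounts per arm; m: users per account (equal sizes assumed).
    Analysis: two-sample t-test on account-level means (a stand-in for
    a mixed-effects model with a random intercept per account).
    """
    rng = np.random.default_rng(seed)
    sd_between = total_sd * np.sqrt(icc)     # account-level random intercept SD
    sd_within = total_sd * np.sqrt(1 - icc)  # user-level residual SD
    rejections = 0
    for _ in range(n_sims):
        arm_means = []
        for arm_effect in (0.0, effect):     # control arm, then treatment arm
            account_effects = rng.normal(0.0, sd_between, size=(k_per_arm, 1))
            noise = rng.normal(0.0, sd_within, size=(k_per_arm, m))
            outcomes = arm_effect + account_effects + noise
            arm_means.append(outcomes.mean(axis=1))  # collapse users to account means
        _, p_value = stats.ttest_ind(arm_means[1], arm_means[0])
        rejections += int(p_value < alpha)
    return rejections / n_sims


# effect=0 estimates the false positive rate; effect>0 estimates power.
print("type I error:", simulate_power(k_per_arm=20, m=10, icc=0.1, effect=0.0))
print("power:", simulate_power(k_per_arm=20, m=10, icc=0.1, effect=0.3))
```

Running the null case first is a useful sanity check: if the rejection rate is far from α, the planned analysis is not handling the clustering correctly, and the power estimate cannot be trusted.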

Tools & Frameworks

Software & Platforms

R (pwr, simr, TrialSize packages)
Python (statsmodels.stats.power, scipy.stats, pingouin)
G*Power (GUI software)
Optimizely/VWO (Online Experiment Platform calculators)

R and Python are essential for programmatic, reproducible, and simulation-based analyses. G*Power is excellent for rapid, GUI-driven calculations for standard tests. Platform calculators are useful for quick checks in A/B testing contexts but should not be trusted for complex designs.

Mental Models & Methodologies

Pre-Registration Framework
Effect Size Hierarchy (Cohen's d, f, w, OR, RR)
Prospective vs. Retrospective Power Analysis

Pre-registration enforces discipline by requiring the power analysis plan *before* data collection. Understanding effect size benchmarks prevents designing studies with unrealistic expectations. Prospective (a priori) analysis guides planning. Retrospective (post-hoc) power computed from the observed effect is controversial for interpreting null results, because it is a direct function of the p-value and adds no new information; the observed effect and variance estimates are still useful as inputs when planning future studies.