Skill Guide

Time-to-event modeling (Kaplan-Meier, Cox proportional hazards)

Time-to-event modeling is a statistical methodology for analyzing the duration until one or more events occur, using non-parametric Kaplan-Meier estimators to visualize survival curves and semi-parametric Cox proportional hazards models to quantify the effect of covariates on event risk.

This skill is critical for data-driven decision-making in domains where timing is as important as occurrence (e.g., clinical trials, customer churn, predictive maintenance). It directly impacts strategic resource allocation, risk mitigation, and intervention timing, translating to higher revenue retention, optimized operational costs, and regulatory compliance.

How to Learn Time-to-event modeling (Kaplan-Meier, Cox proportional hazards)

1. **Foundational Concepts & Assumptions:** Grasp core terminology: survival function S(t), hazard function h(t), censoring (right, left, interval). Understand the key assumptions: non-informative censoring for KM, proportional hazards (PH) for Cox.
2. **Kaplan-Meier (KM) Mechanics:** Learn to manually calculate and interpret a KM curve. Focus on the concept of 'number at risk' tables and the log-rank test for comparing groups.
3. **Cox Model Intuition:** Move beyond black-box usage. Understand the partial likelihood, the interpretation of hazard ratios (HR = e^β), and the distinction between baseline hazard and covariate effects.
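The manual KM calculation mentioned above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a replacement for `survival` or `lifelines`; the function name `kaplan_meier` and the toy data are assumptions made for the example. It follows the usual convention that subjects censored at time t are still counted as at risk at t.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, n_at_risk, n_events, S(time))] at each distinct event time.

    events: 1 = event observed, 0 = right-censored.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving = Counter(durations)  # everyone leaves the risk set at their own time
    n_at_risk = len(durations)
    surv = 1.0
    table = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t
            surv *= 1.0 - d / n_at_risk
            table.append((t, n_at_risk, d, surv))
        n_at_risk -= leaving[t]  # drop events and censored subjects after t
    return table

# Hypothetical toy data: 8 subjects, 3 of them censored
durations = [5, 8, 12, 12, 15, 20, 20, 25]
events = [1, 0, 1, 1, 0, 1, 0, 1]
for t, n, d, s in kaplan_meier(durations, events):
    print(f"t={t:>2}  at risk={n}  events={d}  S(t)={s:.4f}")
```

The `n_at_risk` column is the 'number at risk' table that usually sits under a KM plot; reading it alongside S(t) shows how censoring shrinks the denominator without causing a drop in the curve.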

1. **Practical Diagnostics & Violations:** Go beyond running a model. Plot Schoenfeld residuals against time to check the PH assumption for a Cox model. Learn remedies for PH violations: stratified Cox models (e.g., `coxph(Surv(time, status) ~ age + strata(trt))` in R) and time-varying coefficients.
2. **Common Pitfall - Misinterpreting Censoring:** Actively audit datasets for informative vs. non-informative censoring. Practice distinguishing administrative censoring from dropout. Build a data pipeline that flags potential informative censoring.
3. **Scenario Application:** Apply survival analysis to a business problem (e.g., time-to-repeat purchase) instead of a medical one. This forces adaptation of frameworks to messy, real-world data with competing risks and non-standard time origins.
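To make the Schoenfeld residual idea concrete, here is a small sketch for a single covariate. It computes the raw (unscaled) residual at each event: the covariate value of the subject who failed minus the risk-weighted average of the covariate over everyone still at risk. The function name, the toy data, and the choice to evaluate at `beta = 0` are assumptions for illustration; tools such as `cox.zph` work with scaled residuals at the fitted coefficient.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Raw Schoenfeld residuals for one covariate at coefficient `beta`.

    Returns [(event_time, residual)] sorted by time, one entry per event.
    """
    out = []
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if e_i != 1:
            continue  # censored subjects contribute no residual
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        expected = sum(w * x[j] for w, j in zip(weights, risk_set)) / sum(weights)
        out.append((t_i, x[i] - expected))
    return sorted(out)

# Hypothetical toy data: x is a binary covariate
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
x = [1, 0, 1, 0]
for t, r in schoenfeld_residuals(times, events, x, beta=0.0):
    print(f"t={t}  residual={r:+.4f}")
```

Under proportional hazards the residuals should scatter around zero with no trend in time; a systematic drift when plotted against time is the pattern that suggests a time-varying effect.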

1. **Advanced Modeling & Extensions:** Master handling **competing risks** (Fine-Gray model), **recurrent events** (Andersen-Gill model), and **joint models** (linking longitudinal biomarkers to survival).
2. **Causal Inference Integration:** Apply survival analysis in causal frameworks. Use propensity score methods or targeted maximum likelihood estimation (TMLE) to estimate causal survival curves from observational data where treatment is not randomized.
3. **Productionalization & Monitoring:** Architect a system where a Cox model or a deep survival model (e.g., DeepSurv) is deployed in production. Focus on model monitoring for concept drift (shifting hazard over time), recalibration strategies, and A/B testing of risk-based interventions.

Practice Projects

Beginner

Project

Customer Churn Survival Analysis

Scenario

You have a dataset from a SaaS company with customer signup dates, subscription end dates (or last activity date), churn status (1=churn, 0=censored), and basic covariates (plan_tier, acquisition_channel).

How to Execute

1. **Data Prep:** Create a `Surv` object in R (or equivalent in Python's `lifelines`) with `time` (duration in days) and `event` (status) columns.
2. **KM Curve:** Plot overall KM curve. Calculate median customer lifetime.
3. **Group Comparison:** Create KM curves stratified by `plan_tier`. Perform log-rank test to see if survival differs significantly.
4. **Simple Cox Model:** Fit a Cox model with `plan_tier` and `acquisition_channel`. Report hazard ratios and their confidence intervals.
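The data prep step is where most errors creep in, so a small sketch may help. It derives the `time` and `event` columns from raw dates using only the standard library. The field names (`signup`, `churn_date`), the `STUDY_END` cut-off, and the sample customers are hypothetical; the real dataset may record the end of follow-up differently.

```python
from datetime import date

STUDY_END = date(2024, 6, 30)  # hypothetical data cut-off (administrative censoring)

def to_duration_event(signup, churn_date, study_end=STUDY_END):
    """Return (duration_days, event) with event=1 for churn and 0 for censored."""
    if churn_date is not None and churn_date <= study_end:
        return (churn_date - signup).days, 1
    # No churn observed inside the study window: censor at the cut-off
    return (study_end - signup).days, 0

customers = [
    {"id": "c1", "signup": date(2024, 1, 1), "churn_date": date(2024, 3, 1)},
    {"id": "c2", "signup": date(2024, 2, 15), "churn_date": None},
    {"id": "c3", "signup": date(2024, 4, 1), "churn_date": date(2024, 8, 1)},
]
rows = [to_duration_event(c["signup"], c["churn_date"]) for c in customers]
for c, (duration, event) in zip(customers, rows):
    print(c["id"], duration, event)
```

Customer `c3` churns after the cut-off, so within this study window the record is censored rather than counted as an event. The resulting pairs are the inputs that `Surv(time, event)` in R or a `lifelines` fitter would expect.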

Intermediate

Project

E-Commerce Time-to-Repurchase with Marketing Interventions

Scenario

Analyze time from first purchase to second purchase for a retail cohort. The business wants to know if a specific email campaign (sent at day 14) affects repurchase timing, controlling for customer demographics and initial purchase value.

How to Execute

1. **Define Event & Time Origin:** Event is second purchase. Time origin is date of first purchase. Censor customers who haven't repurchased by study end.
2. **PH Assumption Check:** Fit an initial Cox model with the campaign indicator. Use `cox.zph` in R to test proportional hazards. If the campaign effect violates PH, let its coefficient vary with time (e.g., a campaign-by-log(time) term specified through `tt()` in `coxph`; writing `campaign * log(time)` directly in the formula uses each customer's final follow-up time and is not valid). If a nuisance covariate violates PH, stratify on it with `strata()`.
3. **Time-Dependent Covariate (Advanced Step):** Because the email goes out at day 14, code `campaign` as a time-dependent covariate (0 until day 14, then 1 after) so that the first 14 days of follow-up are not credited to the campaign. If the effect is also believed to decay, add a time-varying coefficient on top of this coding.
4. **Visualize & Interpret:** Plot predicted survival curves for 'treated' vs. 'untreated' covariate profiles. Quantify the 'days saved' or 'repurchase probability lift' at key horizons (e.g., 30, 60, 90 days).
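The time-dependent coding of `campaign` amounts to splitting each customer's follow-up into (start, stop] intervals. The sketch below shows one way to build that long format; the function name, the `got_campaign` flag, and the tuple layout are assumptions for illustration. The output shape matches what counting-process interfaces such as `Surv(start, stop, event)` in R or `CoxTimeVaryingFitter` in `lifelines` take as input.

```python
def split_at_campaign(customer_id, duration, event, got_campaign, campaign_day=14):
    """Split follow-up into rows of (id, start, stop, event, campaign).

    Customers who repurchase or are censored on or before `campaign_day`,
    or who never receive the email, keep a single row with campaign=0.
    """
    if not got_campaign or duration <= campaign_day:
        return [(customer_id, 0, duration, event, 0)]
    return [
        # Pre-campaign interval: still at risk, no event recorded here
        (customer_id, 0, campaign_day, 0, 0),
        # Post-campaign interval carries the observed outcome
        (customer_id, campaign_day, duration, event, 1),
    ]

# Hypothetical cohort: (id, days to repurchase or censoring, event, emailed?)
cohort = [("a", 10, 1, True), ("b", 40, 1, True), ("c", 40, 0, False)]
long_rows = [row for c in cohort for row in split_at_campaign(*c)]
for row in long_rows:
    print(row)
```

Customer `a` repurchases before day 14, so no time is attributed to the campaign even though an email was scheduled; that is the immortal-time problem the split is meant to avoid.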

Advanced

Project

Predictive Maintenance with Competing Failure Modes

Scenario

A manufacturing plant has sensor data from machines. They experience two distinct failure types (Type A: electrical, Type B: mechanical). The goal is to model time-to-failure for each type, but a machine failing from one type cannot fail from the other (competing risks). Engineers need to schedule maintenance prioritizing the higher-risk failure mode.

How to Execute

1. **Competing Risks Framework:** Do not fit a standard Cox model that treats the other failure type as ordinary censoring and then read the result as a failure probability; that overstates the risk of each type. Use the **Fine-Gray sub-distribution hazard model** for each failure type, which accounts for the presence of the other event.
2. **Feature Engineering:** Create time-varying covariates from sensor streams (e.g., rolling averages, variance of vibration).
3. **Model & Validate:** Fit separate Fine-Gray models for Type A and Type B failure. Use time-dependent AUC (tdAUC) or Brier score for validation, not just C-index.
4. **Decision Rule Output:** Create a dynamic risk score that outputs, for each machine, the 30-day probability of failure due to Type A vs. Type B. Implement a rule: 'If P(Type A) > 0.15 and P(Type A) > P(Type B), schedule electrical inspection'.
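Before fitting a regression model, it can help to see what a competing-risks probability looks like without covariates. The sketch below computes the nonparametric cumulative incidence (the Aalen-Johansen estimator) per failure type and then applies the decision rule from step 4. Note that this is not a Fine-Gray model: it gives fleet-level probabilities, whereas Fine-Gray adjusts them for machine-level covariates. The function names and toy data are assumptions for illustration.

```python
def cumulative_incidence(durations, causes, horizon):
    """Aalen-Johansen cumulative incidence per cause up to `horizon`.

    causes: None = censored, otherwise a failure-type label such as "A" or "B".
    """
    labels = sorted({c for c in causes if c is not None})
    cif = {c: 0.0 for c in labels}
    surv = 1.0  # probability of no failure of any type just before time t
    n_at_risk = len(durations)
    for t in sorted(set(durations)):
        if t > horizon:
            break
        at_t = [c for d, c in zip(durations, causes) if d == t]
        for c in labels:
            cif[c] += surv * at_t.count(c) / n_at_risk
        n_events = sum(1 for c in at_t if c is not None)
        surv *= 1.0 - n_events / n_at_risk
        n_at_risk -= len(at_t)
    return cif

def maintenance_action(p_a, p_b, threshold=0.15):
    """Decision rule from the project description."""
    if p_a > threshold and p_a > p_b:
        return "schedule electrical inspection"
    return "no electrical inspection triggered"

# Hypothetical fleet: days to failure or censoring, with failure type
durations = [10, 12, 20, 25, 28, 30, 35, 40]
causes = ["A", None, "B", "A", None, "A", "B", None]
cif = cumulative_incidence(durations, causes, horizon=30)
print({k: round(v, 4) for k, v in cif.items()})
print(maintenance_action(cif["A"], cif["B"]))
```

Because each increment is weighted by all-cause survival, the per-type probabilities can never sum to more than 1, which is the property that is lost when the other failure type is treated as plain censoring.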

Tools & Frameworks

Software & Platforms

R (survival, survminer, cmprsk, riskRegression packages); Python (lifelines, scikit-survival, statsmodels.duration); SAS (PROC PHREG, PROC LIFETEST); Stata (stset, stcox, sts); Python (PyCox for deep survival models)

R's `survival` package is the industry gold standard for classical models. Python's `lifelines` is the go-to for Python-centric teams and offers good API design. Use `scikit-survival` for ML integration (e.g., random survival forests). `PyCox` is essential for implementing deep learning-based survival models (DeepSurv, DeepHit).

Core Methodological Frameworks

Kaplan-Meier Estimator; Log-Rank Test; Cox Proportional Hazards Model; Schoenfeld Residuals (PH Test); Fine-Gray Model (Competing Risks); Aalen's Additive Model (Alternative to Cox)

KM/Log-Rank are for descriptive, unadjusted analyses. Cox is the workhorse for multivariable regression. Schoenfeld residuals are non-negotiable for model validation. Fine-Gray is critical for any scenario with multiple event types. Aalen's model is a robust alternative when the PH assumption is strongly violated and you want to model time-varying effects directly.
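To see what the Cox "workhorse" is doing internally, the sketch below maximizes the partial likelihood for a single covariate with Newton-Raphson, using only the standard library. It is a teaching aid under simplifying assumptions (one covariate, no tied event times in the example, no standard errors); the function name `fit_cox_1d` and the toy data are invented for illustration.

```python
import math

def fit_cox_1d(times, events, x, n_iter=25, tol=1e-10):
    """Estimate the Cox coefficient for one covariate via Newton-Raphson."""
    beta = 0.0
    for _ in range(n_iter):
        grad, info = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if e_i != 1:
                continue  # only events contribute terms to the partial likelihood
            # Risk set: everyone whose follow-up reaches t_i
            risk = [(math.exp(beta * x_j), x_j) for t_j, x_j in zip(times, x) if t_j >= t_i]
            s0 = sum(w for w, _ in risk)
            s1 = sum(w * xj for w, xj in risk)
            s2 = sum(w * xj * xj for w, xj in risk)
            grad += x_i - s1 / s0                 # score contribution
            info += s2 / s0 - (s1 / s0) ** 2      # observed information contribution
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical toy data: four subjects, all with observed events
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
x = [1, 0, 0, 1]
beta_hat = fit_cox_1d(times, events, x)
print(f"beta = {beta_hat:.4f}, hazard ratio = {math.exp(beta_hat):.4f}")
```

Notice that the baseline hazard never appears: each event contributes only a comparison between the subject who failed and the others still at risk, which is why the model is called semi-parametric and why the hazard ratio is exp(beta).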

Key Metrics & Evaluation

Hazard Ratio (HR); Concordance Index (C-index); Time-Dependent AUC; Calibration Plot; Schoenfeld Residuals Plot

HR is the primary effect measure. C-index is a rank-based discrimination metric (like AUC for survival). Time-Dependent AUC is more informative for evaluating predictive accuracy at specific time points. Calibration plots check if predicted probabilities match observed frequencies.
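The C-index is easy to misread as ordinary AUC, so a direct computation may help. The sketch below implements Harrell's version: among pairs where the subject with the shorter time had an observed event, it counts how often that subject also has the higher predicted risk. The function name and toy data are assumptions for illustration; library implementations handle tied times with more care.

```python
def concordance_index(durations, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk score."""
    concordant = tied = comparable = 0
    n = len(durations)
    for i in range(n):
        if events[i] != 1:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

# Hypothetical predictions: higher score means higher predicted risk
durations = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.6]
print(concordance_index(durations, events, risk_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Because only the ordering of scores matters, a model can have a high C-index and still be poorly calibrated, which is why calibration plots are listed separately.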

Interview Questions

Answer Strategy

The question tests knowledge of **PH assumption validation and remedial actions**. Do not just mention Schoenfeld residuals; explain the process and alternatives. **Sample Answer:** 'First, I'd formally test the PH assumption using Schoenfeld residuals, both globally and per-covariate, via the `cox.zph` function. If the assumption for `region` is violated (p < 0.05), I would not discard the model. My next steps depend on the plot: if the residuals show a monotonic trend, I'd add a time-interaction term (e.g., a region-by-log(time) effect specified through `tt()` in `coxph`). If the violation is more complex, I'd stratify the model by region (`strata(region)`), allowing each region to have its own baseline hazard while still estimating common coefficients for other covariates; the trade-off is that stratification no longer yields a hazard ratio for region itself. If the covariate is a primary exposure of interest, I might move to an Aalen additive model to model its time-varying effect directly.'

Answer Strategy

This tests understanding of **non-proportional hazards and the limitations of summary statistics like HR**. **Sample Answer:** 'Crossing KM curves indicate non-proportional hazards; the treatment effect changes direction or magnitude over time. The summary hazard ratio from a standard Cox model becomes hard to interpret and potentially misleading (it could be ~1.0 despite clear differences). My analysis strategy must shift from a single HR to a time-specific analysis. I would: 1) Report survival probabilities at clinically relevant time points (e.g., 1-year, 2-year). 2) Use a weighted log-rank test (e.g., Tarone-Ware or Fleming-Harrington) that is more sensitive to early or late differences. 3) If pre-specified, fit a piecewise Cox model or a model with a time-dependent treatment effect to quantify the changing hazard. The key is to present the full survival story, not a single number.'
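Tarone-Ware is a weighted variant of the log-rank test, and the difference is a single weight per event time. The sketch below implements the two-group statistic with either unit weights (standard log-rank) or square-root-of-at-risk weights (Tarone-Ware), using only the standard library. The function name, the `weight` argument, and the toy data are assumptions for illustration.

```python
import math

def weighted_logrank(durations, events, groups, weight="logrank"):
    """Two-sample weighted log-rank test; `groups` coded 0/1.

    weight="logrank" uses w = 1; weight="tarone-ware" uses w = sqrt(n at risk).
    Returns (chi-square statistic with 1 df, p-value).
    """
    u = v = 0.0
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    for t in event_times:
        n = sum(1 for d in durations if d >= t)
        n1 = sum(1 for d, g in zip(durations, groups) if d >= t and g == 1)
        d_all = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        d1 = sum(1 for d, e, g in zip(durations, events, groups)
                 if d == t and e == 1 and g == 1)
        w = math.sqrt(n) if weight == "tarone-ware" else 1.0
        u += w * (d1 - n1 * d_all / n)  # observed minus expected events in group 1
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t
            v += w * w * n1 * (n - n1) * d_all * (n - d_all) / (n * n * (n - 1))
    chi2 = u * u / v
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Hypothetical toy data: group 1 fails early, group 0 fails late
durations = [1, 2, 3, 4]
events = [1, 1, 1, 1]
groups = [1, 1, 0, 0]
for name in ("logrank", "tarone-ware"):
    chi2, p = weighted_logrank(durations, events, groups, weight=name)
    print(f"{name}: chi2={chi2:.4f}, p={p:.4f}")
```

Weighting by the square root of the number at risk gives earlier event times more influence, since risk sets are largest at the start of follow-up. When curves cross, the choice of weights can change the conclusion, which is one reason to pre-specify it.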