AI Flight Risk Analyst
An AI Flight Risk Analyst leverages machine learning, people analytics, and HR data pipelines to predict which employees are likel…
Skill Guide
The application of statistical and machine learning techniques to model two kinds of outcome: discrete binary events (e.g., churn or fraud) and time-to-event data with censoring (e.g., customer lifetime or time to equipment failure), where some subjects have not yet experienced the event when observation ends.
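To make censoring concrete, here is a minimal plain-Python sketch of the Kaplan-Meier estimator, the non-parametric survival curve that `lifelines` provides as `KaplanMeierFitter`. It assumes each subject is a (duration, event) pair, with event = 1 for an observed event and 0 for a right-censored record; the function name `kaplan_meier` is illustrative. Censored subjects count in the risk set up to their censoring time but never reduce the survival estimate themselves.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate as a list of (time, S(time)) steps.

    durations: time each subject was observed (event time or censoring time)
    events: 1 if the event (e.g. churn) was observed, 0 if right-censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # every subject leaves the risk set once
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Five users: churned at months 2, 3 and 5; still active (censored) at 3 and 8.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

Dropping the censored users instead would shrink the denominators and bias the curve toward early churn, which is why survival methods are preferred for lifetime questions.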
Scenario
Given a SaaS company's user activity log and subscription end dates, predict which users will churn in the next month and estimate their remaining lifetime.
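Both questions in this scenario start from the same records, so the targets can be built together. A minimal sketch, assuming one subscription per user with a signup date and an end date (`None` while still subscribed), a hypothetical `snapshot` date on which predictions are made, and an `observed_until` date marking the end of the data. The helper `make_targets` and its parameters are illustrative, not from any library. Users still active at `observed_until` become right-censored survival records rather than being dropped.

```python
from datetime import date, timedelta


def make_targets(signup, end, snapshot, observed_until, horizon_days=30):
    """Binary and survival targets for one user (hypothetical helper).

    signup: subscription start date
    end: subscription end date, or None if the user is still subscribed
    snapshot: date on which predictions would be made
    observed_until: last date covered by the data (end of follow-up)
    Returns None for users who churned on or before the snapshot, otherwise
    (churn_within_horizon, duration_days, event_observed).
    """
    horizon_end = snapshot + timedelta(days=horizon_days)
    if observed_until < horizon_end:
        raise ValueError("outcomes must be observed through the whole horizon")
    if end is not None and end <= snapshot:
        return None  # already gone: not part of the population to score
    churned = end is not None and end <= observed_until
    churn_within_horizon = int(churned and end <= horizon_end)
    # Users still active at the end of follow-up are right-censored there.
    stop = end if churned else observed_until
    return churn_within_horizon, (stop - signup).days, int(churned)


snap, cutoff = date(2024, 3, 1), date(2024, 6, 30)
print(make_targets(date(2024, 1, 1), date(2024, 3, 10), snap, cutoff))
print(make_targets(date(2024, 1, 1), None, snap, cutoff))
```

The first tuple element feeds a classifier for "churns in the next month"; the last two feed a survival model for remaining lifetime.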
Scenario
A fintech company needs to predict the probability that a loan will default (binary) and, for loans that do default, model the time until default to optimize reserves.
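One way to combine the two models in this scenario is a two-part decomposition: the binary model supplies the probability of default (PD), and a timing distribution fitted only on defaulted loans says when. Then P(default in month k) = PD × P(T = k | default). The sketch below assumes monthly buckets, uses an empirical histogram of historical months-to-default in place of a fitted survival model, and treats timing as independent of the loan's features; the function name and the constant `exposure` are hypothetical. Historical defaulters also need enough follow-up, or the histogram will be biased toward early defaults.

```python
from collections import Counter


def default_timing_profile(pd_hat, months_to_default, horizon):
    """Split one loan's default probability across monthly buckets.

    pd_hat: probability of default from the binary model
    months_to_default: months from origination to default for historical
        defaulted loans (the timing model is this empirical distribution)
    horizon: number of monthly buckets to return
    Entry k - 1 of the result is pd_hat * P(T = k | default).
    """
    if not 0.0 <= pd_hat <= 1.0:
        raise ValueError("pd_hat must be a probability")
    if not months_to_default:
        raise ValueError("need at least one historical default")
    counts = Counter(months_to_default)
    n = len(months_to_default)
    return [pd_hat * counts.get(k, 0) / n for k in range(1, horizon + 1)]


profile = default_timing_profile(0.10, [1, 2, 2, 3, 3, 3, 6, 6], horizon=6)
exposure = 10_000  # hypothetical outstanding balance, assumed constant
expected_defaulted_balance = [exposure * p for p in profile]
print([round(x, 2) for x in expected_defaulted_balance])
```

The result is an expected defaulted balance per month, before any recovery assumptions, which is the timing information a reserve schedule needs and a single PD cannot provide.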
Scenario
Design a system to predict component failure (binary: fail in next 7 days) and remaining useful life (RUL) for a fleet of manufacturing machines using sensor telemetry.
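The two targets in this scenario are linked: given a machine's predicted survival curve, the 7-day failure probability is a conditional probability, and RUL is the mean of the remaining-lifetime distribution. A minimal sketch, assuming the curve is available on a daily grid with `surv[t] = P(T > t)`; the function names are hypothetical and the constant-hazard curve is a stand-in for a real model's output. For an integer lifetime, E[T - a | T > a] is the sum of S(t) / S(a) over t ≥ a, which is truncated here at the end of the curve.

```python
def failure_within(surv, age, window=7):
    """P(fail within `window` days | still running after day `age`).

    surv[t] is P(T > t) on a daily grid, e.g. a Kaplan-Meier curve or one
    machine's predicted survival curve.
    """
    if age + window >= len(surv):
        raise ValueError("survival curve does not extend past age + window")
    if surv[age] <= 0.0:
        raise ValueError("survival probability at this age is zero")
    return 1.0 - surv[age + window] / surv[age]


def restricted_rul(surv, age):
    """Expected remaining days given survival past `age`.

    Sums S(t) / S(age) from t = age to the end of the curve, so it is a
    restricted mean that underestimates RUL if the curve ends above zero.
    """
    if surv[age] <= 0.0:
        raise ValueError("survival probability at this age is zero")
    return sum(surv[age:]) / surv[age]


# Constant 1% daily hazard as a stand-in for a model's predicted curve.
curve = [0.99 ** t for t in range(365)]
print(failure_within(curve, age=100), restricted_rul(curve, age=100))
```

Deriving the binary label from the survival curve keeps the two predictions consistent with each other, rather than training two models that may disagree.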
Use `lifelines` for Cox proportional hazards (Cox PH) and Kaplan-Meier estimation, `scikit-survival` for survival-compatible ML models, and `xgboost` for gradient-boosted classification. SQL is essential for sourcing the data. MLflow tracks experiment lineage for both model types.
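As an illustration of the SQL sourcing step, the sketch below builds point-in-time features for users active on a snapshot date. It runs against an in-memory SQLite database so it is self-contained; the `subscriptions` and `activity` tables, their columns, and the sample rows are hypothetical, and `julianday()` / `date()` are SQLite date functions that a warehouse dialect would replace with its own. Filtering activity to dates on or before the snapshot keeps information from the prediction window out of the features.

```python
import sqlite3

# Hypothetical schema: one subscription row per user plus a raw activity log.
FEATURE_QUERY = """
SELECT s.user_id,
       CAST(julianday(:snapshot) - julianday(s.start_date) AS INTEGER) AS tenure_days,
       COUNT(a.event_date) AS events_last_30d
FROM subscriptions AS s
LEFT JOIN activity AS a
       ON a.user_id = s.user_id
      AND a.event_date > date(:snapshot, '-30 days')
      AND a.event_date <= :snapshot
WHERE s.start_date <= :snapshot
  AND (s.end_date IS NULL OR s.end_date > :snapshot)  -- active on the snapshot
GROUP BY s.user_id, s.start_date
ORDER BY s.user_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (user_id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE activity (user_id INTEGER, event_date TEXT);
INSERT INTO subscriptions VALUES
    (1, '2024-01-01', NULL),
    (2, '2024-02-01', '2024-02-20'),
    (3, '2024-02-15', '2024-04-01');
INSERT INTO activity VALUES
    (1, '2024-01-05'), (1, '2024-02-10'), (1, '2024-02-28'), (3, '2024-02-16');
""")
rows = conn.execute(FEATURE_QUERY, {"snapshot": "2024-03-01"}).fetchall()
print(rows)
```

The `LEFT JOIN` with the date filter in the `ON` clause keeps users with no recent activity in the result with a count of zero, which is often the strongest churn signal.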
Cox PH is the industry workhorse for survival analysis; use Random Survival Forests when relationships are non-linear. SHAP (via the `shap` library) is critical for explaining both classification and survival model predictions to stakeholders. Calibration plots check whether predicted probabilities or risk scores match observed frequencies.
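A calibration plot is built from a small table: bin the predicted probabilities, then compare each bin's mean prediction with the observed event rate. The plain-Python sketch below computes that table with equal-width bins (the same quantities `sklearn.calibration.calibration_curve` returns for `strategy="uniform"`) plus a count-weighted expected calibration error as a one-number summary. Function names are illustrative, and the sketch assumes uncensored binary outcomes; for a survival model evaluated at a horizon t, the observed rate per bin should come from a Kaplan-Meier estimate (1 - S(t)) rather than a raw fraction, because of censoring.

```python
def calibration_table(y_true, y_prob, n_bins=10):
    """Per-bin (mean predicted probability, observed event rate, count).

    Uses equal-width bins on [0, 1]; empty bins are skipped.
    """
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        # p == 1.0 would index past the end, so clamp it into the top bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    table = []
    for members in bins:
        if members:
            n = len(members)
            mean_pred = sum(p for _, p in members) / n
            observed = sum(y for y, _ in members) / n
            table.append((mean_pred, observed, n))
    return table


def expected_calibration_error(table):
    """Count-weighted mean absolute gap between predicted and observed rates."""
    total = sum(n for _, _, n in table)
    return sum(n * abs(pred - obs) for pred, obs, n in table) / total


table = calibration_table([0, 0, 1, 1, 0, 1], [0.15, 0.15, 0.85, 0.85, 0.82, 0.88])
print(table, expected_calibration_error(table))
```

Plotting mean prediction against observed rate per bin, with the diagonal as reference, gives the calibration plot itself.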
Answer Strategy
Tests understanding of cost-sensitive learning and metric selection. **Answer:** For binary classification, I adjust the class weight parameter in the loss function (e.g., `class_weight={0:1, 1:5}`) or optimize for the F-beta score with beta > 1 to favor recall. I'd then evaluate using a cost-sensitive metric such as expected cost. In a survival framework, this translates to working with the predicted survival function: I'd set a decision threshold on the predicted probability of churning by a key date (e.g., 30 days) that minimizes expected cost, computing that probability from the cumulative hazard H(t) as 1 - S(t) = 1 - exp(-H(t)).
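The answer above can be sketched in a few lines. Assuming a missed churner costs 5 times a wasted retention offer (loosely mirroring `class_weight={0:1, 1:5}`), calibrated probabilities should be flagged when p ≥ C_fp / (C_fp + C_fn); when calibration is doubtful, the threshold can instead be chosen empirically on a validation set. The cumulative-hazard conversion handles the survival case. All function names and the example data are hypothetical, and correct decisions are assumed to cost nothing.

```python
import math


def churn_prob_by(cum_hazard):
    """P(churn by t) = 1 - S(t) = 1 - exp(-H(t)) from the cumulative hazard H(t)."""
    return 1.0 - math.exp(-cum_hazard)


def bayes_threshold(cost_fn, cost_fp):
    """Cost-minimising threshold for calibrated probabilities: flag if p >= this."""
    return cost_fp / (cost_fp + cost_fn)


def best_threshold(y_true, y_prob, cost_fn, cost_fp):
    """Empirical threshold with the lowest total cost on a validation set."""
    best = None
    for thr in sorted(set(y_prob)) + [math.inf]:  # inf means flag nobody
        fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < thr)
        fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= thr)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (thr, cost)
    return best


print(bayes_threshold(cost_fn=5, cost_fp=1))  # 1 / (1 + 5)
print(best_threshold([0, 0, 1, 1, 0, 1], [0.1, 0.3, 0.35, 0.8, 0.6, 0.2], 5, 1))
# Survival case: hypothetical cumulative hazard at day 30 for one user.
print(churn_prob_by(0.25) >= bayes_threshold(5, 1))
```

The closed-form threshold is only as good as the model's calibration, which is why the empirical search on held-out data is a useful cross-check.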
Answer Strategy
Tests diagnostic skills and stakeholder management. **Answer:** I first check the proportional hazards assumption for 'department' using Schoenfeld residuals and diagnostic plots. If it is violated, I explore a stratified Cox model or time-varying coefficients. If the assumption holds, I examine the variable's association with other covariates (e.g., 'seniority') via VIF or mutual information, since it may be redundant. I'd then present these findings to the business, explaining that the data does not support an independent effect, and propose either using it as a stratification factor for sub-group analysis or engineering a new feature (e.g., 'department x tenure') that may capture the signal they intended.
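The redundancy check can be made concrete with variance inflation factors: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing covariate j on all the others. The NumPy sketch below computes this directly; `statsmodels` also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`. The data is synthetic and the column names are hypothetical. VIF is defined per numeric column, so a multi-level 'department' would first be one-hot encoded (dropping a reference level) and its dummy columns read together.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of a 2-D numeric array.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from an OLS regression of
    column j on the remaining columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = []
    for j in range(k):
        y = X[:, j]
        ss_tot = np.sum((y - y.mean()) ** 2)
        if ss_tot == 0:
            raise ValueError(f"column {j} is constant")
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        r2 = 1.0 - (resid @ resid) / ss_tot
        factors.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return factors


# Synthetic, hypothetical covariates: tenure is nearly a copy of seniority.
rng = np.random.default_rng(0)
seniority = rng.normal(size=500)
noise = rng.normal(size=500)  # independent of the others
tenure = seniority + 0.05 * rng.normal(size=500)
print(vif(np.column_stack([seniority, noise, tenure])))
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign that a covariate's effect cannot be separated from the others, which supports the "no independent effect" message to the business.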