AI Phishing Detection Specialist
An AI Phishing Detection Specialist designs, trains, and deploys machine learning and NLP-based systems that identify phishing ema…
Skill Guide
A technical methodology for training robust classifiers on skewed datasets by combining synthetic minority oversampling (SMOTE), a modified loss function (focal loss) that down-weights easy examples, and controlled data partitioning (stratified sampling) to preserve class distribution in model evaluation.
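To make the "down-weights easy examples" idea concrete, here is a minimal NumPy sketch of a binary focal loss, assuming the common form -alpha_t * (1 - p_t)^gamma * log(p_t). The function name and the defaults gamma=2.0 and alpha=0.25 are illustrative assumptions, not values taken from this guide.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha defaults are illustrative assumptions.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log() never sees 0 or 1 exactly.
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # p_t is the predicted probability assigned to the true class.
    p_t = np.where(y_true == 1, p_pred, 1.0 - p_pred)
    # alpha_t weights positives by alpha and negatives by (1 - alpha).
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    # (1 - p_t)**gamma shrinks the loss of confidently correct examples.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


# Illustrative values only: one easy positive and one hard positive.
easy = binary_focal_loss([1], [0.95])
hard = binary_focal_loss([1], [0.30])
print(f"easy example loss: {easy:.6f}")
print(f"hard example loss: {hard:.6f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which is a convenient sanity check when porting the formula to a deep learning framework.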
Scenario
You have a credit card transaction dataset where fraudulent transactions constitute less than 0.2% of the data. Your task is to build a classifier to identify them.
Scenario
You are working with MRI scans where tumor pixels (minority class) are vastly outnumbered by healthy tissue pixels (majority class) in a segmentation task.
Scenario
Deploy a model to predict critical but rare machine failures (1 in 10,000 events) in an IoT sensor data stream. The model must have high recall and be periodically retrained.
imbalanced-learn is a widely used Python library for resampling techniques such as SMOTE and its variants. The loss modules of deep learning frameworks make it straightforward to implement a custom focal loss. Scikit-learn provides the tools for stratified data splitting and evaluation.
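AUPRC (area under the precision-recall curve) is usually more informative than accuracy or ROC AUC when positives are rare, because it is not inflated by the large number of true negatives. The confusion matrix gives direct insight into false negatives (critical misses). The F-beta score lets you tune the balance between precision and recall to business needs: a beta above 1 weights recall more heavily, and a beta below 1 favors precision.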
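The three metrics can be computed with scikit-learn as sketched below. The labels and scores are made-up toy values, the 0.5 threshold and beta=2 are arbitrary choices for illustration, and `average_precision_score` is one common estimate of AUPRC rather than the only one.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    fbeta_score,
)

# Toy labels and model scores (illustrative values, not real results).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.45, 0.80, 0.90])

# Threshold-free summary of the precision-recall trade-off.
auprc = average_precision_score(y_true, y_score)

# Hard predictions at an arbitrary 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

# For binary labels, ravel() yields (tn, fp, fn, tp) in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# beta=2 treats recall as more important than precision.
f2 = fbeta_score(y_true, y_pred, beta=2)

print(f"average precision: {auprc:.4f}")
print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"F2 score: {f2:.4f}")
```

The `fn` count is the number to watch in the scenarios above, since each false negative is a missed fraud, tumor region, or machine failure.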
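Cost-sensitive thinking frames the problem as minimizing the asymmetric costs of errors. The checklist ensures SMOTE is applied only after the split, and only to the training partition, so that synthetic samples never leak into validation or test data. The design pattern encourages combining resampling with a modified loss function; whether the combination actually helps should be confirmed on held-out data.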
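One simple way to act on cost-sensitive thinking is to pick the decision threshold that minimizes total error cost on validation data. The sketch below assumes hypothetical per-error costs and toy scores; the helper names `total_cost` and `best_threshold` are invented for illustration.

```python
import numpy as np


def total_cost(y_true, y_score, threshold, cost_fn, cost_fp):
    """Total cost of errors when predicting positive at score >= threshold."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_score, dtype=float) >= threshold
    fn = int(np.sum((y_true == 1) & ~y_pred))  # missed positives
    fp = int(np.sum((y_true == 0) & y_pred))   # false alarms
    return cost_fn * fn + cost_fp * fp


def best_threshold(y_true, y_score, cost_fn, cost_fp):
    """Return the observed score with the lowest total cost as a threshold."""
    candidates = np.unique(np.asarray(y_score, dtype=float))  # sorted ascending
    costs = [total_cost(y_true, y_score, t, cost_fn, cost_fp) for t in candidates]
    # argmin returns the first minimum, i.e. the lowest such threshold.
    return float(candidates[int(np.argmin(costs))])


# Toy validation labels and scores (illustrative values only).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.45, 0.80, 0.90])

# Hypothetical costs: a miss is 50x worse than a false alarm, then the reverse.
print("misses costly:      ", best_threshold(y_true, y_score, cost_fn=50, cost_fp=1))
print("false alarms costly:", best_threshold(y_true, y_score, cost_fn=1, cost_fp=5))
```

When misses are expensive the chosen threshold drops so that every positive is caught at the price of a false alarm; when false alarms are expensive it rises and accepts a miss instead.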