AI Phishing Detection Specialist
An AI Phishing Detection Specialist designs, trains, and deploys machine learning and NLP-based systems that identify phishing ema…
Skill Guide
The systematic extraction, transformation, and creation of structured, machine-readable features from raw email headers, URLs, and domain metadata to power predictive models for threat detection, user profiling, or spam filtering.
Scenario
Given a dataset of 1000 raw .eml files (500 phishing, 500 legitimate), build a script to extract and visualize key header features.
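One way to start on this scenario is a small extractor built on Python's standard `email` package. The sketch below is illustrative only: the feature names, the `domain_of` helper, and the choice of features are assumptions, not a prescribed feature set, and the plotting step is left out.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr


def domain_of(address_header):
    """Return the lowercased domain of an address header, or '' if absent."""
    _, addr = parseaddr(str(address_header or ""))
    return addr.rpartition("@")[2].lower() if "@" in addr else ""


def extract_header_features(raw_bytes):
    """Map one raw message (the bytes of an .eml file) to a flat feature dict."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    from_dom = domain_of(msg.get("From"))
    reply_dom = domain_of(msg.get("Reply-To"))
    return_dom = domain_of(msg.get("Return-Path"))
    return {
        # Number of relay hops recorded in the Received chain.
        "received_hops": len(msg.get_all("Received", [])),
        "has_message_id": int(msg.get("Message-ID") is not None),
        # A Reply-To or Return-Path domain that differs from From is a common signal.
        "reply_to_mismatch": int(bool(reply_dom) and reply_dom != from_dom),
        "return_path_mismatch": int(bool(return_dom) and return_dom != from_dom),
        "subject_length": len(str(msg.get("Subject", ""))),
    }


# Hypothetical sample message, used only to exercise the extractor.
SAMPLE = (
    b"Received: from a.example.net by b.example.org; Mon, 1 Jan 2024 00:00:00 +0000\r\n"
    b"Received: from c.example.net by a.example.net; Mon, 1 Jan 2024 00:00:00 +0000\r\n"
    b"From: Support <support@bank.example.com>\r\n"
    b"Reply-To: helpdesk@other.example.net\r\n"
    b"Subject: Verify your account\r\n"
    b"\r\n"
    b"body\r\n"
)
features = extract_header_features(SAMPLE)
```

Running the extractor over each file's bytes and collecting the dicts into a table gives per-class distributions that can then be plotted in Jupyter.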
Scenario
Build a Flask/FastAPI microservice that takes a raw URL as input and returns a risk score based on extracted features.
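The scoring core of such a service might look like the sketch below, which is kept framework-agnostic so it could be called from a Flask or FastAPI handler. The weights and bias are made-up placeholders standing in for a trained model, and the feature names are assumptions.

```python
import math
from urllib.parse import urlsplit


def url_features(url):
    """Extract simple lexical and host-shape features from a raw URL string."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    host_is_ip = host.replace(".", "").isdigit()  # crude IPv4 check
    return {
        "length": len(url),
        "digit_count": sum(ch.isdigit() for ch in url),
        "special_count": sum(ch in "@-_%=&?" for ch in url),
        "subdomain_depth": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "host_is_ip": int(host_is_ip),
        "uses_https": int(parts.scheme == "https"),
    }


# Placeholder coefficients for illustration only; a real service would load
# parameters learned from labeled data.
ILLUSTRATIVE_WEIGHTS = {
    "length": 0.01,
    "digit_count": 0.1,
    "special_count": 0.2,
    "subdomain_depth": 0.3,
    "host_is_ip": 1.5,
    "uses_https": -0.5,
}
ILLUSTRATIVE_BIAS = -2.0


def risk_score(url):
    """Return a score in (0, 1) from a logistic function over the features."""
    feats = url_features(url)
    z = ILLUSTRATIVE_BIAS + sum(ILLUSTRATIVE_WEIGHTS[k] * v for k, v in feats.items())
    return 1.0 / (1.0 + math.exp(-z))
```

Note that `urlsplit` can raise `ValueError` on some malformed inputs, so a service endpoint would likely want to catch that and return a client error rather than a score.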
Scenario
Design and implement a scalable feature engineering pipeline for a high-volume email gateway to compute domain reputation in near-real-time.
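One possible building block for near-real-time reputation is an exponentially decayed counter per domain, which needs only constant state per key and so fits a streaming job. The class below is a single-process sketch under that assumption; the class name, the half-life, and the smoothing prior are all hypothetical choices, and a production pipeline would keep this state in a distributed store.

```python
import math
from collections import defaultdict


class DecayedDomainReputation:
    """Track a time-decayed ratio of phishing verdicts per sending domain."""

    def __init__(self, half_life_seconds=3600.0):
        self.decay_rate = math.log(2) / half_life_seconds
        # Per-domain state: [decayed bad count, decayed total count, last timestamp]
        self._state = defaultdict(lambda: [0.0, 0.0, 0.0])

    def _decay(self, domain, now):
        bad, total, last = self._state[domain]
        factor = math.exp(-self.decay_rate * max(now - last, 0.0))
        self._state[domain] = [bad * factor, total * factor, max(last, now)]

    def observe(self, domain, is_phish, now):
        """Record one classified message seen at time `now` (seconds)."""
        self._decay(domain, now)
        state = self._state[domain]
        state[0] += 1.0 if is_phish else 0.0
        state[1] += 1.0

    def bad_ratio(self, domain, now, prior_bad=1.0, prior_total=2.0):
        """Smoothed share of recent traffic flagged as phishing."""
        if domain not in self._state:
            return prior_bad / prior_total
        self._decay(domain, now)
        bad, total, _ = self._state[domain]
        return (bad + prior_bad) / (total + prior_total)
```

The prior keeps a domain with very little traffic close to a neutral score, and the half-life controls how quickly old verdicts stop influencing the feature.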
Python libraries for core parsing and feature extraction. Spark/Beam for distributed processing of large email corpora. FastAPI for serving models. Jupyter for rapid prototyping and visualization.
WHOIS/RDAP for domain registration metadata and history. Passive DNS for historical IP-to-domain resolutions. RFC 5322 defines the valid structure of email headers. Certificate Transparency (CT) logs for a domain's certificate issuance history.
Use SHAP to understand which features drive model predictions. Apply adversarial thinking to anticipate feature manipulation by attackers. Use orchestrators to schedule and monitor feature pipeline jobs.
Answer Strategy
Structure your answer: 1) Lexical Analysis (URL length, digit count, special chars), 2) Host-Based Features (domain age, registrar, DNS TTL), 3) Page Content Features (if accessible, presence of login forms, brand keywords). Emphasize cost: 'An attacker can easily obfuscate the path, but forging an aged domain with valid WHOIS history and matching SSL certificate is expensive.'
Answer Strategy
The question tests proactive model maintenance and adversarial thinking. Answer: 'First, I'd validate the degradation with monitoring dashboards. Then, I'd engineer complementary features that capture the same intent but are harder to spoof, such as the IP reputation of the sending server taken from the "Received" chain, or whether the domain has a history of SPF failures. I'd also implement a feature importance decay alert to trigger a review.'
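A feature importance decay alert can be as simple as comparing per-feature importance between a baseline window and the current window. The sketch below assumes the inputs are dicts of non-negative importance values (for example, mean absolute SHAP values computed elsewhere); the function name and the 50% threshold are illustrative assumptions.

```python
def importance_decay_alerts(baseline, current, max_relative_drop=0.5):
    """Return (feature, relative_drop) pairs whose importance fell too far.

    baseline and current map feature names to non-negative importance values.
    A feature missing from `current` is treated as having zero importance.
    """
    alerts = []
    for name, base in baseline.items():
        if base <= 0:
            continue  # no meaningful baseline to compare against
        drop = (base - current.get(name, 0.0)) / base
        if drop >= max_relative_drop:
            alerts.append((name, round(drop, 3)))
    # Largest drops first, so the most degraded features are reviewed first.
    return sorted(alerts, key=lambda item: item[1], reverse=True)


# Hypothetical importance values for two monitoring windows.
baseline_importance = {"domain_age_days": 0.40, "url_length": 0.10}
current_importance = {"domain_age_days": 0.10, "url_length": 0.09}
alerts = importance_decay_alerts(baseline_importance, current_importance)
```

In this example only `domain_age_days` crosses the threshold, which would be the cue to check whether attackers have started using aged domains.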