AI Fraud Detection Specialist
An AI Fraud Detection Specialist designs, deploys, and continuously optimizes machine-learning and NLP systems that identify fraud…
Skill Guide
The systematic process of transforming raw transactional records, user interaction logs, and device hardware/software attributes into predictive, model-ready variables that capture patterns of intent, risk, and identity.
Scenario
You have a dataset of user sessions containing page_view, add_to_cart, and purchase events. The goal is to predict whether a session is likely fraudulent (e.g., card testing).
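As a starting point for this scenario, session-level features can be built by aggregating the raw events per session. The sketch below is illustrative only: the tuple layout, the sample data, and the feature names (n_purchase, duration_s, purchases_per_view) are assumptions, not part of the original exercise.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (session_id, event_type, timestamp).
EVENTS = [
    ("s1", "page_view", datetime(2024, 1, 1, 10, 0, 0)),
    ("s1", "add_to_cart", datetime(2024, 1, 1, 10, 2, 0)),
    ("s1", "purchase", datetime(2024, 1, 1, 10, 5, 0)),
    ("s2", "purchase", datetime(2024, 1, 1, 11, 0, 0)),
    ("s2", "purchase", datetime(2024, 1, 1, 11, 0, 5)),
    ("s2", "purchase", datetime(2024, 1, 1, 11, 0, 10)),
    ("s2", "purchase", datetime(2024, 1, 1, 11, 0, 15)),
]


def session_features(events):
    """Aggregate raw events into one feature dict per session."""
    counts = defaultdict(Counter)
    times = defaultdict(list)
    for session_id, event_type, ts in events:
        counts[session_id][event_type] += 1
        times[session_id].append(ts)

    features = {}
    for session_id, c in counts.items():
        stamps = times[session_id]
        features[session_id] = {
            "n_page_view": c["page_view"],
            "n_add_to_cart": c["add_to_cart"],
            "n_purchase": c["purchase"],
            # Seconds between the first and last event in the session.
            "duration_s": (max(stamps) - min(stamps)).total_seconds(),
            # +1 avoids division by zero for sessions with no page views.
            "purchases_per_view": c["purchase"] / (c["page_view"] + 1),
        }
    return features


FEATURES = session_features(EVENTS)
```

In this toy data, session s2 makes four purchases in 15 seconds with no page views, the kind of pattern a card-testing model might learn to flag. Whether these particular features are predictive would need to be checked against labeled data.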
Scenario
You have raw device fingerprint data collected over 6 months. The task is to identify which fingerprint components are the most stable for user identification and which are prone to change (e.g., after OS updates).
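One simple way to approach this task is to measure, for each fingerprint component, how often its value changes between consecutive observations of the same user. The sketch below assumes a hypothetical snapshot layout (user_id, observation order, component dict) and made-up component names; a real analysis would also need to account for the time elapsed between snapshots.

```python
from collections import defaultdict

# Hypothetical snapshots: (user_id, observation order, fingerprint components).
SNAPSHOTS = [
    ("u1", 1, {"user_agent": "UA/1.0", "screen": "1920x1080", "timezone": "UTC"}),
    ("u1", 2, {"user_agent": "UA/1.1", "screen": "1920x1080", "timezone": "UTC"}),
    ("u1", 3, {"user_agent": "UA/1.2", "screen": "1920x1080", "timezone": "UTC"}),
    ("u2", 1, {"user_agent": "UA/1.0", "screen": "1366x768", "timezone": "CET"}),
    ("u2", 2, {"user_agent": "UA/1.0", "screen": "1366x768", "timezone": "UTC"}),
]


def change_rates(snapshots):
    """Fraction of consecutive same-user snapshot pairs in which each component changed."""
    by_user = defaultdict(list)
    for user_id, order, components in snapshots:
        by_user[user_id].append((order, components))

    changed = defaultdict(int)
    compared = defaultdict(int)
    for observations in by_user.values():
        observations.sort(key=lambda item: item[0])
        # Compare each snapshot with the next one for the same user.
        for (_, prev), (_, curr) in zip(observations, observations[1:]):
            for name in prev.keys() & curr.keys():
                compared[name] += 1
                changed[name] += prev[name] != curr[name]
    return {name: changed[name] / compared[name] for name in compared}


RATES = change_rates(SNAPSHOTS)
MOST_STABLE = min(RATES, key=RATES.get)
```

A low change rate suggests a component is stable enough for identification, while a high rate (here, the user agent string) suggests it drifts with software updates.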
Scenario
Design and document a feature engineering pipeline for a real-time credit scoring model that uses a borrower's transactional history (last 90 days) and behavioral data from the loan application app (click patterns, time spent).
Pandas for prototyping and batch processing. Feature stores for managing, serving, and versioning features consistently between training and serving. Stream processors for generating real-time aggregations from event streams.
Windowing is fundamental for creating temporal aggregations. Target encoding handles high-cardinality categorical features. Embeddings learned by neural networks capture semantic relationships between entities. Validation frameworks check that features remain stable in production.
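To make target encoding concrete, the sketch below replaces each category with a smoothed mean of the fraud label. The additive smoothing toward the global mean, the smoothing weight, and the merchant_id example are assumptions chosen for illustration; other smoothing schemes and out-of-fold variants are also used in practice.

```python
from collections import defaultdict


def fit_target_encoding(categories, labels, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    encoded = (sum_of_labels + smoothing * global_mean) / (count + smoothing)
    Rare categories are pulled toward the global mean.
    """
    global_mean = sum(labels) / len(labels)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, label in zip(categories, labels):
        sums[category] += label
        counts[category] += 1
    encoding = {
        category: (sums[category] + smoothing * global_mean) / (counts[category] + smoothing)
        for category in counts
    }
    return encoding, global_mean


def transform(categories, encoding, global_mean):
    """Encode categories; unseen values fall back to the global mean."""
    return [encoding.get(category, global_mean) for category in categories]


# Hypothetical training data: merchant_id values and fraud labels (1 = fraud).
merchant_ids = ["m1", "m1", "m1", "m1", "m2"]
is_fraud = [1, 1, 0, 0, 1]

ENCODING, GLOBAL_MEAN = fit_target_encoding(merchant_ids, is_fraud, smoothing=1.0)
ENCODED = transform(["m1", "m2", "m_unseen"], ENCODING, GLOBAL_MEAN)
```

The encoding should be fitted on training data only (ideally out-of-fold), since computing it over rows that include the label being predicted leaks the target into the feature.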
Answer Strategy
The question tests practical feature design and awareness of temporal data leakage. Strategy: Define the feature (count of transactions in a window), specify the window (e.g., 1 hour), explain the entity key (card_id, device_id), and warn against using future data. Sample answer: 'I'd define velocity as the count of distinct transactions per card_id in a sliding 1-hour window. The key is using point-in-time joins during training to ensure the window only contains data from before the target transaction timestamp. A common pitfall is using global aggregations, which leak future information.'
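The velocity feature described in the sample answer can be sketched as a streaming-style computation: for each transaction, count only the earlier transactions on the same card that fall inside the preceding hour. The function name, the tuple layout, and the sample timestamps are hypothetical, and the input is assumed to be sorted by timestamp.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def velocity_counts(transactions, window=timedelta(hours=1)):
    """For each (card_id, timestamp), count earlier transactions on the
    same card within the preceding window.

    Assumes `transactions` is sorted by timestamp. The current transaction
    is never counted, so the feature uses only information that was
    available before it happened.
    """
    recent = defaultdict(deque)  # card_id -> timestamps still inside the window
    counts = []
    for card_id, ts in transactions:
        queue = recent[card_id]
        # Evict timestamps that are one full window old or older.
        while queue and queue[0] <= ts - window:
            queue.popleft()
        counts.append(len(queue))
        queue.append(ts)
    return counts


# Hypothetical transactions, already sorted by time.
TX = [
    ("c1", datetime(2024, 1, 1, 10, 0)),
    ("c1", datetime(2024, 1, 1, 10, 10)),
    ("c2", datetime(2024, 1, 1, 10, 20)),
    ("c1", datetime(2024, 1, 1, 10, 30)),
    ("c1", datetime(2024, 1, 1, 11, 5)),
    ("c1", datetime(2024, 1, 1, 11, 10)),
]

VELOCITY = velocity_counts(TX)
```

Because each count is taken before the current transaction is added to the window, the same logic can produce training features and serving features without leaking future information. A global count per card, by contrast, would include transactions that occurred after the one being scored.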
Answer Strategy
The question tests debugging skills and understanding of data drift. Strategy: Walk through a systematic diagnosis: 1) Check for data pipeline failures. 2) Analyze feature distributions for drift (e.g., browser versions changing). 3) Evaluate shifts in feature importance. Sample answer: 'First, I'd validate the incoming data feed for schema changes or missing components. Then, I'd run a drift analysis (e.g., Population Stability Index or KL divergence) on key fingerprint features like userAgent and installedFonts. If drift is found, I'd investigate upstream data collection changes (e.g., a new browser privacy mode altering the fingerprint). The fix would involve retraining the model on recent data or engineering more robust, privacy-resistant fingerprint features.'
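The drift check mentioned in the sample answer can be illustrated with the Population Stability Index for a categorical feature, computed as the sum over categories of (actual - expected) * ln(actual / expected), where expected and actual are category proportions in the baseline and recent data. The sketch below is a minimal version; the epsilon floor for categories missing from one sample and the browser values are assumptions.

```python
import math
from collections import Counter


def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two samples of a categorical feature.

    `eps` floors zero proportions so the logarithm stays defined
    (an assumption of this sketch; other conventions exist).
    """
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)
    score = 0.0
    for category in set(expected_counts) | set(actual_counts):
        e = max(expected_counts[category] / len(expected), eps)
        a = max(actual_counts[category] / len(actual), eps)
        score += (a - e) * math.log(a / e)
    return score


# Hypothetical browser-family values from a baseline period and a recent period.
baseline = ["chrome"] * 80 + ["firefox"] * 20
recent = ["chrome"] * 50 + ["firefox"] * 50

PSI_SAME = psi(baseline, baseline)
PSI_SHIFT = psi(baseline, recent)
```

Identical distributions give a PSI of zero, and larger values indicate a larger shift. A commonly cited rule of thumb treats values above roughly 0.25 as a significant change, though any alerting threshold should be calibrated against the feature's own history.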