AI Phishing Detection Specialist
An AI Phishing Detection Specialist designs, trains, and deploys machine learning and NLP-based systems that identify phishing ema…
Skill Guide
The engineering discipline of packaging, optimizing, deploying, and maintaining machine learning models as high-throughput, low-latency services within automated, version-controlled pipelines to meet strict real-time performance SLAs.
Scenario
You have a ResNet-50 model from TensorFlow Hub. You must deploy it as a REST API that responds to single-image classification requests in under 100ms (p95) on CPU.
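A p95 target like the one in this scenario is only meaningful if it is measured the same way every time. The sketch below is one possible harness, not a prescribed method: it times repeated calls to a prediction function and checks the nearest-rank 95th percentile against a budget. The `fake_predict` function, the request counts, and the warm-up size are illustrative assumptions; in practice the callable would wrap the real ResNet-50 endpoint.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def measure_latencies_ms(predict, payload, n_requests=200, warmup=20):
    """Call predict(payload) repeatedly and return per-call latency in ms."""
    for _ in range(warmup):  # untimed calls so lazy initialisation is excluded
        predict(payload)
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        predict(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_sla(latencies_ms, budget_ms=100.0, pct=95):
    """True when the chosen percentile is strictly under the budget."""
    return percentile(latencies_ms, pct) < budget_ms


if __name__ == "__main__":
    def fake_predict(image_bytes):
        # Hypothetical stand-in for the real model call.
        return sum(image_bytes) % 1000

    lat = measure_latencies_ms(fake_predict, bytes(1024))
    print(f"p95 = {percentile(lat, 95):.3f} ms, within budget: {meets_sla(lat)}")
```

Timing inside the process captures only model and preprocessing cost. Measuring the full REST round trip would also need a client-side measurement against the deployed service.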
Scenario
A retail company wants to update its product recommendation model daily with new clickstream data, automatically validate its accuracy, and deploy it with zero downtime to a Kubernetes cluster.
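The "automatically validate its accuracy" step in this scenario can be pictured as a promotion gate that runs between daily training and deployment. The sketch below is a minimal illustration under assumed thresholds: the function names, the 0.80 accuracy floor, and the 0.01 allowed regression are hypothetical, not values from the scenario.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def should_promote(candidate_acc, production_acc,
                   min_accuracy=0.80, max_regression=0.01):
    """Gate a candidate model before deployment.

    The candidate must clear an absolute floor and must not fall more than
    max_regression below the model currently in production.
    Both thresholds are illustrative assumptions.
    """
    clears_floor = candidate_acc >= min_accuracy
    no_big_regression = candidate_acc >= production_acc - max_regression
    return clears_floor and no_big_regression


if __name__ == "__main__":
    holdout_labels = [1, 0, 1, 1, 0, 1, 0, 0]
    candidate_preds = [1, 0, 1, 1, 0, 1, 1, 0]
    cand = accuracy(candidate_preds, holdout_labels)
    print(f"candidate accuracy = {cand:.3f}, promote: {should_promote(cand, 0.85)}")
```

A gate like this only decides whether to deploy. The zero-downtime part would be handled separately by the cluster's rollout mechanism.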
Scenario
A fintech firm needs to run multiple complex models (gradient boosted trees + LSTM) on streaming market data to generate trade signals within a total latency budget of 10ms. The system must handle 50k QPS, minimize cloud cost, and be resilient to regional outages.
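A quick capacity estimate helps make the numbers in this scenario concrete. By Little's law, average concurrency equals arrival rate times time in the system, so 50k QPS with each request using the full 10ms budget gives an upper bound of 500 requests in flight. The sketch below works through that arithmetic; the per-replica concurrency of 16 is a hypothetical figure chosen only for illustration.

```python
import math


def in_flight_requests(qps, latency_ms):
    """Little's law: average concurrency = arrival rate x time in system."""
    return qps * latency_ms / 1000


def replicas_needed(qps, latency_ms, per_replica_concurrency):
    """Replicas required if each one can serve a fixed number of concurrent requests."""
    if per_replica_concurrency <= 0:
        raise ValueError("per_replica_concurrency must be positive")
    return math.ceil(in_flight_requests(qps, latency_ms) / per_replica_concurrency)


if __name__ == "__main__":
    qps, budget_ms = 50_000, 10
    print("in-flight requests:", in_flight_requests(qps, budget_ms))
    # 16 concurrent requests per replica is an assumed, illustrative capacity.
    print("replicas at 16 concurrent each:", replicas_needed(qps, budget_ms, 16))
```

This is a steady-state estimate for a single serving fleet. Surviving a regional outage would require provisioning that capacity in more than one region, which this sketch does not model.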
Core model servers for high-performance inference. TensorRT is critical for NVIDIA GPU optimization. KServe is the standard for Kubernetes-native serving. Cloud platforms provide fully managed endpoints for rapid deployment.
Tools for automating ML workflows (training, validation, deployment). MLflow for experiment tracking and model registry. Prometheus/Grafana for infrastructure monitoring. Evidently for data drift and model performance monitoring.
Containerization and orchestration for reproducible, scalable deployments. gRPC for low-latency communication. ONNX for model interoperability. For maximum performance: custom kernels, plus NVIDIA's Triton server for advanced batching and model composition.
Answer Strategy
Use a structured debugging framework:
1) **Isolate the Problem**: Check whether the spike correlates with the deployment (canary vs. full rollout). Use monitoring to determine whether it is a system resource issue (CPU/GPU saturation, memory) or a model-specific issue (increased input size, failed feature lookup).
2) **Profile the Code**: Run a profiler (e.g., PyTorch Profiler, TensorFlow Profiler) on a sampled request in a staging environment.
3) **Check the Data Pipeline**: Validate whether feature store latency has increased.
4) **Mitigate & Fix**: If the new model is the cause, roll back immediately, then investigate optimizations (e.g., quantization, caching).
Sample Answer: 'I would first check the deployment timeline and monitoring dashboards to correlate the latency spike with the new rollout. If the spike is isolated to the new version, I'd immediately roll back to the stable version. Concurrently, I'd profile a representative request in staging to identify the bottleneck, whether it's data loading, feature transformation, or the model inference itself. Common causes I'd look for are increased input tensor size, a missing feature cache, or a sub-optimal model graph.'
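The isolation and rollback steps can be turned into a simple automated check. The sketch below compares the p95 latency of a canary against the stable version and flags a rollback when the canary is worse by more than a tolerance factor. The 1.2 tolerance and the sample data are assumptions for illustration; real latency samples would come from the monitoring system.

```python
import math


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def should_roll_back(stable_ms, canary_ms, tolerance=1.2):
    """True when canary p95 exceeds stable p95 by more than the tolerance factor.

    The default tolerance is an illustrative assumption, not a recommended value.
    """
    return p95(canary_ms) > tolerance * p95(stable_ms)


if __name__ == "__main__":
    stable = [20.0] * 100                 # hypothetical latencies in ms
    canary = [20.0] * 90 + [90.0] * 10    # a slow tail appears in the canary
    print("stable p95:", p95(stable), "canary p95:", p95(canary))
    print("roll back:", should_roll_back(stable, canary))
```

A check like this only answers whether the new version is slower. Finding out why still needs the profiling and data-pipeline steps.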
Answer Strategy
Tests technical judgment and business acumen. The answer must frame the trade-off in terms of business impact. Use the STAR method (Situation, Task, Action, Result).
Sample Answer: 'In a fraud detection system, our most accurate model (an XGBoost ensemble) had a 95th percentile latency of 200ms, but our SLA was 100ms for checkout authorization. I analyzed the cost of a 100ms delay: a 1% increase in cart abandonment. I prototyped a two-stage system: a lightweight logistic regression model (<10ms) would filter out the 90% of transactions that were clearly legitimate, while the full ensemble ran only on the remaining 10% that looked suspicious. This kept the overall p95 latency under 80ms with only a 0.2% drop in fraud catch rate, directly preserving conversion revenue while meeting the SLA.'
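The two-stage design in the sample answer can be sketched as a small cascade: a cheap model clears low-risk transactions, and only the rest reach the expensive model. The code below is an illustration under stated assumptions. Both models are plain callables returning a fraud score in [0, 1], and the names, thresholds, and toy scoring rules are hypothetical rather than taken from the system described.

```python
def cascade_predict(features, fast_model, full_model,
                    clear_below=0.05, fraud_at_or_above=0.5):
    """Two-stage fraud check.

    Returns (is_fraud, stage), where stage records which model decided.
    Thresholds are illustrative assumptions.
    """
    fast_score = fast_model(features)
    if fast_score < clear_below:
        # Clearly legitimate: skip the expensive model entirely.
        return False, "fast"
    full_score = full_model(features)
    return full_score >= fraud_at_or_above, "full"


if __name__ == "__main__":
    # Hypothetical stand-ins for a logistic regression and an ensemble.
    def fast_model(tx):
        return 0.01 if tx["amount"] < 100 else 0.60

    def full_model(tx):
        return 0.90 if tx["new_device"] else 0.10

    transactions = [
        {"amount": 25, "new_device": False},
        {"amount": 900, "new_device": False},
        {"amount": 900, "new_device": True},
    ]
    for tx in transactions:
        print(tx, cascade_predict(tx, fast_model, full_model))
```

The `clear_below` threshold controls the trade-off: raising it sends fewer requests to the slow model and lowers latency, at the risk of clearing more fraudulent transactions early.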