AI Red Team Specialist
AI Red Team Specialists systematically probe, attack, and stress-test AI systems-especially large language models-to uncover vulne…
Skill Guide
Adversarial machine learning is the study of attack vectors and defensive techniques targeting the training, inference, and data lifecycle of ML models, encompassing evasion, extraction, poisoning, and inference attacks.
Scenario
A pretrained image classifier (e.g., ResNet-18 on CIFAR-10) is vulnerable. You must demonstrate an untargeted evasion attack.
Scenario
You have limited access to a training pipeline for a sentiment analysis model. Your goal is to corrupt its performance via backdoor injection.
Scenario
You have black-box query access to a proprietary ML-as-a-Service API. The goal is to steal a functionally equivalent model for a sensitive task (e.g., fraud detection).
CleverHans and Foolbox are Python libraries for benchmarking adversarial robustness and implementing attacks/defenses. Advertorch focuses on PyTorch. Microsoft Counterfit is a CLI tool for assessing ML model security. NVIDIA AIRT (AI Robustness Toolkit) is for production-grade robustness testing.
STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege) provides a structured approach to threat modeling for ML. MITRE ATLAS and the Adversarial ML Threat Matrix offer curated knowledge bases of adversary tactics, techniques, and procedures (TTPs) specific to ML systems.
Answer Strategy
Define the terms precisely based on attacker knowledge (model architecture, parameters, gradients vs. only API access). Contrast techniques (e.g., PGD for white-box vs. transfer attacks or query-based for black-box). Sample Answer: 'White-box attacks assume full knowledge of the model and use direct gradient computation, like PGD, for highly effective perturbations. Black-box attacks rely only on output feedback, using methods like transfer attacks from a substitute model or gradient estimation via queries. For a defender, this implies that securing against white-box attacks (via robust training) is necessary but insufficient; you must also monitor for anomalous query patterns and employ ensemble defenses to break transferability.'
Answer Strategy
Tests the ability to operationalize security into the MLOps lifecycle. The candidate should outline a phased approach: threat modeling, controlled testing, and runtime monitoring. Sample Answer: 'First, we'd perform a threat model specific to the drone's mission-e.g., targeted misclassification of stop signs is high-risk. Second, we'd implement a rigorous testing suite: benchmarking against standard evasion attacks (PGD, CW), simulating physical-world attacks in simulation, and conducting data poisoning tests on the training pipeline. Third, we'd instrument the production model with input monitoring (for out-of-distribution detection) and an ensemble disagreement system as a runtime defense. Finally, we'd establish a patching and retraining protocol triggered by new attack research.'
1 career found
Try a different search term.