AI Threat Hunting Specialist
The AI Threat Hunting Specialist proactively seeks out vulnerabilities, adversarial attacks, and misuse patterns within AI and ML …
Skill Guide
Red Teaming AI Systems is the structured adversarial process of probing, stress-testing, and attacking AI models and their pipelines to uncover failures, biases, and security vulnerabilities before they cause real-world harm.
Scenario
You are given access to a commercial LLM-powered customer service chatbot API. Your goal is to force it to reveal its system prompt or bypass its safety filters.
Scenario
A pre-trained image classification model (e.g., ResNet) is used in a simulated access control system. You must cause it to misclassify objects with minimal, imperceptible perturbations.
Scenario
An organization uses a third-party pre-trained model and public datasets for a credit scoring AI. Simulate a scenario where a malicious actor has poisoned the upstream supply chain.
Counterfit and ART are open-source libraries for running standardized adversarial attacks (e.g., PGD, Carlini-Wagner) against ML models. Garak is a tool specifically for probing LLMs for weaknesses. These are used to automate vulnerability scanning during development and pre-deployment testing.
ATLAS and OWASP ML Top 10 provide standardized taxonomies of adversarial tactics and common vulnerabilities, structuring the red team's attack playbook. NIST AI RMF and FAIR help translate technical findings into business risk language for executive communication and prioritization.
Essential for creating isolated, reproducible, and safe environments to conduct destructive testing without impacting production systems. Enables systematic versioning of attacked and patched models.
Answer Strategy
Use a structured threat modeling approach. Sample Answer: "First, I'd define the scope and rules of engagement, focusing on high-impact risks: data exfiltration via generated code, malicious code injection, and abuse of internal system access. I'd then build a threat matrix based on MITRE ATLAS, prioritizing tactics like Prompt Injection and Model Theft. The engagement would have three phases: 1) Reconnaissance to map the model's API and behavior, 2) Adversarial Attack Execution using tools like Garak for automated scanning and manual red teaming for creative scenarios, and 3) Analysis, where we classify findings by severity using CVSS-like scoring for AI and produce actionable mitigations for the MLOps team."
Answer Strategy
Tests risk assessment, communication, and pragmatic problem-solving under pressure. Sample Answer: "My immediate action is to escalate with clear data. I would prepare a concise brief for the product lead and legal counsel, quantifying the bias (e.g., 'Model has 15% higher false negative rate for Group X') and outlining the concrete legal and reputational risk. Simultaneously, I would explore immediate mitigations with the ML engineers, such as applying a fairness-aware post-processing threshold or implementing a human-in-the-loop review for the affected demographic. The goal is to enable an informed business decision-either delaying the launch for a fix or deploying with a known, documented, and monitored risk with an immediate remediation plan."
1 career found
Try a different search term.