AI Red Team Engineer
An AI Red Team Engineer systematically probes, attacks, and stress-tests AI systems, especially large language models, to uncover vu…
Skill Guide
The adversarial machine learning discipline of intentionally corrupting training data or implanting hidden triggers in models to cause specific misclassifications, alongside the methods to detect and mitigate such threats.
Scenario
You are given a clean image classification dataset (e.g., MNIST). Your goal is to flip a percentage of labels to a target class and then detect the poisoned subset.
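As a rough illustration of this scenario, the sketch below flips labels to a target class and then flags suspicious samples with a k-nearest-neighbour label-consistency check. The synthetic Gaussian clusters stand in for MNIST features, and the helper names (flip_labels, knn_suspects) and the k-NN detector are assumptions chosen for illustration, not a prescribed method.

```python
import numpy as np

def flip_labels(y, target_class, fraction, rng):
    """Relabel `fraction` of all samples (drawn from non-target classes) as `target_class`."""
    y_poisoned = y.copy()
    candidates = np.flatnonzero(y != target_class)
    n_flip = int(round(fraction * len(y)))
    flipped = rng.choice(candidates, size=n_flip, replace=False)
    y_poisoned[flipped] = target_class
    return y_poisoned, flipped

def knn_suspects(X, y, k=5):
    """Return indices whose label disagrees with the majority label of their k nearest neighbours."""
    sq = (X ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    n_classes = int(y.max()) + 1
    votes = np.eye(n_classes, dtype=int)[y[neighbours]].sum(axis=1)  # shape (n, n_classes)
    return np.flatnonzero(votes.argmax(axis=1) != y)

# Synthetic stand-in for MNIST features (assumption): three separated Gaussian clusters.
rng = np.random.default_rng(0)
centers = rng.normal(0.0, 5.0, size=(3, 20))
y_clean = np.repeat(np.arange(3), 100)
X = centers[y_clean] + rng.normal(0.0, 1.0, size=(300, 20))

y_poisoned, flipped = flip_labels(y_clean, target_class=0, fraction=0.10, rng=rng)
suspects = knn_suspects(X, y_poisoned, k=5)
recall = len(np.intersect1d(suspects, flipped)) / len(flipped)
print(f"flipped={len(flipped)} flagged={len(suspects)} recall={recall:.2f}")
```

On real image data the same check would usually run on learned embeddings rather than raw pixels, since raw-pixel distances separate classes far less cleanly than these synthetic clusters do.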
Scenario
You must implant a visible trigger pattern (e.g., a small square patch) into a subset of training images and label them as a target class, creating a backdoored model. Then develop a defense that identifies the trigger.
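The poisoning half of this scenario can be sketched as follows. This assumes a dirty-label variant (images from other classes are stamped and relabeled to the target class), a bottom-right 3x3 patch, and random arrays standing in for real images; the function names are hypothetical.

```python
import numpy as np

def add_patch(images, size=3, value=1.0):
    """Stamp a size x size square into the bottom-right corner of each image."""
    patched = images.copy()
    patched[:, -size:, -size:] = value
    return patched

def poison_with_trigger(X, y, target_class, fraction, rng, size=3):
    """Stamp the trigger onto a fraction of non-target images and relabel them."""
    X_p, y_p = X.copy(), y.copy()
    candidates = np.flatnonzero(y != target_class)
    n_poison = int(round(fraction * len(candidates)))
    idx = rng.choice(candidates, size=n_poison, replace=False)
    X_p[idx] = add_patch(X[idx], size=size)
    y_p[idx] = target_class
    return X_p, y_p, idx

# Random stand-in for a 28x28 grayscale dataset (assumption); pixel values stay below 0.5
# so the full-intensity patch is easy to tell apart.
rng = np.random.default_rng(0)
X = rng.random((200, 28, 28)) * 0.5
y = rng.integers(0, 10, size=200)

X_bd, y_bd, poisoned_idx = poison_with_trigger(X, y, target_class=7, fraction=0.05, rng=rng)
print(f"poisoned {len(poisoned_idx)} of {len(X)} images")
```

Training any classifier on (X_bd, y_bd) and then evaluating it on add_patch(X_test) gives the attack success rate, which is the number a defense should drive down.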
Scenario
Design and evaluate a federated learning system where multiple clients collaboratively train a model without sharing data, but one or more clients are malicious and attempt to inject a backdoor via model updates.
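One way to start on the evaluation is to compare aggregation rules on simulated client updates. The sketch below is a toy setup under stated assumptions: updates are plain vectors, nine honest clients agree on a direction up to noise, and one malicious client submits a scaled-up opposing update. It contrasts plain averaging with two common robust alternatives (coordinate-wise median and norm clipping); it does not use TFF or PySyft.

```python
import numpy as np

def fedavg(updates):
    """Plain federated averaging of client updates (one row per client)."""
    return updates.mean(axis=0)

def coordinate_median(updates):
    """Coordinate-wise median, which tolerates a minority of extreme updates."""
    return np.median(updates, axis=0)

def clip_by_norm(updates, max_norm):
    """Scale each client's update down so its L2 norm is at most max_norm."""
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    return updates * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))

rng = np.random.default_rng(0)
dim, n_honest = 50, 9
honest = np.ones(dim) + rng.normal(0.0, 0.1, size=(n_honest, dim))
malicious = 10.0 * -np.ones(dim)          # boosted update pushing the opposite way
updates = np.vstack([honest, malicious])

honest_mean = honest.mean(axis=0)
max_norm = np.median(np.linalg.norm(updates, axis=1))
clipped = clip_by_norm(updates, max_norm)

err_avg = np.linalg.norm(fedavg(updates) - honest_mean)
err_med = np.linalg.norm(coordinate_median(updates) - honest_mean)
err_clip = np.linalg.norm(fedavg(clipped) - honest_mean)
print(f"fedavg={err_avg:.2f} clipped fedavg={err_clip:.2f} median={err_med:.2f}")
```

Each error is the distance between the aggregate and the honest clients' mean, so a smaller value means the malicious client had less influence on the global update.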
Use PyTorch or TensorFlow for model training and attack implementation. Foolbox provides standardized adversarial attacks. TensorFlow Federated (TFF) and PySyft are used to simulate and secure federated learning environments.
BackdoorBench offers standardized datasets, attacks, and defenses for backdoor research. The Adversarial Robustness Toolbox (ART) is an industry-grade library of attack and defense methods across vision, NLP, and time-series. TextAttack provides adversarial attacks and data augmentation for NLP models, which helps carry poisoning concepts over to text data.
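Neural Cleanse, STRIP, and Activation Clustering are specific backdoor defense algorithms. Neural Cleanse reverse-engineers a candidate trigger for each class by optimization and flags classes whose trigger is unusually small. STRIP detects backdoored inputs by measuring prediction entropy under input perturbations: a triggered input keeps its prediction, so its entropy stays abnormally low. Activation Clustering clusters internal representations to separate clean from poisoned training data.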
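A minimal sketch of the STRIP idea, under several assumptions: the model is a hypothetical hand-written 2-class function with a built-in patch backdoor, images are 8x8 arrays whose bottom-right corner is dark in clean data, and the perturbation is a clipped sum with clean overlay images (the blending rule is a simplification). It only shows that a triggered input yields much lower mean entropy than a clean one.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def toy_backdoored_model(images, target=0, size=3):
    """Hypothetical model: class 0 if the left half is brighter, else class 1.
    A bright bottom-right patch overrides the decision (the backdoor)."""
    half = images.shape[2] // 2
    left = images[:, :, :half].mean(axis=(1, 2))
    right = images[:, :, half:].mean(axis=(1, 2))
    logits = 5.0 * np.stack([left - right, right - left], axis=1)
    triggered = images[:, -size:, -size:].mean(axis=(1, 2)) > 0.9
    logits[triggered] = -30.0
    logits[triggered, target] = 30.0
    return softmax(logits)

def make_clean(label, rng, shape=(8, 8), size=3):
    """Clean toy image: one bright half, dark bottom-right corner."""
    img = rng.uniform(0.0, 0.1, size=shape)
    half = shape[1] // 2
    if label == 0:
        img[:, :half] += 0.6
    else:
        img[:, half:] += 0.6
    img[-size:, -size:] = 0.0
    return img

def strip_entropy(x, overlays, predict_fn):
    """Mean prediction entropy of x superimposed with each clean overlay image."""
    blended = np.clip(x[None] + overlays, 0.0, 1.0)
    probs = predict_fn(blended)
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    return float(entropy.mean())

rng = np.random.default_rng(0)
overlays = np.stack([make_clean(i % 2, rng) for i in range(20)])  # balanced classes
clean_input = make_clean(1, rng)
triggered_input = clean_input.copy()
triggered_input[-3:, -3:] = 1.0                                   # stamp the trigger

h_clean = strip_entropy(clean_input, overlays, toy_backdoored_model)
h_trig = strip_entropy(triggered_input, overlays, toy_backdoored_model)
print(f"mean entropy clean={h_clean:.3f} triggered={h_trig:.3g}")
```

In practice the decision threshold on entropy is calibrated on held-out clean inputs, for example by fixing an acceptable false-rejection rate.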
Answer Strategy
The interviewer is testing for depth of attack creativity and proactive defense thinking. A strong answer details a targeted, sparse attack (e.g., carefully altering feature values for a small subset to invert the churn label) and focuses on detection via statistical tests on feature-label correlations, monitoring prediction stability across subpopulations, or analyzing data provenance for anomalies, not just overall accuracy drift.
Answer Strategy
This tests the candidate's structured forensic methodology. The answer should outline a triage process: 1) Reproduce the failure and characterize the inputs (look for consistent visual triggers). 2) Run backdoor-specific diagnostics (Neural Cleanse, STRIP) on the model and suspect inputs. 3) Analyze training data lineage for the failing class. 4) Use activation visualization to see if a unique internal pathway is activated for failures. The response must contrast this with drift analysis techniques (statistical tests on feature distributions over time).
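For the drift side of that contrast, a per-feature two-sample Kolmogorov-Smirnov statistic is one simple test on feature distributions over time. The sketch below implements the statistic with NumPy on synthetic data; the feature values and the choice of test are assumptions for illustration only.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)   # feature values at training time
stable = rng.normal(0.0, 1.0, size=1000)      # later window, same distribution
drifted = rng.normal(1.0, 1.0, size=1000)     # later window, shifted mean

ks_stable = ks_statistic(reference, stable)
ks_drifted = ks_statistic(reference, drifted)
print(f"KS stable={ks_stable:.3f} drifted={ks_drifted:.3f}")
```

Drift tends to show up as a broad shift like this across many inputs, whereas a backdoor typically leaves aggregate feature distributions almost unchanged and fails only on inputs that carry the trigger.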