Interview Prep
AI Threat Hunting Specialist Interview Questions
47 expert questions covering beginner fundamentals to advanced AI workflow scenarios. Each answer includes a hint for structured responses.
Beginner
5 questions
A strong answer contrasts a code flaw (e.g., buffer overflow) with a failure in the model's learned logic or data (e.g., adversarial example).
The answer should define it as deliberately inserting malicious data into a training set to alter the model's behavior after training.
It's a standard awareness document for LLM application security risks. Its importance lies in providing a common language and focus for developers and hunters.
Reasons include intellectual property theft, bypassing query costs, or enabling easier crafting of adversarial examples against a local copy.
API gateway logs for inference endpoints and application logs from the model-serving framework (e.g., TensorFlow Serving) are key examples.
Intermediate
9 questions
Should include analyzing input feature distributions, checking for high-frequency noise patterns, looking for prediction flips on similar images, and querying for sudden confidence drops.
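One of the checks above, looking for prediction flips on similar inputs, can be sketched in a few lines. This is a minimal illustration, assuming each logged sample is a flat feature vector with a predicted label; the function name `find_prediction_flips` and the `max_linf` threshold are hypothetical, and the pairwise scan is quadratic, so it only suits small samples.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, predicted label)


def find_prediction_flips(
    samples: List[Sample], max_linf: float = 0.05
) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for near-identical inputs with different labels."""
    flips = []
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            (xi, yi), (xj, yj) = samples[i], samples[j]
            # L-infinity distance: the largest per-feature difference.
            dist = max(abs(a - b) for a, b in zip(xi, xj))
            if dist <= max_linf and yi != yj:
                flips.append((i, j, dist))
    return flips
```

A flagged pair is only a lead: two legitimately different images can sit close together in feature space, so each hit still needs manual review.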
Answer should cover identifying assets (model, data, user chats), potential threat actors, attack surfaces (input prompts, plugins, tools), and failure modes (hallucination, data leakage).
Prompt injection is manipulating an LLM's input to override system instructions or exfiltrate data. Jailbreaking is a specific subset focused on bypassing safety filters to generate prohibited content.
It uses the model's output to reconstruct inputs or attributes of training data, potentially revealing private information like faces or medical records.
Answer should mention expanded attack surface, potential for insecure plugin APIs, risk of untrusted code execution, and data leakage through tool outputs.
Methods include testing for misclassifications on known-clean samples, analyzing internal representations (neural cleanse), or using spectral signatures in the feature space.
It's a library for creating adversarial examples, hardening models, and applying certified defenses. A use case is generating PGD attacks to evaluate a model's robustness before deployment.
It refers to techniques that extract memorized training data from the model, which poses a direct privacy risk when the data is sensitive (PII, medical records).
Sudden shifts in input distribution (covariate shift) can indicate either benign operational changes or a coordinated adversarial attack (e.g., a poisoning campaign).
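A shift in one numeric input feature can be quantified with the Population Stability Index (PSI). The sketch below is one possible way to do it, not a prescribed method: the equal-width binning, the bin count, and the smoothing constant are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float], production: Sequence[float], bins: int = 10
) -> float:
    """PSI for one feature, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp outliers into the edge bins
            counts[idx] += 1
        eps = 1e-6  # smoothing so that empty bins do not produce log(0)
        return [(c + eps) / (len(values) + eps * bins) for c in counts]

    p = bin_fractions(baseline)
    q = bin_fractions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly cited rule of thumb treats PSI above roughly 0.25 as a significant shift, but the score alone cannot tell a benign operational change from an attack; that distinction comes from correlating the shift with who sent the traffic and when.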
Advanced
9 questions
Plan should include: 1) Analyzing retrieval logs for anomalous queries, 2) Examining the vector store for sensitive embedded documents, 3) Testing the model for knowledge extraction beyond its intended scope, 4) Reviewing access controls on the RAG pipeline.
Key elements: deploy a model with intentional, subtle vulnerabilities, instrument it with extensive logging, create fake 'valuable' endpoints, and use deception to guide attacker interaction.
Traditional IoCs (IPs, hashes) are poor for AI attacks. Better alternatives: Indicators of Attack (IoA) based on behavior (e.g., input pattern sequences), model-specific anomalies, and TTPs mapped to frameworks like MITRE ATLAS.
Beyond IP theft, it allows offline adversarial example generation, facilitates model inversion attacks, and can be a step in a larger chain to compromise downstream systems that trust the model.
They allow computation on encrypted data, protecting model IP and user data. Trade-offs are massive performance overhead and complexity, limiting their use to specific, high-value scenarios.
It's the potential performance decrease from aligning an AI with human values. Attackers might exploit this by forcing the model into an 'alignment negotiation' to bypass safety measures under the guise of achieving a benign goal.
Example: 1) Poison fine-tuning data to create a 'sleeper' behavior trigger. 2) Use prompt injection to steer the agent into a scenario where the trigger activates. 3) The activated behavior then performs malicious actions while evading standard safety monitors.
Must systematically evaluate each modality's input channel, the fusion layer, shared representations, and output handling. Each presents unique attack vectors (e.g., adversarial audio, image steganography).
Differential privacy adds noise, degrading model utility. It doesn't fully prevent all inference attacks. Combine with access control, output perturbation, and monitoring for suspicious query patterns.
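The noise-versus-utility trade-off can be seen on the simplest possible case. The sketch below applies the Laplace mechanism to a counting query (sensitivity 1); it is an illustration of the trade-off only, not a model-training method such as DP-SGD, and the helper names are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-DP; a counting query has sensitivity 1."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(epsilon: float, trials: int = 2000, seed: int = 0) -> float:
    """Average distortion of the released count at a given privacy budget."""
    rng = random.Random(seed)
    errors = (abs(private_count(100, epsilon, rng) - 100) for _ in range(trials))
    return sum(errors) / trials
```

Comparing `mean_abs_error(0.1)` with `mean_abs_error(10.0)` shows the cost directly: a smaller epsilon (stronger privacy) means a larger noise scale and a less useful answer.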
Scenario-Based
9 questions
Hypothesis: The model is processing adversarial examples designed to be computationally expensive (e.g., via careful perturbation). Investigate by sampling and analyzing the recent inputs for anomalous patterns.
Explain that this demonstrates the ability to manipulate the model's output against its alignment, which is the same primitive used for harmful content generation, data exfiltration, or sabotaging business logic.
Look for: 1) Unusual API access patterns pre-launch, 2) Model behavior fingerprinting (asking it the same obscure questions), 3) Analyzing the competitor's product for known quirks/bugs in your model, 4) Checking for leaks in code repositories or cloud storage.
Assess: 1) Does it contain company data or PII? 2) Is it fine-tuned for sensitive tasks? 3) Could it be a phishing vector for developers? 4) Does it reveal proprietary prompting techniques?
Steps: 1) Isolate the system to prevent over-blocking. 2) Check for recent model updates or data pipeline changes. 3) Analyze a sample of flagged vs. unflagged content. 4) Look for external triggers (e.g., a new slang term) or a coordinated poisoning attack on the feedback loop.
Vectors: 1) Prompt injection to hijack agent goals, 2) Malicious code in retrieved context (RAG), 3) Exploiting tool APIs (e.g., command injection via a shell tool), 4) Causing the agent to write vulnerable code that gets executed.
Hunt would focus on: 1) Network traffic to common AI API endpoints, 2) DNS queries for AI service domains, 3) Analyzing DLP (Data Loss Prevention) alerts for sensitive data patterns, 4) Endpoint monitoring for local AI tool installations.
Causes: 1) Data drift / covariate shift between training and production data, 2) The test set is not representative (shortcut learning), 3) The model was evaluated on the same data it was trained on (data leakage), 4) A subtle adversarial attack is occurring in production.
Analyze the feedback data for patterns: 1) Coordinated voting from similar IP ranges/behavior, 2) Feedback that consistently pushes the model toward a desired (but incorrect) output, 3) Correlation between unusual model updates and feedback spikes.
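The first pattern, coordinated voting from similar IP ranges, could be checked roughly as follows. This is a sketch under stated assumptions: feedback arrives as `(ip, label)` pairs, addresses are IPv4, a /24 is treated as "similar range", and the `min_votes` and `min_agreement` thresholds are arbitrary starting points.

```python
import ipaddress
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def suspicious_subnets(
    feedback: Iterable[Tuple[str, str]],
    min_votes: int = 20,
    min_agreement: float = 0.9,
) -> Dict[str, Tuple[str, float]]:
    """Flag /24 subnets whose feedback is both heavy and nearly unanimous."""
    by_subnet = defaultdict(list)
    for ip, label in feedback:
        # strict=False lets a host address stand in for its enclosing network.
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        by_subnet[str(subnet)].append(label)

    flagged = {}
    for subnet, labels in by_subnet.items():
        if len(labels) < min_votes:
            continue
        top_label, top_count = Counter(labels).most_common(1)[0]
        agreement = top_count / len(labels)
        if agreement >= min_agreement:
            flagged[subnet] = (top_label, agreement)
    return flagged
```

A flagged subnet may simply be a corporate NAT with many honest users, so the result is a starting point for the second and third checks rather than a verdict.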
AI Workflow & Tools
10 questions
Describe a workflow: 1) Define attack prompt list, 2) Write a script that initializes the agent, 3) Loop through prompts, sending each as input, 4) Capture and log the full response (including tool calls), 5) Compare output against expected safe behavior or known injection signatures.
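The workflow above can be sketched as a small harness. Everything named here is an assumption for illustration: `AgentResponse`, the signature patterns, the forbidden tool names, and `toy_agent`, which is a deliberately vulnerable stand-in for whatever agent framework is actually under test.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentResponse:
    text: str
    tool_calls: List[str] = field(default_factory=list)


# Hypothetical signatures of a successful injection; tune per application.
INJECTION_SIGNATURES = [re.compile(r"system prompt", re.IGNORECASE)]
FORBIDDEN_TOOLS = {"shell", "send_email"}


def run_injection_suite(
    agent: Callable[[str], AgentResponse], prompts: List[str]
) -> List[dict]:
    """Send each attack prompt, log the full response, and grade it."""
    results = []
    for prompt in prompts:
        response = agent(prompt)
        hits = [s.pattern for s in INJECTION_SIGNATURES if s.search(response.text)]
        bad_tools = [t for t in response.tool_calls if t in FORBIDDEN_TOOLS]
        results.append({
            "prompt": prompt,
            "response": response.text,
            "tool_calls": response.tool_calls,
            "signature_hits": hits,
            "forbidden_tools": bad_tools,
            "passed": not hits and not bad_tools,
        })
    return results


def toy_agent(prompt: str) -> AgentResponse:
    """Stand-in agent that falls for one well-known phrasing."""
    if "ignore previous instructions" in prompt.lower():
        return AgentResponse("Sure. My system prompt is: ...", ["shell"])
    return AgentResponse("I can't help with that.")
```

Calling `run_injection_suite(toy_agent, prompts)` returns one record per prompt; in practice the records would be written to a log so that regressions show up when the agent or its system prompt changes.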
Workflow: 1) Load model and tokenizer, 2) Prepare a benchmark dataset with protected attributes (e.g., gender, ethnicity), 3) Run predictions, 4) Use 'evaluate' to compute fairness metrics (e.g., demographic parity difference), 5) Analyze slices where bias is highest to identify exploitable skews.
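Step 4 can be illustrated without any library. The sketch below computes demographic parity difference by hand from binary predictions and a protected-attribute column; it does not use the 'evaluate' package mentioned above, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def selection_rates(predictions: Sequence[int], groups: Sequence[str]) -> Dict[str, float]:
    """Fraction of positive (1) predictions within each protected group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(
    predictions: Sequence[int], groups: Sequence[str]
) -> float:
    """Gap between the highest and lowest group selection rate (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())
```

For a hunter, the per-group rates from `selection_rates` are often more useful than the single number, because they show which slice an attacker could lean on.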
Pipeline: 1) Use 'fickling' or custom code to safely de-serialize and inspect the file structure without executing it, 2) Check for embedded code execution vectors (e.g., __reduce__), 3) Scan for known malicious model fingerprints, 4) If safe, load the model in a sandbox and run inference tests for behavioral anomalies.
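Steps 1 and 2 can be approximated with the standard library alone. The sketch below uses `pickletools.genops`, which walks the opcode stream without executing it; it is a simplified stand-in for a dedicated tool such as fickling, and the opcode list is an assumption about what is worth reviewing, not a complete detection rule.

```python
import pickletools
from typing import List

# Opcodes that can import or call arbitrary objects during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def scan_pickle(data: bytes) -> List[str]:
    """List suspicious opcodes in a pickle stream without unpickling it."""
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            findings.append(f"{pos}: {opcode.name} {arg!r}")
    return findings
```

Legitimate model files also use some of these opcodes to rebuild their classes, so a non-empty result means "review the imported names", not "malicious". The imported module and attribute names shown next to `GLOBAL` are the part to check against an allowlist.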
Steps: 1) Docker-compose with containers for the LLM serving framework (vLLM, TGI), a vector database (for RAG), and a mock tool server, 2) Load a quantized open-weight model, 3) Mount a clean RAG knowledge base, 4) Expose a local API endpoint, 5) Network the containers so you can attack the full chain.
Steps: 1) Enable Model Monitor on the endpoint, 2) Define a baseline from clean training data, 3) Schedule monitoring jobs to compare production data statistics against baseline, 4) Set up CloudWatch alarms for significant statistical drift, 5) Investigate alerts by correlating with recent data batches.
Pseudo-code should show: importing ART's PGD attack, creating the classifier wrapper, defining attack params (eps=0.03, max_iter=10), generating adversarial examples from a batch of test images, and visualizing the original vs. adversarial image.
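To show what the PGD loop itself does, here is a from-scratch sketch against a tiny logistic-regression classifier. It is not ART code and does not reproduce ART's API; it only mirrors the parameters named above (eps=0.03, max_iter=10), and the step size `eps_step` and the [0, 1] input range are added assumptions.

```python
import math
from typing import List, Sequence


def predict_proba(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def pgd_attack(
    x: Sequence[float], y: int, w: Sequence[float], b: float,
    eps: float = 0.03, eps_step: float = 0.01, max_iter: int = 10,
) -> List[float]:
    """L-infinity PGD: ascend the loss by signed steps, then project back."""
    x_adv = list(x)
    for _ in range(max_iter):
        p = predict_proba(x_adv, w, b)
        # Gradient of the cross-entropy loss with respect to the input.
        grad = [(p - y) * wi for wi in w]
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            x_adv[i] += eps_step * sign
            # Project into the eps-ball around x, then into the valid range.
            x_adv[i] = min(max(x_adv[i], x[i] - eps), x[i] + eps)
            x_adv[i] = min(max(x_adv[i], 0.0), 1.0)
    return x_adv
```

With a library such as ART the classifier wrapper supplies the gradient and the attack object runs this loop over a batch; the projection step is what keeps the perturbation small enough to be hard to see when the original and adversarial images are plotted side by side.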
Workflow: 1) Install Garak, 2) Point it at the LLM's API, 3) Run specific probe modules (e.g., `garak --model_type <type> --model_name <name> --probes promptinject`, with any further probes comma-separated and no spaces), 4) Analyze the report for detected vulnerabilities and successful exploit generations.
Fingerprint method: 1) Design a set of unique, non-English, or synthetically created prompts with expected outputs, 2) Query the original model to record these unique responses, 3) To test a suspected clone, send the same fingerprint prompts and compare the responses.
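The record-and-compare steps can be sketched as below. The sketch assumes each model is reachable as a plain prompt-to-text callable and that decoding is deterministic (for example, temperature 0); with sampling enabled, exact string matching would need to be replaced by a similarity measure. The function names are hypothetical.

```python
from typing import Callable, Dict, List


def record_fingerprint(model: Callable[[str], str], prompts: List[str]) -> Dict[str, str]:
    """Query the original model once and store its response to each prompt."""
    return {prompt: model(prompt) for prompt in prompts}


def fingerprint_match_rate(
    fingerprint: Dict[str, str], suspect: Callable[[str], str]
) -> float:
    """Fraction of fingerprint prompts the suspect answers identically."""
    matches = sum(
        1
        for prompt, expected in fingerprint.items()
        if suspect(prompt).strip() == expected.strip()
    )
    return matches / len(fingerprint)
```

A match rate near 1.0 on prompts that no independent model should answer the same way is strong evidence of a clone; the value of the test depends almost entirely on how unusual the fingerprint prompts are.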
Integration: 1) Use ATLAS techniques as a checklist for hunt hypotheses (e.g., 'Hunt for AML.T0054 - LLM Jailbreak'), 2) Map detected activity to specific ATLAS techniques in reports, 3) Use the matrix to ensure coverage of all attack surfaces, 4) Use it to communicate risk to non-technical stakeholders.
Logic: Query for users/IPs with: 1) High volume of queries in a short time, 2) Systematic variation of input features (e.g., querying all variations of a feature), 3) Queries focused on model decision boundaries, 4) Low-confidence predictions that might indicate probing.
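Two of these signals, burst volume and a high share of low-confidence predictions, can be combined in a simple log scan. The record layout, thresholds, and the choice to require both signals together are assumptions made for this sketch; a real detection would usually be a query in the SIEM rather than Python.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class QueryRecord(NamedTuple):
    client: str        # user ID or source IP
    timestamp: float   # seconds since some epoch
    confidence: float  # model's top-class confidence for this query


def flag_probing_clients(
    records: Iterable[QueryRecord],
    window_s: float = 60.0,
    max_queries: int = 100,
    low_conf: float = 0.6,
    low_conf_ratio: float = 0.5,
) -> List[str]:
    """Flag clients that burst above max_queries per window with mostly low-confidence hits."""
    by_client = defaultdict(list)
    for record in records:
        by_client[record.client].append(record)

    flagged = []
    for client, recs in by_client.items():
        recs.sort(key=lambda r: r.timestamp)
        burst, start = False, 0
        for end in range(len(recs)):
            # Slide the window start forward until it spans at most window_s.
            while recs[end].timestamp - recs[start].timestamp > window_s:
                start += 1
            if end - start + 1 > max_queries:
                burst = True
                break
        near_boundary = sum(r.confidence < low_conf for r in recs) / len(recs)
        if burst and near_boundary >= low_conf_ratio:
            flagged.append(client)
    return sorted(flagged)
```

Systematic feature variation (signal 2) needs access to the inputs themselves and is not covered here; it is usually the most expensive of the four signals to compute.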
Behavioral
5 questions
Look for use of analogy, focusing on business impact (risk, revenue, reputation), avoiding jargon, and confirming understanding through questions.
A good answer demonstrates persistence, creative thinking, and a methodical approach (e.g., looking at the problem from a different angle, questioning assumptions, combining disparate information).
Look for: reading arXiv papers, following specific researchers on Twitter/X, participating in communities (like MLSec, OWASP), contributing to open-source security tools, attending conferences (DEF CON AI Village), and hands-on experimentation.
A strong answer balances risk and urgency: immediately escalate with clear severity, propose mitigations if a full fix isn't possible (e.g., rate limiting, input filtering), document the decision process, and prepare rollback plans.
Key themes: strict adherence to scope and rules of engagement, minimizing disruption, data privacy, thorough documentation, and ensuring findings lead to improved security, not just a report.