
Interview Prep

AI Threat Hunting Specialist Interview Questions

47 expert questions ranging from beginner fundamentals to advanced AI workflow scenarios. Each question comes with a hint on what a well-structured answer should cover.

Beginner: 5 | Intermediate: 9 | Advanced: 9 | Scenario-Based: 9 | AI Workflow & Tools: 10 | Behavioral: 5

Beginner

5 questions
What a great answer covers:

A strong answer contrasts a code flaw (e.g., buffer overflow) with a failure in the model's learned logic or data (e.g., adversarial example).

What a great answer covers:

The answer should define it as deliberately inserting malicious data into a training set to alter the model's behavior after training.

What a great answer covers:

It's a standard awareness document for LLM application security risks. Its importance lies in providing a common language and focus for developers and hunters.

What a great answer covers:

Reasons include intellectual property theft, bypassing query costs, or enabling easier crafting of adversarial examples against a local copy.

What a great answer covers:

API gateway logs for inference endpoints and application logs from the model-serving framework (e.g., TensorFlow Serving) are key examples.

Intermediate

9 questions
What a great answer covers:

Should include analyzing input feature distributions, checking for high-frequency noise patterns, looking for prediction flips on similar images, and querying for sudden confidence drops.
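As a minimal sketch of the "sudden confidence drops" check, the snippet below flags predictions whose confidence falls far below a trailing-window baseline. The function name, window size, and z-score threshold are illustrative assumptions, not values from any particular monitoring product.

```python
from statistics import mean, pstdev


def flag_confidence_drops(confidences, window=20, z_threshold=3.0):
    """Return indices whose confidence sits far below the trailing-window mean."""
    flagged = []
    for i in range(window, len(confidences)):
        baseline = confidences[i - window:i]
        mu = mean(baseline)
        # Guard against a perfectly flat baseline (standard deviation of zero).
        sigma = pstdev(baseline) or 1e-9
        if (mu - confidences[i]) / sigma > z_threshold:
            flagged.append(i)
    return flagged
```

Flagged indices are only leads: the corresponding inputs still need to be pulled and inspected for noise patterns or near-duplicate images with flipped predictions.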

What a great answer covers:

Answer should cover identifying assets (model, data, user chats), potential threat actors, attack surfaces (input prompts, plugins, tools), and failure modes (hallucination, data leakage).

What a great answer covers:

Prompt injection is manipulating an LLM's input to override system instructions or exfiltrate data. Jailbreaking is a specific subset focused on bypassing safety filters to generate prohibited content.

What a great answer covers:

It uses the model's output to reconstruct inputs or attributes of training data, potentially revealing private information like faces or medical records.

What a great answer covers:

Answer should mention expanded attack surface, potential for insecure plugin APIs, risk of untrusted code execution, and data leakage through tool outputs.

What a great answer covers:

Methods include testing for misclassifications on known-clean samples, reverse-engineering candidate triggers (Neural Cleanse), clustering internal activations, or using spectral signatures in the feature space.

What a great answer covers:

It's a library for crafting adversarial examples, hardening models, and applying certified defenses. A use case is generating PGD attacks to evaluate a model's robustness before deployment.

What a great answer covers:

It refers to techniques that extract memorized training data from the model, posing a direct privacy risk if the data is sensitive (PII, medical records).

What a great answer covers:

Sudden shifts in input distribution (covariate shift) can indicate either benign operational changes or a coordinated adversarial attack (e.g., a poisoning campaign).
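One common way to quantify such a shift for a single numeric feature is the Population Stability Index (PSI). The sketch below is a hand-rolled version under stated assumptions: equal-width bins derived from the baseline, and a small floor to avoid taking the log of zero. Function and parameter names are illustrative.

```python
import math


def population_stability_index(baseline, production, bins=10):
    """PSI between two samples of one numeric feature (0 means identical binned distributions)."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range production values into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each proportion so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A frequently quoted rule of thumb treats PSI above roughly 0.25 as a major shift, but the threshold should be tuned per feature. A high score says the distribution moved; it does not say whether the cause is benign or adversarial.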

Advanced

9 questions
What a great answer covers:

Plan should include: 1) Analyzing retrieval logs for anomalous queries, 2) Examining the vector store for sensitive embedded documents, 3) Testing the model for knowledge extraction beyond its intended scope, 4) Reviewing access controls on the RAG pipeline.

What a great answer covers:

Key elements: deploy a model with intentional, subtle vulnerabilities, instrument it with extensive logging, create fake 'valuable' endpoints, and use deception to guide attacker interaction.

What a great answer covers:

Traditional IoCs (IPs, hashes) are poor for AI attacks. Better alternatives: Indicators of Attack (IoA) based on behavior (e.g., input pattern sequences), model-specific anomalies, and TTPs mapped to frameworks like MITRE ATLAS.

What a great answer covers:

Beyond IP theft, it allows offline adversarial example generation, facilitates model inversion attacks, and can be a step in a larger chain to compromise downstream systems that trust the model.

What a great answer covers:

They allow computation on encrypted data, protecting model IP and user data. Trade-offs are massive performance overhead and complexity, limiting their use to specific, high-value scenarios.

What a great answer covers:

It's the potential loss of performance incurred by aligning an AI with human values. Attackers might exploit this by forcing the model into an 'alignment negotiation' to bypass safety measures under the guise of achieving a benign goal.

What a great answer covers:

Example: 1) Poison fine-tuning data to create a 'sleeper' behavior trigger. 2) Use prompt injection to steer the agent into a scenario where the trigger activates. 3) The activated behavior then performs malicious actions while evading standard safety monitors.

What a great answer covers:

Must systematically evaluate each modality's input channel, the fusion layer, shared representations, and output handling. Each presents unique attack vectors (e.g., adversarial audio, image steganography).

What a great answer covers:

Differential privacy adds noise, degrading model utility. It doesn't fully prevent all inference attacks. Combine with access control, output perturbation, and monitoring for suspicious query patterns.
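The utility trade-off can be illustrated with the Laplace mechanism applied to a simple count query. This is a sketch of output perturbation only, not of differentially private training; the function names and the default sensitivity of 1 are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

A smaller epsilon gives stronger privacy but a noisier answer, which is the utility cost described above. An attacker who can repeat the query many times can average the noise away, which is why query-pattern monitoring and access control still matter.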

Scenario-Based

9 questions
What a great answer covers:

Hypothesis: The model is processing adversarial examples designed to be computationally expensive (e.g., via careful perturbation). Investigate by sampling and analyzing the recent inputs for anomalous patterns.

What a great answer covers:

Explain that this demonstrates the ability to manipulate the model's output against its alignment, which is the same primitive used for harmful content generation, data exfiltration, or sabotaging business logic.

What a great answer covers:

Look for: 1) Unusual API access patterns pre-launch, 2) Model behavior fingerprinting (asking it the same obscure questions), 3) Analyzing the competitor's product for known quirks/bugs in your model, 4) Checking for leaks in code repositories or cloud storage.

What a great answer covers:

Assess: 1) Does it contain company data or PII? 2) Is it fine-tuned for sensitive tasks? 3) Could it be a phishing vector for developers? 4) Does it reveal proprietary prompting techniques?

What a great answer covers:

Steps: 1) Isolate the system to prevent over-blocking. 2) Check for recent model updates or data pipeline changes. 3) Analyze a sample of flagged vs. unflagged content. 4) Look for external triggers (e.g., a new slang term) or a coordinated poisoning attack on the feedback loop.

What a great answer covers:

Vectors: 1) Prompt injection to hijack agent goals, 2) Malicious code in retrieved context (RAG), 3) Exploiting tool APIs (e.g., command injection via a shell tool), 4) Causing the agent to write vulnerable code that gets executed.

What a great answer covers:

Hunt would focus on: 1) Network traffic to common AI API endpoints, 2) DNS queries for AI service domains, 3) Analyzing DLP (Data Loss Prevention) alerts for sensitive data patterns, 4) Endpoint monitoring for local AI tool installations.
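The DNS part of this hunt can be sketched as a watchlist match over resolver logs. The record shape `(source_host, queried_name)` and the three-entry watchlist are assumptions for illustration; a real hunt would maintain a much longer, verified list.

```python
from collections import Counter

# Illustrative watchlist only, not a complete inventory of AI services.
AI_SERVICE_DOMAINS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}


def hunt_ai_dns(records, watchlist=AI_SERVICE_DOMAINS):
    """Count DNS queries per source host that hit a watched domain or its subdomains."""
    hits = Counter()
    for host, qname in records:
        name = qname.rstrip(".").lower()
        if any(name == d or name.endswith("." + d) for d in watchlist):
            hits[host] += 1
    return hits
```

Matching on the full suffix with a leading dot avoids false hits on lookalike names that merely contain a watched domain as a substring.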

What a great answer covers:

Causes: 1) Data drift / covariate shift between training and production data, 2) The test set is not representative (shortcut learning), 3) The model was evaluated on the same data it was trained on (data leakage), 4) A subtle adversarial attack is occurring in production.

What a great answer covers:

Analyze the feedback data for patterns: 1) Coordinated voting from similar IP ranges/behavior, 2) Feedback that consistently pushes the model toward a desired (but incorrect) output, 3) Correlation between unusual model updates and feedback spikes.
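The first pattern, coordinated voting from similar IP ranges, can be approximated by grouping feedback events by subnet. The sketch assumes events arrive as `(ip, vote)` pairs; the /24 prefix and the 30% share threshold are arbitrary illustrative choices.

```python
import ipaddress
from collections import Counter


def feedback_share_by_subnet(events, prefix=24):
    """Map each subnet to its share of all feedback events."""
    counts = Counter()
    for ip, _vote in events:
        # strict=False lets a host address stand in for its containing network.
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(net)] += 1
    total = sum(counts.values())
    return {net: n / total for net, n in counts.items()}


def suspicious_subnets(events, prefix=24, max_share=0.3):
    """Return subnets contributing more than `max_share` of all feedback."""
    shares = feedback_share_by_subnet(events, prefix)
    return sorted(net for net, share in shares.items() if share > max_share)
```

A dominant subnet may also be a corporate NAT or a mobile carrier, so the result is a starting point for checking whether the feedback from that range consistently pushes toward one incorrect output.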

AI Workflow & Tools

10 questions
What a great answer covers:

Describe a workflow: 1) Define attack prompt list, 2) Write a script that initializes the agent, 3) Loop through prompts, sending each as input, 4) Capture and log the full response (including tool calls), 5) Compare output against expected safe behavior or known injection signatures.
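The workflow above can be sketched as a small harness. Everything here is hypothetical scaffolding: `agent` is assumed to be a callable that returns a dict with `"text"` and `"tool_calls"` (a list of tool names), and the two signature patterns are placeholders for a real signature set.

```python
import re

# Hypothetical injection signatures; replace with patterns relevant to the target.
INJECTION_SIGNATURES = [re.compile(p, re.I) for p in (r"system prompt", r"BEGIN SECRET")]


def run_injection_suite(agent, prompts, signatures=INJECTION_SIGNATURES,
                        allowed_tools=frozenset()):
    """Send each attack prompt to the agent and log the response plus any red flags."""
    results = []
    for prompt in prompts:
        response = agent(prompt)  # assumed shape: {"text": str, "tool_calls": [str, ...]}
        text = response.get("text", "")
        tool_calls = response.get("tool_calls", [])
        hits = [s.pattern for s in signatures if s.search(text)]
        unexpected = [t for t in tool_calls if t not in allowed_tools]
        results.append({
            "prompt": prompt,
            "response": text,
            "tool_calls": tool_calls,
            "signature_hits": hits,
            "unexpected_tools": unexpected,
            "flagged": bool(hits or unexpected),
        })
    return results
```

Logging the full response and tool calls for every prompt, not just the flagged ones, keeps the run useful for later review when the signature list turns out to be incomplete.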

What a great answer covers:

Workflow: 1) Load model and tokenizer, 2) Prepare a benchmark dataset with protected attributes (e.g., gender, ethnicity), 3) Run predictions, 4) Use 'evaluate' to compute fairness metrics (e.g., demographic parity difference), 5) Analyze slices where bias is highest to identify exploitable skews.
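To make the metric in step 4 concrete, the sketch below computes demographic parity difference by hand rather than through any library, so the function name and return shape are assumptions. It takes binary predictions and a parallel list of group labels.

```python
from collections import defaultdict


def demographic_parity_difference(predictions, groups):
    """Return (max selection rate - min selection rate, per-group selection rates)."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates
```

A value of 0 means every group receives positive predictions at the same rate; larger values point to the slices worth examining in step 5.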

What a great answer covers:

Pipeline: 1) Use 'fickling' or custom code to statically inspect the serialized file's structure without deserializing or executing it, 2) Check for embedded code execution vectors (e.g., __reduce__), 3) Scan for known malicious model fingerprints, 4) If safe, load the model in a sandbox and run inference tests for behavioral anomalies.
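For the "custom code" route, Python's standard-library `pickletools.genops` walks a pickle's opcode stream without executing it. The sketch below lists opcodes that can import or call objects; the choice of which opcodes count as suspicious is an assumption for illustration.

```python
import pickletools

# Opcodes that import, construct, or call objects during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
    "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def scan_pickle_bytes(data):
    """Return (position, opcode name, argument) for each suspicious opcode; nothing is executed."""
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            findings.append((pos, opcode.name, arg))
    return findings
```

Legitimate model files also use these opcodes to rebuild tensors and classes, so a finding is a prompt to check which module and callable are referenced, not proof of malice.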

What a great answer covers:

Steps: 1) Docker-compose with containers for the LLM serving framework (vLLM, TGI), a vector database (for RAG), and a mock tool server, 2) Load a quantized open-weight model, 3) Mount a clean RAG knowledge base, 4) Expose a local API endpoint, 5) Network the containers so you can attack the full chain.

What a great answer covers:

Steps: 1) Enable Model Monitor on the endpoint, 2) Define a baseline from clean training data, 3) Schedule monitoring jobs to compare production data statistics against baseline, 4) Set up CloudWatch alarms for significant statistical drift, 5) Investigate alerts by correlating with recent data batches.

What a great answer covers:

Pseudo-code should show: importing ART's PGD attack, creating the classifier wrapper, defining attack params (eps=0.03, max_iter=10), generating adversarial examples from a batch of test images, and visualizing the original vs. adversarial image.
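The PGD loop itself can be shown without any framework. The sketch below is not ART's API: it swaps in a hand-written logistic-regression "classifier" so the example stays self-contained, and reuses the eps=0.03 and max_iter=10 values from the hint. All names and the `eps_step` value are illustrative assumptions.

```python
import math


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))


def pgd_attack(w, b, x, y, eps=0.03, eps_step=0.01, max_iter=10):
    """Untargeted L-infinity PGD: step along the loss gradient sign, then project."""
    x_adv = list(x)
    for _ in range(max_iter):
        p = predict_proba(w, b, x_adv)
        # Gradient of binary cross-entropy w.r.t. the input for this model.
        grad = [(p - y) * wi for wi in w]
        stepped = [xi + eps_step * ((g > 0) - (g < 0)) for xi, g in zip(x_adv, grad)]
        # Project into the eps-ball around the original x and the valid range [0, 1].
        x_adv = [min(max(s, x0 - eps, 0.0), x0 + eps, 1.0) for s, x0 in zip(stepped, x)]
    return x_adv
```

With a real image model the same three steps apply (gradient, signed step, projection); a library such as ART wraps them behind a classifier wrapper and an attack object.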

What a great answer covers:

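Workflow: 1) Install Garak, 2) Point it at the LLM's API, 3) Run specific probe modules (e.g., `garak --model_type <type> --model_name <name> --probes promptinject`; multiple probes are passed as a comma-separated list with no spaces, and `--list_probes` shows what is available), 4) Analyze the report for detected vulnerabilities and successful exploit generations.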

What a great answer covers:

Fingerprint method: 1) Design a set of unique, non-English, or synthetically created prompts with expected outputs, 2) Query the original model to record these unique responses, 3) To test a suspected clone, send the same fingerprint prompts and compare the responses.
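Step 3 reduces to a match rate over the recorded fingerprint set. In this sketch `reference` is assumed to be a dict of prompt to recorded response, `suspect_model` is a hypothetical callable that returns a string, and the whitespace/case normalization is an illustrative choice.

```python
def _normalize(text):
    """Collapse whitespace and case so trivial formatting differences do not hide a match."""
    return " ".join(text.split()).lower()


def fingerprint_match_rate(reference, suspect_model):
    """Fraction of fingerprint prompts for which the suspect reproduces the recorded response."""
    matches = sum(
        _normalize(suspect_model(prompt)) == _normalize(expected)
        for prompt, expected in reference.items()
    )
    return matches / len(reference)
```

Exact matching is strict; with sampling enabled on either model, a similarity score or repeated queries per prompt would be a more forgiving comparison.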

What a great answer covers:

Integration: 1) Use ATLAS techniques as a checklist for hunt hypotheses (e.g., 'Hunt for AML.T0054 - LLM Jailbreak'; ATLAS technique IDs carry the AML. prefix, unlike ATT&CK IDs), 2) Map detected activity to specific ATLAS techniques in reports, 3) Use the matrix to ensure coverage of all attack surfaces, 4) Use it to communicate risk to non-technical stakeholders.

What a great answer covers:

Logic: Query for users/IPs with: 1) High volume of queries in a short time, 2) Systematic variation of input features (e.g., querying all variations of a feature), 3) Queries focused on model decision boundaries, 4) Low-confidence predictions that might indicate probing.
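The first criterion, high query volume in a short time, can be sketched as a sliding-window counter per client. The event shape `(timestamp_seconds, client_id)`, the 60-second window, and the 100-query limit are assumptions for illustration.

```python
from collections import defaultdict, deque


def high_volume_clients(events, window_seconds=60, max_queries=100):
    """Flag clients exceeding `max_queries` within any sliding window.

    `events` must be sorted by timestamp.
    """
    recent = defaultdict(deque)
    flagged = set()
    for ts, client in events:
        q = recent[client]
        q.append(ts)
        # Drop timestamps that have aged out of the window.
        while q and ts - q[0] > window_seconds:
            q.popleft()
        if len(q) > max_queries:
            flagged.add(client)
    return flagged
```

Volume alone misses slow, distributed extraction, so the flagged set is best intersected with the other criteria: systematic feature sweeps, boundary-focused queries, and clusters of low-confidence predictions.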

Behavioral

5 questions
What a great answer covers:

Look for use of analogy, focusing on business impact (risk, revenue, reputation), avoiding jargon, and confirming understanding through questions.

What a great answer covers:

A good answer demonstrates persistence, creative thinking, and a methodical approach (e.g., looking at the problem from a different angle, questioning assumptions, combining disparate information).

What a great answer covers:

Look for: reading arXiv papers, following specific researchers on Twitter/X, participating in communities (like MLSec, OWASP), contributing to open-source security tools, attending conferences (DEF CON AI Village), and hands-on experimentation.

What a great answer covers:

A strong answer balances risk and urgency: immediately escalate with clear severity, propose mitigations if a full fix isn't possible (e.g., rate limiting, input filtering), document the decision process, and prepare rollback plans.

What a great answer covers:

Key themes: strict adherence to scope and rules of engagement, minimizing disruption, data privacy, thorough documentation, and ensuring findings lead to improved security, not just a report.