Interview Prep
AI Security Code Review Specialist Interview Questions
50 expert questions covering beginner fundamentals to advanced AI workflow scenarios. Each answer includes a hint for structured responses.
Beginner
5 questionsA strong answer distinguishes direct vs. indirect prompt injection, explains how untrusted input can override system prompts, and gives a concrete example of data exfiltration or unauthorized action.
Answer should list key categories like LLM01 Prompt Injection, LLM05 Insecure Output Handling, and LLM06 Sensitive Information Disclosure, with justification based on exploitability and blast radius.
Should define static vs. dynamic analysis, then explain how SAST catches insecure model loading in source code while DAST tests the live model endpoint for injection and information leakage.
Answer covers arbitrary code execution via pickle's __reduce__ method, the recommendation to use safetensors format, and how supply-chain attacks exploit this in model distribution.
Should explain that ATLAS focuses on adversarial tactics specific to ML systems - model theft, data poisoning, evasion attacks - whereas ATT&CK targets traditional IT infrastructure and endpoints.
Intermediate
10 questionsA thorough answer covers: embedding data leakage through over-permissive Pinecone API keys, prompt injection via retrieved documents, missing output sanitization, insecure tool-calling chains, and lack of namespace isolation between tenants.
Answer should show pattern matching on pickle.load or pickle.loads, flag the risk, recommend safetensors or torch.load with weights_only=True, and explain how to integrate the rule into pre-commit hooks.
Strong answer describes injecting adversarial instructions into external data sources that the LLM later processes, and outlines a testing methodology including canary tokens and instruction-following verification.
Should cover authentication and authorization, rate limiting, input validation and token limits, output content filtering, logging and anomaly detection, and TLS encryption.
Answer covers checking model card metadata, verifying author reputation, scanning for malicious code in custom model scripts, validating weights with safetensors, and running dependency audits on the model's requirements.
Should explain that LLM outputs are treated as trusted data downstream - leading to XSS, SSRF, or SQL injection when the output is rendered in a browser, used in an API call, or passed to a database query without sanitization.
Answer maps Spoofing (fake tool responses), Tampering (prompt injection altering behavior), Repudiation (lack of audit logs for AI actions), Information Disclosure (model leaking training data), Denial of Service (token exhaustion), and Elevation of Privilege (tool-calling chain abuse).
Strong answer covers: access controls on training data and model registry, secret management for API keys, scanning training data for adversarial samples, reproducible builds, gated deployment with canary evaluation, and rollback mechanisms.
Should cover cross-tenant data leakage through similarity search, namespace isolation, access control at the API key level, and the risk of membership inference attacks through embedding queries.
Answer defines extraction (replicating model behavior via API queries) vs. inversion (reconstructing training data), and discusses detection through query pattern analysis, rate limiting, and watermarking.
Advanced
10 questionsA complete answer traces the prompt injection to arbitrary command execution path, assesses blast radius (RCE on the server), proposes sandboxing with Docker/Firecracker, principle of least privilege for tool functions, and input validation on tool parameters.
Should outline automated prompt injection testing, jailbreak detection, data leakage probes, tool-calling abuse scenarios, multi-turn manipulation, and integration with CI/CD for continuous adversarial testing - referencing tools like Garak or PyRIT.
Strong answer covers PII handling and anonymization before LLM processing, strict output schema validation, human-in-the-loop for high-stakes decisions, audit logging, prompt hardening against bias manipulation, and regulatory compliance (ECOA, FCRA).
Answer covers parameter injection in tool calls, privilege escalation through chained tool use, lack of user confirmation for destructive actions, and recommends reviewing tool permission boundaries, parameter validation, and implementing a tool execution sandbox.
Should identify inter-agent prompt injection, cascade failures through trust propagation, unauthorized delegation, lack of agent identity verification, and recommend structured communication protocols with validation at each boundary.
Answer covers training data poisoning through adversarial web content, data provenance verification, content filtering and deduplication, toxicity and bias detection, and differential privacy or data sanitization before training.
Strong answer discusses adapter isolation, namespace-based access control for adapters, verifying adapter integrity through checksums, and risks of adapter injection attacks in shared inference environments.
Should propose an AI-specific risk scoring methodology that factors in model autonomy level, data sensitivity, blast radius of tool-calling abuse, and likelihood of adversarial exploitation - potentially extending CVSS with AI-specific temporal and environmental metrics.
Answer covers unauthorized use of AI tools and APIs by employees, data leakage through free-tier LLM APIs, need for AI usage inventory, approved tool catalogs, DLP integration, and periodic security reviews of AI tool usage patterns.
Should outline dimensions like governance, threat modeling, secure development lifecycle, testing automation, incident response, and compliance - with maturity levels from ad-hoc to optimized, specific to AI systems.
Scenario-Based
10 questionsReview should flag: lack of path validation and sandboxing, risk of prompt injection leading to arbitrary file read (e.g., /etc/passwd, .env secrets), need for allowlisted directories, and requirement for audit logging of all file access.
Should rate this as high severity - system prompt leakage through prompt injection can expose internal architecture, recommend externalizing sensitive configuration, and propose output filtering to prevent system prompt disclosure.
Answer covers: assess regulatory exposure (GDPR, CCPA), evaluate model memorization risks, recommend retraining with PII anonymized or differential privacy, implement output filtering, and establish data governance for future training pipelines.
This is LLM-enabled SQL injection - the model generates the query and the application executes it without sanitization. Remediation includes using parameterized queries, output validation with a SQL parser, and separating the query generation from execution with strict allowlisting.
Classify as LLM06 Sensitive Information Disclosure / prompt extraction. Prioritize based on what the system prompt contains - if it includes API keys, internal endpoints, or business logic, it's critical. Recommend prompt hardening, output filtering, and treat system prompt as a secret.
Should identify: cross-contamination of data sources in retrieval, patient data leakage through the LLM response, HIPAA compliance requirements, need for namespace isolation in the vector store, and access control on which documents can be retrieved based on user role.
Covers supply-chain risk - model weights could be tampered with if HuggingFace is compromised or if a typosquatting attack targets the model identifier. Fix includes pinning model revisions, verifying checksums, using safetensors format, and restricting model sources to a private registry.
Should address: abuse potential without rate limiting (DoS, model extraction), liability from unfiltered outputs (harmful content generation), recommend tiered rate limiting per API key, output content moderation, usage monitoring, and terms of service with acceptable use policies.
Rate as critical - arbitrary code execution through LLM-generated code is a direct RCE vector. Even with prompt hardening, LLM output is untrusted. Recommend sandboxed execution (Docker, Firecracker, WebAssembly), strict resource limits, network isolation, and output validation.
Classify as sensitive credential exposure (LLM06 + traditional secret leakage). Risk includes token theft from log aggregation systems. Remediate with log scrubbing/redaction, structured logging with secret masking, and rotate exposed tokens immediately.
AI Workflow & Tools
10 questionsAnswer should describe installing Garak, configuring the target as an API endpoint, selecting injection probes (dan, promptinject), running automated scan with reporting, and interpreting results including false positive management.
Should describe: Bandit for Python SAST, custom Semgrep rules for prompt template patterns, GitLeaks for secret scanning, dependency audit for LangChain and related packages, and optional dynamic testing against a staging endpoint using Garak.
Should explain using LangSmith's tracing to visualize the full execution path - prompt input, model reasoning, tool selection, tool parameters, and output - then reviewing for unexpected tool invocations, parameter injection, and privilege escalation patterns.
Answer covers configuring PyRIT targets, using multi-turn attack orchestrators, applying red teaming strategies like crescendo or many-shot jailbreaking, analyzing scorer outputs, and integrating results into a vulnerability report.
Should describe the CodeQL query language basics, how to model Python data flow from model loading functions to execution contexts, creating taint tracking queries, and integrating custom queries into GitHub Advanced Security.
Answer describes writing a taint-mode Semgrep rule that tracks data flow from OpenAI/LLM API responses to HTTP response handlers, flagging paths that skip the content filter function, with auto-fix suggestions.
Should cover: checking model card and author, scanning the model's Python scripts for malicious code, verifying file hashes, converting to safetensors if needed, running a basic inference test for unexpected behaviors, and checking the model's license and data provenance.
Answer covers configuring Trivy for filesystem and container image scanning, scanning for CVEs in Python dependencies and system packages, detecting misconfigurations in Dockerfile (running as root, exposed ports), and integrating into CI pipeline.
Should describe: configuring custom detector patterns for OpenAI, Anthropic, HuggingFace, and cloud provider API keys, setting up pre-commit hooks, running historical scans on git history, and integrating findings into a SIEM or ticketing system.
Answer should describe crafting a suite of system prompt extraction attempts, automating them against the application, analyzing responses for indicators of system prompt content (e.g., matching known system prompt fragments), and setting up continuous monitoring.
Behavioral
5 questionsStrong answer demonstrates systematic thinking, technical depth in the discovery process, clear risk communication to non-technical stakeholders, and a collaborative remediation approach.
Answer should show pragmatic risk-based prioritization, ability to distinguish critical from low-severity findings, and experience negotiating security requirements with engineering teams without being a bottleneck.
Should demonstrate self-directed learning ability, resourcefulness in using documentation and community resources, and ability to rapidly build enough expertise to identify non-obvious security issues.
Strong answer includes specific sources (AI Village, Simon Willison's blog, OWASP AI Project, arXiv security papers), participation in communities, hands-on experimentation with new attack techniques, and structured knowledge management.
Answer should show empathy for developer perspective, ability to articulate risk in business terms, use of evidence (exploit demos, industry precedents), and willingness to find pragmatic solutions that address the core risk.