Interview Prep
AI Attack Surface Analyst Interview Questions
50 expert questions covering beginner fundamentals to advanced AI workflow scenarios. Each answer includes a hint for structured responses.
Beginner
5 questionsA great answer covers model endpoints, prompt interfaces, training data pipelines, embedding stores, and plugin integrations as components that don't exist in traditional AppSec.
Should include prompt injection, insecure output handling, training data poisoning, model denial of service, supply-chain vulnerabilities, sensitive information disclosure, and insecure plugin design.
Should distinguish direct vs indirect injection, explain how natural-language instructions can override system prompts, and note the difficulty of deterministic prevention.
Should describe ATLAS as a knowledge base of adversary tactics and techniques targeting AI systems, analogous to ATT&CK for traditional cybersecurity, and explain its use in threat modeling.
Should describe a systematic inventory process: cataloging model endpoints, data sources, vector stores, agent workflows, third-party integrations, and access control surfaces.
Intermediate
10 questionsShould cover vector injection, retrieval poisoning, system prompt leakage through context, cross-tenant data exposure, and output handling risks in generated responses.
Should describe query-based extraction techniques, the role of API access patterns, membership inference as a precursor, and rate limiting / watermarking as defenses.
Should explain how malicious instructions can be embedded in external data sources (web pages, documents, emails) that an AI agent ingests, leading to unintended tool execution.
Should cover model provenance verification, license analysis, automated scanning for backdoors, testing against known adversarial benchmarks, and comparing outputs against a trusted baseline.
Should address tool-call manipulation, privilege escalation through prompt chaining, unauthorized API calls, data exfiltration via tool outputs, and the challenge of enforcing least-privilege in agentic systems.
Should cover targeted vs untargeted poisoning, data provenance tracking, anomaly detection in training distributions, influence functions, and differential testing across model versions.
Should describe integration of Garak or PyRIT into CI/CD, regression testing for known jailbreaks, automated prompt fuzzing, model output monitoring, and alerting thresholds.
Should cover adversarial embedding injection, cross-retrieval poisoning, dimension analysis for detecting anomalous vectors, and namespace isolation as a mitigation.
Should mention IAM policies, VPC endpoint restrictions, API key rotation, request/response logging, PII detection in prompts, rate limiting, and output content filtering.
Should cover membership inference attacks, canary token insertion, differential privacy evaluation, and output deduplication analysis against known training data.
Advanced
10 questionsShould cover scope definition, rule-of-engagement for high-risk actions, multi-step exploit chains targeting tool abuse, data exfiltration through report generation, and escalation procedures.
Should discuss trigger pattern detection, activation analysis, differential testing across input classes, model diff techniques, and contractual/technical controls like split learning or on-premise fine-tuning.
Should cover adversarial images bypassing content filters, audio prompt injection, cross-modal confusion attacks, steganographic data exfiltration, and model fusion vulnerabilities.
Should reference FAIR methodology adapted for AI, quantifying likelihood and impact of AI-specific incidents, comparing to breach cost data, and presenting a risk-adjusted remediation roadmap.
Should cover knowledge extraction during distillation, precision loss creating new adversarial vulnerabilities, merged model behavioral unpredictability, and provenance verification challenges.
Should address feedback poisoning, reward hacking, distribution shift creating new vulnerabilities, the tension between learning and safety guardrails, and data pipeline integrity.
Should describe original research practices, monitoring arXiv preprints, participating in AI Village / bug bounty programs, building custom fuzzing tools, and responsible disclosure workflows.
Should cover Byzantine attacks on gradient updates, model update poisoning, inference attacks during aggregation, and the tension between privacy guarantees and security verification.
Should describe a structured dependency audit, transitive dependency analysis, license compliance checking, behavioral baseline testing, provenance graph construction, and ongoing monitoring strategy.
Should cover input sanitization, system prompt hardening, PII detection and redaction layers, tool access controls with audit logging, output filtering, human-in-the-loop escalation, and continuous monitoring.
Scenario-Based
10 questionsShould cover extraction testing to confirm data leakage, analysis of the competitor's model for signs of model theft, legal coordination, technical evidence collection, and defensive hardening of your own models.
Should cover immediate containment (disable agent, revoke tool access), evidence preservation, severity assessment, stakeholder notification, root cause analysis initiation, and emergency patching.
Should outline a phased approach: asset inventory (day 1-2), threat model (day 2-3), adversarial testing (day 3-5), remediation guidance (day 6-7), and executive briefing with go/no-go recommendation.
Should describe inventorying all systems using the model, testing for trigger-activated behaviors, analyzing which production decisions were influenced by the model, rolling back to a verified model version, and implementing model provenance checks.
Should cover understanding the bypass technique, implementing linguistic diversity in test suites, adding language-agnostic safety layers, establishing manual review for edge cases, and updating the adversarial test corpus.
Should cover regulatory requirements mapping, AI asset classification, testing framework selection, automated pipeline design, reporting template creation, staff training plan, and integration with existing GRC processes.
Should address severity classification (indirect prompt injection leading to data exfiltration), immediate tool access restrictions, implementing input sanitization for email content, adding PII detection on agent outputs, and designing a defense-in-depth architecture.
Should cover vulnerability impact assessment on your specific deployment, temporary mitigation through input validation and output monitoring, forking and patching the dependency, vendor engagement and timeline expectations, and evaluating alternative integrations.
Should discuss access segmentation between internal and external users, output filtering for sensitive patterns, context-window restrictions for external-facing instances, and the broader challenge of knowledge boundaries in fine-tuned models.
Should cover adversarial image detection using defensive distillation or input preprocessing, output monitoring for system prompt fragments, image sanitization pipelines, and testing with known adversarial perturbation techniques.
AI Workflow & Tools
10 questionsShould describe Garak's probe-detector-generator architecture, selecting probes for prompt injection and encoding attacks, configuring detectors for successful exploitation, running against your target API, and interpreting the HTML/JSON report output.
Should describe PyRIT's orchestrator pattern, using multi-turn orchestrators for gradual escalation, target integration with Azure OpenAI or custom endpoints, scorer configuration for detecting successful jailbreaks, and result analysis.
Should cover instrumenting the agent with LangSmith tracing, analyzing tool-call sequences for privilege escalation patterns, reviewing intermediate reasoning for prompt leakage, and identifying data flow between tools.
Should describe artifact versioning, hash-based integrity verification, automated comparison of model behavior across versions, and integration with CI/CD gates for deployment approval.
Should cover Bedrock Guardrails configuration for content filtering and topic denial, CloudWatch logging of all prompt/response pairs, CloudTrail for API access auditing, and integration with SIEM for alerting.
Should describe CI trigger configuration, running Garak or custom test suites against staging endpoints, gating merges on security test results, artifact storage of test reports, and notification to the security team.
Should cover checking model card documentation, reviewing download stats and community reports, running evaluation on adversarial benchmarks, testing with Garak against known attack patterns, and verifying license compatibility.
Should describe defining eval data sets for domain-specific risks, configuring evaluation criteria for safety and correctness, running evaluations against different model versions, and tracking results over time for regression detection.
Should describe testing namespace isolation, querying with crafted embeddings to retrieve unauthorized documents, testing metadata filtering bypasses, and verifying access controls on collection creation and deletion.
Should describe configuring topical rails and jailbreak detection, then systematically testing bypass techniques including encoding tricks, role-play scenarios, multi-language attacks, and context manipulation.
Behavioral
5 questionsShould demonstrate systematic thinking, intellectual curiosity, and the ability to translate technical findings into risk language that non-technical stakeholders understand and act on.
Should describe structured information consumption habits, prioritization frameworks based on organizational relevance, and balancing academic research with practical threat intelligence.
Should demonstrate the ability to build a compelling risk case, propose pragmatic mitigations, collaborate rather than obstruct, and escalate appropriately when safety is at stake.
Should describe relationship-building through pair testing, providing actionable guidance rather than just findings, celebrating shared wins, and embedding security into development workflows rather than gatekeeping.
Should demonstrate a structured learning methodology, comfort with ambiguity, resourcefulness in finding the right information quickly, and the ability to apply new knowledge to produce actionable security findings.