Skip to main content

Skill Guide

Threat modeling for AI-specific attack vectors (prompt injection, data exfiltration via RAG, model extraction)

Threat modeling for AI-specific attack vectors is the systematic process of identifying, assessing, and prioritizing security risks unique to AI/ML systems-specifically prompt injection, data exfiltration via Retrieval-Augmented Generation (RAG), and model extraction attacks-to design proactive defenses.

This skill is critical because AI systems, especially those integrated with enterprise data via RAG, present novel attack surfaces that traditional security models miss. Mastering it prevents catastrophic data breaches, protects proprietary models, and ensures regulatory compliance, directly safeguarding revenue and trust.
1 Careers
1 Categories
9.1 Avg Demand
15% Avg AI Risk

How to Learn Threat modeling for AI-specific attack vectors (prompt injection, data exfiltration via RAG, model extraction)

1. Core Concepts: Understand the OWASP Top 10 for LLMs, the CIA triad in an AI context (Confidentiality, Integrity, Availability of models/data). 2. Attack Anatomy: Study the mechanics of direct/indirect prompt injection, how RAG pipelines can be weaponized for data theft, and model extraction via API querying. 3. Foundational Frameworks: Learn basic STRIDE adapted for AI systems (Spoofing of model outputs, Tampering with prompts).
Move to practice by mapping attack trees for a specific RAG application. Analyze public post-mortems of real-world AI security incidents. Common mistake: Focusing only on the model and ignoring the data pipeline and orchestration layer vulnerabilities. Practice threat modeling using Microsoft's AI-specific STRIDE variation.
At the architect level, integrate threat modeling into the ML SDLC (Software Development Life Cycle). Develop organization-wide risk taxonomies and security requirements for AI vendors. Master advanced techniques like probabilistic risk scoring for model extraction and designing resilient architectures (e.g., canary queries, differential privacy for RAG). Mentor teams on trade-offs between utility and security.

Practice Projects

Beginner
Project

Threat Model a Simple Chatbot with a Vector Store

Scenario

You are given a customer service chatbot that uses a RAG pipeline to access a vector database of product manuals. Your task is to identify all potential attack vectors.

How to Execute
1. Diagram the system: User -> LLM Orchestrator -> Vector DB & Product Manuals. 2. Apply STRIDE per element: Could an attacker spoof a user to get the LLM to reveal sensitive manual data? (Data Exfiltration). 3. Brainstorm 5 specific prompt injection examples to trigger this. 4. Document mitigations (e.g., input filtering, permissioned data chunks).
Intermediate
Case Study/Exercise

Analyze the 'DAN' (Do Anything Now) Jailbreak and its Organizational Impact

Scenario

A well-known jailbreak technique like 'DAN' is being used on your company's public-facing LLM to bypass safety filters and generate harmful content. Model your response.

How to Execute
1. Deconstruct the attack: It's a multi-prompt injection forcing persona change. 2. Map the business impact: Reputation damage, content policy violations, potential for harassment. 3. Design layered defenses: (a) Input sanitization for known jailbreak patterns, (b) Output classifiers for harmful content, (c) System prompt hardening with reinforcement. 4. Draft an incident response playbook for this specific vector.
Advanced
Project

Design a Secure Multi-Tenant RAG Architecture with Model Extraction Countermeasures

Scenario

You are the security architect for a SaaS platform where different clients upload proprietary documents to create their own RAG agents. You must prevent cross-tenant data leaks via RAG and protect the base model from extraction.

How to Execute
1. Architect tenant isolation: Separate vector namespaces, encryption keys, and access control at the query level. 2. Implement model extraction defenses: Rate-limit API queries, add noise to model outputs, deploy watermarking. 3. Design monitoring for anomalous query patterns (e.g., systematic probing). 4. Conduct a red team exercise simulating a malicious tenant trying to exfiltrate another tenant's data through crafted RAG queries.

Tools & Frameworks

Mental Models & Methodologies

OWASP Top 10 for LLMs (2023)Microsoft AI STRIDEAttack TreesMITRE ATLAS

OWASP Top 10 provides the critical risk checklist. AI-adapted STRIDE gives a structured threat classification. Attack Trees visualize threat paths. MITRE ATLAS offers a knowledge base of adversarial TTPs against AI systems.

Software & Platforms

Garak (LLM vulnerability scanner)Vigil (prompt injection detection)RebuffLangKit (monitoring)

Garak automates testing for jailbreaks and injections. Vigil and Rebuff are frameworks for detecting and mitigating malicious prompts. LangKit provides monitoring for RAG pipelines to detect anomalous data access.

Practical Artifacts

Threat Modeling Diagrams (DFD/Flowcharts)Security Requirements LogsRisk Register

These are the tangible outputs of the threat modeling process: visual diagrams of data flows, specific security requirements (e.g., 'All RAG outputs must be permissioned against the user's data scope'), and a prioritized list of identified risks.

Interview Questions

Answer Strategy

Use a structured framework. Start by scoping the system (data, models, users, interactions). The first and most critical vector is indirect prompt injection via the ingested documents, because it can bypass all input sanitization at query time. Answer: 'I'd begin by mapping the system with a Data Flow Diagram. The first attack vector I'd prioritize is data poisoning via indirect prompt injection during document ingestion. An attacker could embed malicious instructions in a document that, when retrieved and fed to the LLM, hijacks the session to exfiltrate other indexed documents. This is critical because it turns the RAG system itself into an attack vector, compromising the entire corpus.'

Answer Strategy

This is a behavioral question testing proactive threat hunting and technical depth. The STAR (Situation, Task, Action, Result) method is ideal. Answer: 'Situation: Our team was deploying an LLM for code completion. Task: I was responsible for the security review. Action: I discovered that the model's frequent generation of common library imports could be exploited for model extraction; an attacker could query the model with specific code snippets to systematically map its training data distribution, leaking proprietary code patterns. I implemented query pattern monitoring and output differential privacy. Result: We proactively closed this extraction vector before launch, adding a key security requirement to our ML ops checklist.'

Careers That Require Threat modeling for AI-specific attack vectors (prompt injection, data exfiltration via RAG, model extraction)

1 career found