Skill Guide

LLM-specific threat modeling including prompt injection and agent tool abuse

The systematic process of identifying, analyzing, and mitigating security vulnerabilities unique to Large Language Model applications, with a primary focus on adversarial manipulation of model inputs (prompt injection) and unauthorized abuse of the model's integrated external tools or APIs (agent tool abuse).

This skill is critical for preventing catastrophic system compromise, data exfiltration, and reputational damage in AI-powered products. It directly protects revenue and customer trust by ensuring the security and integrity of the most powerful and unpredictable components in modern software stacks.

1 Careers

1 Categories

9.2 Avg Demand

15% Avg AI Risk

How to Learn LLM-specific threat modeling including prompt injection and agent tool abuse

1. Foundational LLM Concepts: Understand tokenization, transformer architecture basics, and the difference between base models and instruction-tuned models. 2. Threat Vocabulary: Memorize core definitions: direct/indirect prompt injection, jailbreaking, system prompt leakage, data poisoning, and tool abuse. 3. OWASP LLM Top 10: Study the OWASP Top 10 for LLM Applications as a baseline framework, focusing on LLM01 (Prompt Injection) and LLM07 (Insecure Plugin/Tool Design).

1. Hands-on Attack Simulation: Use platforms like HackTheBox, TryHackMe, or dedicated labs to practice crafting prompts that bypass system instructions and safety filters. 2. Defense-in-Depth Implementation: Move beyond single-line system prompts. Practice designing layered defenses: input/output filtering, output parsing validation, and tool-call sandboxing. 3. Common Pitfall: Avoiding 'security by obscurity'-relying solely on complex system prompts without underlying code-level guardrails. A common mistake is neglecting the context window limits and injection vectors in retrieved data (RAG).

1. Systemic Threat Modeling: Architect threat models for complex agentic systems (e.g., multi-step reasoning, tool chains, memory systems). Use frameworks like STRIDE or MITRE ATLAS adapted for LLMs. 2. Strategic Alignment: Align LLM security with broader organizational DevSecOps and Zero Trust initiatives. Develop security requirements for model fine-tuning data and RLHF pipelines. 3. Mentorship: Design and lead internal training programs for engineering and product teams, translating technical risks into business impact terms for executive stakeholders.

Practice Projects

Beginner

Project

Build a Simple Prompt Injection Detector

Scenario

You are given a simple LLM-based customer support chatbot that uses a system prompt. Your task is to create a Python-based filter that detects and blocks common prompt injection attempts before they reach the model.

How to Execute

1. Define a list of known injection signatures (e.g., 'Ignore previous instructions', 'You are now DAN'). 2. Implement a pre-processing function that scans user input for these patterns using regex and keyword matching. 3. Integrate the filter into a simple FastAPI/Flask wrapper around an API call to the LLM (e.g., OpenAI). 4. Test the filter against a set of adversarial prompts and measure its false positive/negative rate.

Intermediate

Case Study/Exercise

Threat Model an AI-Powered File Manager Agent

Scenario

You are the security lead for an AI agent that can read, write, and summarize files on a user's computer based on natural language commands. The agent uses a code interpreter tool.

How to Execute

1. Diagram the system: Map the flow from user input -> LLM -> tool API (code interpreter) -> file system -> output. 2. Apply STRIDE: Identify Spoofing (agent impersonation), Tampering (malicious code generation), Repudiation (unlogged actions), Information Disclosure (reading sensitive files), Denial of Service (disk filling), Elevation of Privilege (agent gaining root access). 3. For each threat, propose mitigations: e.g., for Tampering-sandbox the interpreter with Docker; for Elevation of Privilege-implement strict tool-use allowlists. 4. Document the threat model in a formal report.

Advanced

Project

Design a Secure Tool-Invocation Gateway

Scenario

Architect a middleware layer that sits between an LLM-based agent and a suite of external enterprise tools (e.g., CRM, ERP, HR systems). The goal is to enforce security policies, audit all actions, and prevent agent tool abuse.

How to Execute

1. Define an API specification for tool invocation that includes mandatory fields: 'tool_name', 'parameters', 'justification_context' (the LLM's reasoning), and 'requested_by_agent'. 2. Implement a policy engine (e.g., using Open Policy Agent) that evaluates each request against rules: rate limits, user role-based access control, and parameter sanitization (e.g., SQL injection filters for database queries). 3. Build an immutable audit log that records the full context of every tool call, including the LLM's preceding conversation. 4. Develop a 'kill switch' and anomaly detection module that can pause agent activity based on suspicious patterns (e.g., rapid sequence of sensitive data reads).

Tools & Frameworks

Attack Simulation & Testing

Garak (by NAI Lab)Microsoft's PyRITNeMo Guardrails (by NVIDIA)

Garak and PyRIT are fuzzing frameworks for systematically probing LLMs for vulnerabilities. NeMo Guardrails provides a toolkit for building programmable constraints and rules for LLM inputs/outputs, useful for defining and testing security policies.

Monitoring & Defense

LangKit (by WhyLabs)RebuffCustom Output Validators with Pydantic/Guardrails.ai

LangKit monitors LLM prompts/completions for drift, toxicity, and injection patterns. Rebuff is a dedicated prompt injection detection framework. Pydantic or Guardrails.ai can be used to enforce strict output schema validation from the LLM before it's passed to a tool, preventing malformed or malicious code execution.

Threat Modeling Methodologies

OWASP LLM Top 10MITRE ATLAS (Adversarial Threat Landscape for AI Systems)STRIDE for LLMs

OWASP provides the baseline vulnerability classification. MITRE ATLAS offers a knowledge base of adversary tactics specific to AI. STRIDE, when adapted, helps in systematically categorizing threats to LLM components like prompts, tools, and memory stores.

Interview Questions

Answer Strategy

The candidate should structure the answer using a recognized methodology (STRIDE/OWASP). A strong answer will cover: 1) Identifying assets (documents, prompts, model weights, API keys). 2) Trust boundaries (user input, LLM, vector DB, tool API). 3) Specific threats: Indirect injection via poisoned documents in the vector store, direct prompt injection to leak documents, tool abuse to over-extract data, and data poisoning during the indexing process. 4) Mitigations: Input sanitization for queries, output validation for summaries, strict access control on the vector store, and rate limiting on the summarization tool.

Answer Strategy

This tests business risk translation. Sample Answer: 'Consider a sales automation agent integrated with a CRM. If compromised via prompt injection, it could abuse its 'create_contact' and 'send_email' tools to spam thousands of prospects, damaging brand reputation and triggering spam filters, or it could exfiltrate the entire contact list. Key controls are: 1) Technical: Implement a human-in-the-loop confirmation step for all write operations, use a parameterized query tool instead of raw code generation, and enforce the principle of least privilege for API tokens. 2) Process: Maintain a full audit trail of all agent actions tied to a user session, and have an incident response playbook specifically for agent misuse.'