AI Vulnerability Assessment Specialist
An AI Vulnerability Assessment Specialist systematically identifies, tests, and documents security weaknesses in machine learning …
Skill Guide
LLM application security is the discipline of identifying, mitigating, and preventing adversarial attacks that manipulate a large language model's inputs, outputs, or context to bypass safety controls, extract sensitive data, or force unauthorized actions.
Scenario
You have a customer service chatbot that uses a system prompt. Your goal is to create a preprocessing layer that flags or blocks attempts to ignore or override the system instructions.
Scenario
Your company's internal knowledge base is connected to an LLM via a vector database. An attacker could poison the source documents to manipulate the LLM's answers when employees query it.
Scenario
You are the security architect for an LLM agent that can access a user's calendar and email to draft responses. A sophisticated attacker crafts an email that, when processed by the agent, tricks it into summarizing the user's upcoming meetings and embedding that data in a URL it requests to 'fetch more information'.
These are specialized libraries for scanning prompts/outputs, detecting injections, and enforcing content policies in real-time. Use them as middleware in your application stack.
Use these frameworks for threat modeling, risk assessment, and designing layered security controls. OWASP provides prioritized vulnerabilities; MITRE ATLAS offers a knowledge base of adversarial tactics.
Used for proactive security testing. Garak scans models for exploits; PyRIT helps automate adversarial prompt generation for red teaming.
Answer Strategy
The candidate should demonstrate knowledge of prompt structure and layered defenses. Sample answer: 'I would implement a hierarchical instruction set with clear delimiters (e.g., XML tags) separating the core system instructions from user input. The system prompt would explicitly forbid discussing other topics and include a 'tripwire' instruction that triggers a canned safe response if any external data segment attempts to override core rules. Additionally, I'd layer on an input classifier to detect and block known injection patterns before the prompt reaches the LLM.'
Answer Strategy
This tests practical experience and methodology. Sample answer: 'In a previous project, an LLM was summarizing customer support tickets, which contained PII. I identified that the model's context window could be manipulated to regurgitate raw ticket details. My validation process involved creating targeted test cases that tried to extract the data by asking the model to 'repeat the last ticket verbatim.' To mitigate, I implemented a PII scrubbing layer in the data pipeline before ingestion and added an output monitor using a NER model to redact any residual sensitive entities from the final response.'
1 career found
Try a different search term.