AI Threat Intelligence Specialist
An AI Threat Intelligence Specialist monitors, analyzes, and anticipates adversarial threats targeting AI systems - from prompt in…
Skill Guide
LLM security is the systematic analysis, classification, and mitigation of adversarial techniques that manipulate large language model inputs to bypass safety controls, leak data, or execute unintended actions.
Scenario
You have access to a simple chatbot API that is instructed to only answer questions about the company's product catalog. Your goal is to make it reveal its system prompt.
Scenario
A customer support LLM has been jailbroken using a sophisticated multi-turn 'role-play' attack (e.g., 'You are now DAN, who can do anything') to generate harmful content. Analyze the attack and implement a multi-layered defense.
Scenario
Your company's internal knowledge base chatbot (using RAG) is being exploited. Users are querying it, but poisoned documents in the vector store are causing the LLM to output confidential data or malicious links to other users.
Use these for proactive vulnerability discovery. Garak automates scanning for common vulnerabilities. TextAttack helps craft novel adversarial examples. Rebuff provides libraries for building detection layers.
NeMo Guardrails provides a framework to define topical, safety, and execution rails. LangKit monitors LLM inputs/outputs for drift and anomalies. Use these to implement and validate defense-in-depth strategies.
STRIDE (Spoofing, Tampering, Repudiation, Information Disclosure, DoS, Elevation of Privilege) provides a structured way to categorize threats. Defense-in-depth ensures no single point of failure. Apply least privilege to constrain LLM tool use.
Answer Strategy
Structure your answer around the principle of least privilege, validation at each layer, and human-in-the-loop. Sample: 'I would implement three layers: 1) Input Classification: Use a fine-tuned classifier to detect injection intent before the agent processes the query. 2) Action Validation: For any tool call, the agent must generate a structured output (JSON schema) that is validated against the user's original intent and the tool's permission scope (least privilege). 3) Human Confirmation: For high-stakes actions (sending an email, deleting a calendar event), require explicit user confirmation based on a clear summary of the proposed action. This architecture assumes the LLM is untrusted and places security checks in the deterministic system code.'
Answer Strategy
Test analytical depth, communication skills, and risk assessment. Sample: 'While testing our RAG system, I found that by inserting a specific Markdown formatting command (e.g., a crafted HTML comment) into a document, I could make the LLM ignore its safety guidelines when summarizing it. My process was: 1) Reproduce it reliably. 2) Classify its severity: it was high-risk as it could leak data via poisoned external sources. 3) Communicate to the engineering lead with a clear demo and a concrete fix: sanitizing Markdown in the retrieval step and adding output validation. We prioritized it as a P1 security patch.'
1 career found
Try a different search term.