AI Sandbox Engineer
An AI Sandbox Engineer designs, builds, and maintains isolated, secure environments where AI models, agents, and workflows can be …
Skill Guide
A systematic methodology for identifying, testing, and mitigating vulnerabilities in Large Language Model (LLM) systems where malicious inputs can manipulate the model's behavior, bypass safety controls, or extract sensitive information.
Scenario
You are given a simple LLM chatbot API endpoint. Your task is to create a Python wrapper that intercepts user inputs and flags potential injection attempts before they reach the LLM.
Scenario
Your company has deployed a Retrieval-Augmented Generation (RAG) chatbot for internal knowledge bases. You must perform a penetration test to uncover vulnerabilities where a user could trick the model into revealing confidential document excerpts or bypassing access controls.
Scenario
You are the lead security architect for a multi-tenant LLM platform. You need to establish a continuous, automated adversarial testing regime that scales with new model deployments and evolving attack techniques, without disrupting development velocity.
Use these for automated, large-scale adversarial testing. Garak and Promptfoo are purpose-built for LLM probing, while Counterfit and ART provide broader ML adversarial testing suites. Integrate them into your testing pipeline for regression testing.
OWASP provides the definitive vulnerability taxonomy for LLMs. MITRE ATLAS offers a knowledge base of adversary tactics. STRIDE helps systematically identify threats (Spoofing, Tampering, etc.) during the design phase of AI systems.
Blacklisting is a fast first line of defense. Semantic similarity detects paraphrased attacks. Fine-tuned classifiers offer the highest accuracy for novel attacks but require labeled data and model training resources.
Answer Strategy
Use the 'Defense in Depth' framework. Structure your answer around Input, Processing, and Output layers. Sample Answer: 'I'd implement a three-layer defense. First, input-level: semantic similarity checks against a corpus of known attacks and a fine-tuned classifier for zero-day attempts. Second, at the processing level, I'd use compartmentalized prompts and strict system instruction hardening. Finally, at the output level, I'd apply PII filters and a validator to ensure responses don't contain leaked data or bypassed instructions.'
Answer Strategy
Tests for hands-on experience and risk communication skills. Sample Answer: 'During a red team, I discovered an indirect injection where a user could upload a resume with hidden instructions. When our HR bot summarized it, the instructions executed, attempting to scrape other candidate names from the database. I communicated the risk by demonstrating the PoC, quantifying the data leakage potential, and proposing a fix to sanitize uploaded documents before summarization, which we implemented within the sprint.'
1 career found
Try a different search term.