Prompt Engineer
Prompt Engineers design, test, and optimize natural-language instructions that control large language models (LLMs) and multimodal…
Skill Guide
The discipline of systematically identifying, testing, and mitigating security vulnerabilities, ethical risks, and alignment failures in AI systems through adversarial probing, policy enforcement, and bias detection.
Scenario
Given a base large language model with standard safety guidelines (e.g., refusing harmful instructions), attempt to elicit a forbidden response using a known jailbreak technique (e.g., 'Do Anything Now' - DAN, role-playing, hypothetical framing).
Scenario
A deployed NLP model that ranks resumes for a software engineering role shows a statistically significant disparity in interview callback rates between candidates from different university tiers and genders.
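A quick way to check whether a gap in callback rates is "statistically significant" is a two-proportion z-test. The sketch below uses only the standard library; the function name and the counts in the usage lines are hypothetical, and a real audit would also need to account for multiple comparisons and confounders.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one callback rate."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that the groups do not differ.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: callbacks out of applicants for two groups.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the gap is unlikely to be chance; it does not say why the gap exists, which is what the diagnosis step is for.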
Scenario
An AI-powered customer service agent has been given increased autonomy to process refunds, modify accounts, and access internal knowledge bases. You must identify critical failure modes before a high-stakes product launch.
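One failure mode worth probing here is the agent approving a refund it should not. A common mitigation is a policy gate that sits between the agent and the refund action. The sketch below is illustrative only: `RefundRequest`, `decide_refund`, and the 50.0 auto-approval limit are all hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass

AUTO_APPROVE_LIMIT = 50.0  # hypothetical threshold; set by business policy

@dataclass(frozen=True)
class RefundRequest:
    order_total: float
    amount: float
    user_verified: bool

def decide_refund(request):
    """Return 'approve', 'escalate', or 'reject' for an agent-proposed refund."""
    # Reject refunds that are non-positive or exceed what was paid.
    if request.amount <= 0 or request.amount > request.order_total:
        return "reject"
    # Anything from an unverified user or above the limit goes to a human.
    if not request.user_verified or request.amount > AUTO_APPROVE_LIMIT:
        return "escalate"
    return "approve"

print(decide_refund(RefundRequest(order_total=30.0, amount=30.0, user_verified=True)))
```

Red-team test cases can then target the gate directly, for example by checking that a prompt-injected request for a refund above the order total is never approved.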
Use MITRE ATLAS and OWASP LLM Top 10 to structure threat models and test cases. Use frameworks like HarmBench and TextAttack to programmatically generate adversarial prompts and measure attack success rates against models.
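Attack success rate is simply the fraction of adversarial attempts that elicited a forbidden response, usually broken down by attack category. The sketch below shows the arithmetic in plain Python; it does not use the HarmBench or TextAttack APIs, and the category names and outcomes are made-up illustration data.

```python
from collections import defaultdict

def attack_success_rate(results):
    """Return {category: fraction of attempts that succeeded}.

    `results` is an iterable of (category, succeeded) pairs.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for category, succeeded in results:
        attempts[category] += 1
        if succeeded:
            successes[category] += 1
    return {c: successes[c] / attempts[c] for c in attempts}

# Hypothetical red-team outcomes, labeled by a human or automated judge.
sample_results = [
    ("prompt_injection", True),
    ("prompt_injection", False),
    ("prompt_injection", False),
    ("prompt_injection", True),
    ("data_leakage", False),
    ("data_leakage", False),
]
rates = attack_success_rate(sample_results)
print(rates)
```

Tracking these rates per category before and after a mitigation gives a concrete measure of whether the fix helped.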
Deploy Guardrails AI or NeMo Guardrails to programmatically define and enforce input/output validation rules and topic boundaries. Use cloud-native solutions like Azure AI Content Safety for scalable content moderation APIs. Use Patronus AI for automated evaluation and monitoring of model safety and correctness.
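To make "input/output validation rules and topic boundaries" concrete, here is a deliberately simple stand-in written in plain Python. It does not use the Guardrails AI or NeMo Guardrails APIs; the blocked-topic list, function names, and email pattern are assumptions chosen for illustration, and keyword matching like this is far weaker than what those tools provide.

```python
import re

BLOCKED_TOPICS = {"medical advice", "legal advice"}  # hypothetical topic boundary
# Rough email pattern for illustration; not a complete validator.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def check_input(user_message):
    """Return (allowed, reason). Rejects messages that mention a blocked topic."""
    lowered = user_message.lower()
    for topic in sorted(BLOCKED_TOPICS):
        if topic in lowered:
            return False, f"blocked topic: {topic}"
    return True, "ok"

def sanitize_output(model_reply):
    """Redact email addresses from a model reply before it is shown."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", model_reply)

print(check_input("Can you give me legal advice?"))
print(sanitize_output("Write to bob@example.com today"))
```

The design point is the separation: one rule runs before the model sees the input, the other runs on the output before the user sees it.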
Use Fairlearn or AI Fairness 360 to compute fairness metrics and apply mitigation algorithms to data or models. Use Aequitas for auditing bias in decision pipelines and the What-If Tool for visually exploring model behavior across subgroups.
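It helps to know what these libraries compute under the hood. The sketch below calculates demographic parity difference by hand: the gap between the highest and lowest selection rate across groups. The function names and the toy data are illustrative assumptions; in practice you would likely call the library's own metric instead.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Return {group: share of that group's members predicted positive (1)}."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        selected[group] += int(pred == 1)
    return {g: selected[g] / totals[g] for g in totals}

def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Toy data: 1 = shortlisted, 0 = not shortlisted.
preds = [1, 1, 0, 0, 1, 0, 0, 0]
grps = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(preds, grps))
print(demographic_parity_difference(preds, grps))
```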
Answer Strategy
Structure your answer around a phased approach: 1) Scoping & Rules of Engagement, 2) Threat Modeling (use ATLAS/OWASP), 3) Team Composition & Attack Execution (categorize attacks: prompt injection, data leakage, misinformation), 4) Vulnerability Triage & Reporting, 5) Post-mortem with engineering on fixes. Emphasize collaboration, not just exploitation. Sample answer: 'I'd start by aligning with the product and security teams on the scope, defining critical assets like confidential data and high-risk actions. The red team would include adversarial ML specialists and domain experts. We'd threat-model using the OWASP LLM Top 10, then execute targeted tests: indirect injection via uploaded documents to leak internal data, and adversarial prompts to override safety filters. All findings would be triaged by impact, and we'd create a joint remediation plan focused on input validation, user permission scoping, and output monitoring.'
Answer Strategy
This question tests systematic problem-solving and knowledge of the full bias mitigation lifecycle. Use the 'Diagnose-Mitigate-Validate' framework. Sample answer: 'First, I'd diagnose by performing a stratified analysis of model outputs, using fairness metrics like demographic parity across the relevant demographics to isolate the bias. Next, I'd trace the cause, examining training data composition using tools like Aequitas, then model behavior via techniques like probing. For mitigation, I'd choose the intervention stage: for data bias, apply re-sampling or counterfactual augmentation; for model bias, use in-processing techniques like adversarial debiasing. I'd validate the fix by re-running the fairness metrics and confirming that the performance trade-offs are acceptable. Finally, I'd establish a monitoring dashboard to detect regression.'
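As a concrete data-stage intervention, the sketch below uses reweighing, a close relative of re-sampling that assigns each (group, label) combination the weight P(group) * P(label) / P(group, label) so that group and label become statistically independent in the weighted data. It is named here as an illustrative substitute for the re-sampling mentioned in the sample answer; the function names and toy data are assumptions.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Return {(group, label): weight} with weight = P(g) * P(y) / P(g, y)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (1) within one group."""
    pairs = [(g, y) for g, y in zip(groups, labels) if g == group]
    total = sum(weights[pair] for pair in pairs)
    positive = sum(weights[pair] for pair in pairs if pair[1] == 1)
    return positive / total

# Toy training data: group "a" has 3 of 4 positives, group "b" has 1 of 4.
groups = ["a"] * 4 + ["b"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]
weights = reweighing_weights(groups, labels)
print(weighted_positive_rate(groups, labels, weights, "a"))
print(weighted_positive_rate(groups, labels, weights, "b"))
```

After reweighing, both groups have the same weighted positive rate, which is the "validate" check in miniature: re-run the metric and confirm the gap has closed.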