AI Pharmacovigilance Analyst
An AI Pharmacovigilance Analyst uses machine learning, natural language processing, and automation platforms to detect, assess, an…
Skill Guide
The systematic application of quantitative metrics, adversarial testing, and formal verification methods to ensure LLM outputs are factually accurate, free from harmful confabulation, and safe for deployment in high-stakes environments.
Scenario
You need to evaluate a medical LLM's tendency to fabricate drug interactions or dosage information.
Scenario
A model summarizing SEC filings must not invent financial figures or misstate legal risks.
Scenario
An autonomous vehicle's LLM generates natural language explanations of its driving decisions for post-incident analysis. A hallucinated or misleading log could misdirect a safety investigation.
Used to programmatically assess LLM output quality across dimensions like faithfulness, answer relevance, and hallucination. RAGAS is particularly strong for RAG pipeline evaluation.
Used to enforce structural and semantic constraints on LLM outputs in real-time, preventing invalid or unsafe responses from reaching the end-user.
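As a rough illustration of enforcing structural and semantic constraints before a response reaches the user, the sketch below validates a JSON-formatted dosage answer. The field names (`drug`, `dose`, `unit`), the unit list, and the reference table are all hypothetical placeholders chosen for this example; they are not taken from any specific guardrail library or clinical source.

```python
import json

# Hypothetical reference data for illustration only; not clinical guidance.
ALLOWED_UNITS = {"mg", "mcg", "g", "mL"}
MAX_DAILY_DOSE_MG = {"drug_a": 4000.0}  # placeholder drug name and limit


def validate_dose_output(raw: str) -> tuple[bool, list[str]]:
    """Return (is_valid, problems) for a JSON-formatted model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, ["not valid JSON"]
    if not isinstance(data, dict):
        return False, ["top-level value is not an object"]

    # Structural check: all required fields must be present.
    problems = [f"missing field: {f}" for f in ("drug", "dose", "unit") if f not in data]
    if problems:
        return False, problems

    drug, dose, unit = data["drug"], data["dose"], data["unit"]
    if not isinstance(drug, str):
        problems.append("drug must be a string")
    if not isinstance(unit, str) or unit not in ALLOWED_UNITS:
        problems.append(f"unknown unit: {unit!r}")
    if isinstance(dose, bool) or not isinstance(dose, (int, float)) or dose <= 0:
        problems.append("dose must be a positive number")

    # Semantic check: compare against the reference table (mg doses only here).
    if not problems and unit == "mg":
        limit = MAX_DAILY_DOSE_MG.get(drug)
        if limit is None:
            problems.append("drug not in reference table")
        elif dose > limit:
            problems.append("dose exceeds reference limit")

    return not problems, problems
```

A response that fails validation would typically be blocked or regenerated rather than shown; which of those applies is a deployment decision this sketch does not make.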
Fine-tuned natural language inference models used to check for textual entailment (factuality) between a source document and a generated claim.
Platforms for logging, visualizing, and monitoring LLM evaluation metrics (e.g., hallucination rate, faithfulness score) over time in production to detect degradation and drift.
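To make the idea of monitoring a hallucination rate over time concrete, here is a minimal sketch of a rolling-window monitor that flags drift above a baseline. The class name, window size, baseline, and tolerance are assumptions made for this example; a real observability platform would supply its own logging and alerting.

```python
from collections import deque


class HallucinationRateMonitor:
    """Track the share of flagged outputs over the most recent `window` responses."""

    def __init__(self, window: int = 100, baseline: float = 0.02, tolerance: float = 0.03):
        self.flags = deque(maxlen=window)  # oldest entries drop off automatically
        self.baseline = baseline           # expected rate, e.g. from offline evaluation
        self.tolerance = tolerance         # allowed excess before alerting

    def record(self, hallucinated: bool) -> None:
        self.flags.append(bool(hallucinated))

    def rate(self) -> float:
        return sum(self.flags) / len(self.flags) if self.flags else 0.0

    def drifted(self) -> bool:
        # Only alert once the window is full, to avoid noisy early readings.
        window_full = len(self.flags) == self.flags.maxlen
        return window_full and self.rate() > self.baseline + self.tolerance
```

Each production response would be passed through whatever hallucination check is in use, and the boolean result fed to `record`; `drifted()` can then gate an alert.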
Answer Strategy
The candidate must demonstrate a risk-based, multi-layered approach. They should prioritize 'do no harm' failure modes (for example, presenting a life-threatening diagnosis as benign) over minor inaccuracies. A strong answer outlines: 1) input validation (structured data extraction), 2) output validation (checking against a medical ontology such as SNOMED CT), 3) factual grounding (an NLI check against the patient note), and 4) a strict fallback to human review for any low-confidence or high-severity output. They should also mention metrics such as 'false negative rate for critical conditions'.
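The layered checks and the fallback to human review can be sketched as a small routing function, together with the false negative rate metric. This is only one possible arrangement: the ontology lookup and the entailment scorer are passed in as callables because their real implementations (for instance a SNOMED CT lookup or an NLI model) are outside this sketch, and the critical-term list and threshold are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Placeholder list for illustration; a real system would use a curated severity map.
CRITICAL_TERMS = {"sepsis", "myocardial infarction", "stroke"}


@dataclass
class Decision:
    route: str   # "auto" or "human_review"
    reason: str


def route_output(
    claim: str,
    source_note: str,
    in_ontology: Callable[[str], bool],             # assumed ontology check
    entailment_score: Callable[[str, str], float],  # assumed NLI scorer: (premise, hypothesis) -> [0, 1]
    threshold: float = 0.9,
) -> Decision:
    """Send anything high-severity, unrecognized, or weakly grounded to a human."""
    if any(term in claim.lower() for term in CRITICAL_TERMS):
        return Decision("human_review", "high-severity condition mentioned")
    if not in_ontology(claim):
        return Decision("human_review", "term not found in ontology")
    if entailment_score(source_note, claim) < threshold:
        return Decision("human_review", "claim not sufficiently supported by note")
    return Decision("auto", "passed all checks")


def false_negative_rate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """FN / (FN + TP), computed over cases labelled as critical conditions."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    return fn / (fn + tp) if (fn + tp) else 0.0
```

Checking severity first means a critical mention is escalated even when the other checks would pass, which matches the 'do no harm' priority described above.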
Answer Strategy
This is a behavioral question testing for proactive debugging and systemic thinking. The candidate should describe a specific, non-obvious failure (e.g., temporal hallucinations, incorrect but plausible-looking units, or citing the wrong section of a contract). They should explain the detection method (likely a combination of automated spot-checks and user feedback) and the mitigation (a permanent test case added to the CI/CD evaluation suite, a post-processing rule, or augmented fine-tuning data).
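As one hedged example of turning a discovered failure into a permanent test case, the sketch below covers the "incorrect but plausible-looking units" failure: a post-processing rule flags any quantity in the output that does not appear in the source, and a small regression suite pins the behavior. The regular expression, unit list, and case format are assumptions for illustration, and the string-level comparison would treat `1.5 mg` and `1.50 mg` as different.

```python
import re

# Matches simple quantities such as "5 mg" or "0.5 mL"; the unit list is a placeholder.
_QTY = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|mcg|g|mL)\b")


def quantities(text: str) -> set[tuple[str, str]]:
    """Extract (number, unit) pairs as strings."""
    return set(_QTY.findall(text))


def unsupported_quantities(source: str, output: str) -> set[tuple[str, str]]:
    """Quantities stated in the output that never appear in the source."""
    return quantities(output) - quantities(source)


# Each discovered failure becomes a permanent case in the evaluation suite.
REGRESSION_CASES = [
    {"source": "Administer 5 mg once daily.", "output": "Give 5 mg once a day.", "expect_clean": True},
    {"source": "Administer 5 mg once daily.", "output": "Give 5 g once a day.", "expect_clean": False},
]


def run_regression_suite(cases) -> list[int]:
    """Return the indices of cases where the rule's verdict differs from the expectation."""
    failures = []
    for i, case in enumerate(cases):
        clean = not unsupported_quantities(case["source"], case["output"])
        if clean != case["expect_clean"]:
            failures.append(i)
    return failures
```

In a CI job, a non-empty result from `run_regression_suite(REGRESSION_CASES)` would fail the build, so the unit-confusion bug cannot silently return.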