AI Review Content Analyst
An AI Review Content Analyst evaluates, audits, and improves AI-generated text, images, and multimedia content to ensure factual a…
Skill Guide
The systematic process of defining measurable criteria and structured scoring systems to assess the accuracy, relevance, safety, and usefulness of AI-generated text, images, or code.
Scenario
You are given 50 AI-generated news summaries. Your task is to identify instances where the AI invented facts not present in the source text.
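A first-pass triage for this scenario can be sketched in code. The example below is a crude heuristic, not a fact-checking method: it assumes that invented facts often surface as numbers or capitalized names that never appear in the source, and it only flags candidates for a human reviewer. The function name and the token patterns are illustrative assumptions.

```python
import re

def unsupported_tokens(source, summary):
    """Return numbers and capitalized words in the summary that are absent from the source.

    Heuristic only: paraphrased or inferred facts are not caught, and a flagged
    token is a prompt for human review, not proof of a hallucination.
    """
    # Every word or number in the source, lowercased, forms the "supported" vocabulary.
    source_vocab = {t.lower() for t in re.findall(r"[A-Za-z]+|\d+(?:[.,]\d+)*", source)}
    # Candidate facts in the summary: capitalized words and numbers.
    candidates = re.findall(r"\b[A-Z][a-z]+\b|\d+(?:[.,]\d+)*", summary)
    flagged = []
    for token in candidates:
        if token.lower() not in source_vocab and token not in flagged:
            flagged.append(token)
    return flagged

source = "Acme reported revenue of 12 million dollars on Monday."
summary = "Acme reported revenue of 15 million, said Smith."
print(unsupported_tokens(source, summary))
```

Summaries with no flagged tokens still need sampling by a reviewer, since this check cannot see facts that were reworded or implied.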
Scenario
Design a quality assurance framework for a customer support bot that handles returns. The bot must be helpful, brand-safe, and policy-compliant.
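One possible shape for such a framework, shown as a minimal sketch: brand safety and policy compliance are treated as pass/fail gates, and helpfulness is scored on a 1-5 Likert scale. The gating design, the `Review` fields, and the threshold of 4 are assumptions made for illustration, not requirements from the scenario.

```python
from dataclasses import dataclass

@dataclass
class Review:
    helpfulness: int        # 1-5 Likert rating from a human reviewer
    brand_safe: bool        # gate: tone and wording are acceptable for the brand
    policy_compliant: bool  # gate: the reply follows the returns policy

def passes_qa(review, min_helpfulness=4):
    """A reply passes only if both gates hold and helpfulness meets the threshold."""
    if not 1 <= review.helpfulness <= 5:
        raise ValueError("helpfulness must be between 1 and 5")
    return (
        review.brand_safe
        and review.policy_compliant
        and review.helpfulness >= min_helpfulness
    )

def pass_rate(reviews):
    """Fraction of reviewed replies that pass QA."""
    return sum(passes_qa(r) for r in reviews) / len(reviews)

reviews = [Review(5, True, True), Review(5, True, False), Review(3, True, True)]
print(pass_rate(reviews))
```

Gating keeps a very helpful but non-compliant reply from averaging its way to a passing score, which a single weighted total would allow.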
Scenario
As a Lead AI Trainer, you must reduce the 'sycophancy' (overly agreeable tendencies) of a foundation model while maintaining its helpfulness score.
Used for human-in-the-loop evaluation. Essential for managing large datasets of prompts and responses, tracking annotator performance, and measuring inter-annotator agreement (IAA).
Likert scales are the standard for granular human scoring. Pairwise comparison asks raters to pick the better of two responses, even when both are weak, which produces the preference data used in preference tuning. G-Eval and Constitutional AI (CAI) are frameworks that use LLMs to automate evaluation against custom criteria or principles.
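The two human scoring methods reduce to simple aggregates. The sketch below shows a mean Likert score and a per-model win rate from pairwise judgments; the function names and the (winner, loser) data layout are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def mean_likert(ratings):
    """Average of 1-5 Likert ratings for one response."""
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("Likert ratings must be between 1 and 5")
    return mean(ratings)

def win_rates(comparisons):
    """Win rate per model from (winner, loser) pairwise judgments."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: wins[model] / games[model] for model in games}

print(mean_likert([4, 5, 3]))
print(win_rates([("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]))
```

Win rates only rank the models that were compared; they say nothing about whether the preferred response is good in absolute terms, which is why Likert scores are often collected alongside them.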
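Kappa metrics (such as Cohen's kappa for two raters and Fleiss' kappa for more than two) measure how far human raters agree beyond what chance would produce. Pass@k estimates the probability that at least one of k generated code samples passes the tests. The Brier score is the mean squared error between predicted probabilities and actual outcomes, which makes it useful for assessing probabilistic predictions in fact-checking tasks.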
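Each of these metrics is short enough to compute by hand. The sketch below uses only the standard library; it shows Cohen's kappa for two raters, the commonly used unbiased pass@k estimator (n samples generated, c of them correct), and the Brier score for binary outcomes. Function names are illustrative.

```python
from collections import Counter
from math import comb

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

def pass_at_k(n, c, k):
    """Estimated probability that at least one of k samples passes, given c of n passed."""
    if n - c < k:
        return 1.0  # too few failing samples to fill k draws without a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

print(cohens_kappa(["ok", "ok", "bad", "bad"], ["ok", "bad", "ok", "bad"]))
print(pass_at_k(n=10, c=3, k=1))
print(brier_score([0.5, 0.5], [1, 0]))
```

A kappa near 0 means the raters agree no more often than chance, which usually points to ambiguous guidelines rather than careless raters. A lower Brier score is better, with 0 meaning perfectly confident and correct predictions.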
Answer Strategy
Use a 'Severity Matrix' framework. Distinguish between 'Hard Refusals' (illegal acts, high-risk harm) and 'Soft Refusals' (subjective topics). Update the annotation guidelines to treat 'Soft Refusal' scenarios as 'Response with Nuance' rather than 'Refusal,' ensuring the rater captures the need for balanced information over blanket denial.
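The severity matrix described above can be written down as a lookup table that annotation tooling consults. This is a minimal sketch: the category keys, the `Handling` names, and the default for unlisted categories are hypothetical choices, and a real matrix would come from the team's own policy.

```python
from enum import Enum

class Handling(Enum):
    HARD_REFUSAL = "refuse"
    RESPONSE_WITH_NUANCE = "respond_with_nuance"
    NORMAL_RESPONSE = "respond"

# Hypothetical category names; the two tiers follow the strategy above.
SEVERITY_MATRIX = {
    "illegal_acts": Handling.HARD_REFUSAL,
    "high_risk_harm": Handling.HARD_REFUSAL,
    "subjective_topic": Handling.RESPONSE_WITH_NUANCE,
}

def expected_handling(category):
    """Look up the expected handling; unlisted categories get a normal response."""
    return SEVERITY_MATRIX.get(category, Handling.NORMAL_RESPONSE)

def is_over_refusal(category, model_refused):
    """Flag a blanket refusal where the matrix does not call for a hard refusal."""
    return model_refused and expected_handling(category) is not Handling.HARD_REFUSAL

print(expected_handling("subjective_topic"))
print(is_over_refusal("subjective_topic", model_refused=True))
```

Keeping the matrix as data rather than prose lets raters and automated checks apply the same rule, and makes each guideline revision a reviewable change.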
Answer Strategy
Test for 'Post-Mortem Analysis' and 'Iterative Design.' The candidate should describe a specific blind spot (e.g., ignoring 'formatting' in code tasks), explain how they detected the discrepancy between rubric scores and user feedback, and detail the specific rubric revision implemented to close the gap.