Skill Guide

User research for AI products - designing studies that account for AI novelty effects, trust calibration, and edge-case discovery

The systematic design and execution of user research methodologies specifically engineered to isolate and measure the behavioral, cognitive, and emotional impacts unique to AI-powered products, moving beyond traditional usability to capture phenomena like automation bias, model opacity, and emergent failure modes.

This skill reduces product risk by surfacing AI-specific usability cliffs, trust breakdowns, and catastrophic edge cases before launch, which helps avoid costly recalls, reputational damage, and user attrition. It turns user feedback into actionable model and system improvements, building a user-centric advantage that competitors cannot easily replicate through technology alone.


How to Learn User research for AI products - designing studies that account for AI novelty effects, trust calibration, and edge-case discovery

1. **Foundational Theories**: Study the core psychological concepts of automation bias, algorithm aversion and algorithm appreciation, and the 'black box' problem.
2. **Core Methodology**: Master traditional qualitative (contextual inquiry, think-aloud) and quantitative (A/B testing, surveys) methods, then learn how to adapt them (e.g., adding pre/post-task trust measures).
3. **Ethical Baselines**: Understand IRB processes, informed consent for AI studies, and data privacy regulations (GDPR, CCPA) as they apply to user interaction data.
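The pre/post-task trust measure mentioned above can be analyzed as a per-participant shift. The following is a minimal sketch, assuming a single 1-7 Likert trust item; the participant IDs, ratings, and the `trust_shift` helper are hypothetical and not taken from any particular instrument.

```python
from statistics import mean

def trust_shift(pre: dict[str, int], post: dict[str, int]) -> dict[str, int]:
    """Per-participant change in self-reported trust (post minus pre)."""
    # Only participants with both measurements can be compared.
    return {pid: post[pid] - pre[pid] for pid in pre if pid in post}

# Hypothetical 1-7 Likert ratings collected before and after the task.
pre_task = {"p01": 6, "p02": 5, "p03": 4}
post_task = {"p01": 3, "p02": 5, "p03": 6}

shifts = trust_shift(pre_task, post_task)
mean_shift = mean(shifts.values())
```

A large negative shift for a participant who saw an AI failure during the task is a prompt for a follow-up question, not a conclusion in itself.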

1. **Scenario Design**: Move from testing features to testing outcomes. Design studies around specific AI goals (e.g., 'Does the user understand *why* the model recommended X?').
2. **Novelty Mitigation**: Implement longitudinal study designs (diary studies, weekly check-ins) to track behavior changes as the 'AI shine' wears off.
3. **Edge-Case Protocols**: Develop a taxonomy for AI-specific edge cases (e.g., confident but wrong, biased outputs, low-confidence fallback) and create targeted research plans to provoke them safely.
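An edge-case taxonomy becomes easier to apply consistently when observations are tagged with a fixed vocabulary. The sketch below encodes the three example categories from the list above; the `tag_observation` helper and the 0.9 and 0.5 confidence cutoffs are illustrative assumptions, and a real study would set them per product.

```python
from enum import Enum

class EdgeCase(Enum):
    CONFIDENT_BUT_WRONG = "confident but wrong"
    BIASED_OUTPUT = "biased output"
    LOW_CONFIDENCE_FALLBACK = "low-confidence fallback"

def tag_observation(confidence: float, correct: bool, flagged_biased: bool,
                    high: float = 0.9, low: float = 0.5) -> list[EdgeCase]:
    """Return every taxonomy tag that applies to one observed AI output.

    `high` and `low` are hypothetical cutoffs chosen for illustration.
    """
    tags = []
    if confidence >= high and not correct:
        tags.append(EdgeCase.CONFIDENT_BUT_WRONG)
    if flagged_biased:  # set by a human reviewer, not detected automatically
        tags.append(EdgeCase.BIASED_OUTPUT)
    if confidence < low:
        tags.append(EdgeCase.LOW_CONFIDENCE_FALLBACK)
    return tags
```

An observation can carry several tags at once, which keeps the taxonomy descriptive rather than forcing a single label.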

1. **Systems-Level Integration**: Embed research findings directly into the ML feedback loop, creating a formalized 'Human-in-the-Loop' research pipeline for model retraining.
2. **Strategic Framing**: Align research initiatives with business KPIs (e.g., 'How does improving model explainability impact user retention for our enterprise tier?').
3. **Mentorship & Evangelism**: Train cross-functional teams (PM, Eng, Data Science) on AI research principles to build a shared mental model for user-centric AI development.

Practice Projects

Beginner

Case Study/Exercise

Deconstructing a Novelty-Biased Interaction

Scenario

Users of a new AI-powered photo editor are rating generated images very highly in initial surveys, but usage metrics show declining engagement after two weeks.

How to Execute

1. **Re-Contact**: Reach out to a subset of initial high-raters who have become inactive.
2. **Structured Interview**: Use a semi-structured interview guide focusing on: 'Describe the first time you used the AI tool vs. the last time.'; 'What surprised you about how it worked?'; 'When did you feel most/least in control?'
3. **Analysis**: Code transcripts for themes of novelty, frustration, unpredictability, and skill mismatch.
4. **Report**: Present findings as a 'Novelty Decay Curve' with specific interaction points where expectations diverged from reality.
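The guide does not define how the 'Novelty Decay Curve' is computed. One possible quantitative backbone for it, to sit alongside the interview themes, is the share of users still active in each week since their own first session. The sketch below assumes a hypothetical session log of `(user_id, date)` pairs; it ignores users who joined too recently to have later weeks, which a real analysis would need to handle.

```python
from collections import defaultdict
from datetime import date

def novelty_decay_curve(sessions: list[tuple[str, date]]) -> dict[int, float]:
    """Share of all users active in each week since their own first session."""
    first_seen: dict[str, date] = {}
    for user, day in sessions:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day

    active = defaultdict(set)  # week index -> users with a session that week
    for user, day in sessions:
        week = (day - first_seen[user]).days // 7
        active[week].add(user)

    total = len(first_seen)
    return {week: len(users) / total for week, users in sorted(active.items())}

# Hypothetical log: three users, only one still active in week 2.
sessions = [
    ("u1", date(2024, 1, 1)), ("u1", date(2024, 1, 9)), ("u1", date(2024, 1, 16)),
    ("u2", date(2024, 1, 2)), ("u2", date(2024, 1, 10)),
    ("u3", date(2024, 1, 3)),
]
curve = novelty_decay_curve(sessions)
```

The weeks where the curve drops most sharply are natural anchors for the interview question about the first versus the last use.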

Intermediate

Case Study/Exercise

Designing a Trust-Calibration Experiment for a High-Stakes AI

Scenario

Your company is building an AI diagnostic assistant for radiologists. Over-trust (automation bias) is a critical safety risk, while under-trust renders the tool useless.

How to Execute

1. **Define Trust Metrics**: Operationalize 'trust' as measurable behaviors: frequency of overriding AI suggestions, time to decision, and accuracy of final diagnosis.
2. **Build Test Sets**: Curate three sets of images: 'Obvious' (AI correct), 'Ambiguous' (AI correct but with low confidence), and 'Adversarial' (AI confidently wrong).
3. **Conduct Within-Subjects Study**: Have participants (radiologists) diagnose cases with and without AI assistance, counterbalancing the order of conditions to control for learning effects, and use eye-tracking to monitor attention.
4. **Analyze**: Compare diagnostic accuracy, decision time, and trust-seeking behavior (e.g., looking at AI confidence scores) across the three sets. The deliverable is a 'Trust Profile' for the user persona.
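The behavioral metrics in the steps above can be summarized per test set. The following is a minimal sketch, assuming each AI-assisted trial is logged with the hypothetical fields shown; `over_reliance` (following the AI when it is wrong) and `under_reliance` (overriding it when it is right) are one common way to split miscalibrated trust, not a definition given by the source.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trial:
    case_set: str        # "obvious", "ambiguous", or "adversarial"
    ai_correct: bool     # was the AI suggestion right?
    followed_ai: bool    # did the participant's final answer match the AI?
    final_correct: bool  # was the participant's final diagnosis right?

def rate(flags: list[bool]) -> Optional[float]:
    """Share of True values, or None when there is nothing to measure."""
    return sum(flags) / len(flags) if flags else None

def trust_profile(trials: list[Trial]) -> dict[str, dict[str, Optional[float]]]:
    profile = {}
    for name in sorted({t.case_set for t in trials}):
        subset = [t for t in trials if t.case_set == name]
        profile[name] = {
            "override_rate": rate([not t.followed_ai for t in subset]),
            "accuracy": rate([t.final_correct for t in subset]),
            # Followed the AI although it was wrong (automation bias).
            "over_reliance": rate([t.followed_ai for t in subset if not t.ai_correct]),
            # Overrode the AI although it was right.
            "under_reliance": rate([not t.followed_ai for t in subset if t.ai_correct]),
        }
    return profile

# Hypothetical trials for illustration only.
trials = [
    Trial("obvious", True, True, True),
    Trial("obvious", True, False, False),
    Trial("adversarial", False, True, False),
    Trial("adversarial", False, True, False),
    Trial("adversarial", False, False, True),
    Trial("ambiguous", True, True, True),
]
profile = trust_profile(trials)
```

Decision time and eye-tracking measures would be added as further columns; they are left out here to keep the sketch short.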

Advanced

Case Study/Exercise

Establishing an AI Edge-Case Discovery & Triage Protocol

Scenario

You lead research for a conversational AI platform. Users are encountering unpredictable and sometimes harmful failures that standard QA missed. You need a scalable system to surface these issues before they cause harm.

How to Execute

1. **Create a 'Red Team' Lab**: Recruit a diverse panel of users with explicit instructions to find the AI's breaking points (e.g., 'Try to make it give dangerous advice,' 'Try to confuse it with contradictory context').
2. **Implement 'Shadow Mode'**: Run the new model version in parallel, logging disagreements between it and the stable version. Feed these 'model dissonance' logs to researchers for analysis.
3. **Develop a Severity Matrix**: Categorize discovered edge cases by likelihood of occurrence, potential for harm, and difficulty of detection.
4. **Close the Loop**: Mandate that no new model version ships without a sign-off from the research team confirming that all 'Critical' and 'High' severity edge cases from the matrix have been mitigated in the model or handled gracefully in the UI.
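The severity matrix and the sign-off rule can be expressed as a small scoring and gating routine. The sketch below assumes 1-5 ratings on the three dimensions named above and multiplies them, a convention borrowed from FMEA-style risk priority numbers; the score cutoffs, the `EdgeCaseReport` fields, and the example reports are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EdgeCaseReport:
    name: str
    likelihood: int            # 1 (rare) to 5 (frequent)
    harm: int                  # 1 (minor) to 5 (severe)
    detection_difficulty: int  # 1 (obvious) to 5 (hard to notice)
    mitigated: bool = False

def severity(report: EdgeCaseReport) -> str:
    """Map the product of the three ratings (1-125) to a severity label.

    The cutoffs below are illustrative assumptions, not an established standard.
    """
    score = report.likelihood * report.harm * report.detection_difficulty
    if score >= 60:
        return "Critical"
    if score >= 30:
        return "High"
    if score >= 10:
        return "Medium"
    return "Low"

def release_blockers(reports: list[EdgeCaseReport]) -> list[str]:
    """Names of unmitigated Critical/High cases that should block a release."""
    return [r.name for r in reports
            if severity(r) in {"Critical", "High"} and not r.mitigated]

reports = [
    EdgeCaseReport("dangerous medical advice", 2, 5, 4),
    EdgeCaseReport("contradictory context loop", 4, 2, 2),
    EdgeCaseReport("sensitive prompt mishandled", 3, 5, 5, mitigated=True),
]
blockers = release_blockers(reports)
```

A purely multiplicative score can understate rare but severe failures, so a team may prefer to treat any case with the maximum harm rating as at least 'High' regardless of its score.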

Tools & Frameworks

Mental Models & Methodologies

Expectation-Reality Gap Analysis; Trust Calibration Framework (Muir 1994); Fogg Behavior Model (B=MAP) for AI Adoption; SEIPS (Systems Engineering Initiative for Patient Safety) Model adapted for AI-Human teams

Use Expectation-Reality Gap Analysis to structure interviews pre/post AI interaction. Apply the Trust Calibration Framework to design measurable trust studies. Adapt the Fogg Model to assess if user ability and motivation align with AI prompt requirements. Use SEIPS to map the entire system (user, AI, tools, environment) to identify failure points beyond the algorithm.

Software & Data Platforms

Qualtrics/SurveyMonkey (with logic branching); Dovetail/Purple Hex (for qualitative coding); Looker/Tableau (for behavioral log analysis); Amazon Mechanical Turk/Prolific (for scale studies)

Use Qualtrics to build dynamic surveys that branch based on reported trust levels. Use Dovetail to tag and analyze qualitative interview data for AI-specific themes. Use Looker to build dashboards correlating AI confidence scores with user override rates from interaction logs. Use Prolific for recruitment of specific user personas (e.g., 'data-literate but not technical').
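The aggregation behind a confidence-versus-override dashboard can be prototyped before any BI tooling is set up. The sketch below is plain Python, not Looker or Tableau syntax, and assumes a hypothetical interaction log of `(ai_confidence, user_overrode)` pairs with confidence in the range 0 to 1.

```python
from collections import defaultdict

def override_rate_by_confidence(logs: list[tuple[float, bool]],
                                n_bins: int = 5) -> dict[int, float]:
    """Override rate per equal-width confidence bin.

    Bin i covers confidences from i/n_bins up to (i+1)/n_bins;
    a confidence of exactly 1.0 falls in the last bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [overrides, total]
    for confidence, overrode in logs:
        idx = min(int(confidence * n_bins), n_bins - 1)
        counts[idx][0] += int(overrode)
        counts[idx][1] += 1
    return {idx: o / n for idx, (o, n) in sorted(counts.items())}

# Hypothetical log entries for illustration only.
logs = [
    (0.95, False), (0.90, False), (0.85, True),
    (0.55, True), (0.50, False),
    (0.15, True), (0.10, True),
]
rates = override_rate_by_confidence(logs)
```

If users are well calibrated, override rates should fall as confidence rises; a flat line suggests the confidence display is being ignored.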

Prototyping & Simulation Tools

Wizard-of-Oz Prototyping; Custom Sandbox Environments; Confidence Threshold Sliders

Use Wizard-of-Oz to simulate AI capabilities before a model exists, allowing you to test user reactions to different confidence levels and failure modes. Build a sandbox environment where researchers can safely push edge-case scenarios without affecting live systems. Implement a 'confidence threshold slider' in prototypes to study how users adjust their trust when given explicit control over AI certainty.
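The logic behind a confidence threshold slider is a simple filter, and logging its effect at each setting shows what the participant traded off. The following is a minimal sketch, assuming each prototype suggestion carries a hypothetical `confidence` score and a researcher-assigned `correct` label.

```python
from typing import Optional

def slider_summary(suggestions: list[dict],
                   threshold: float) -> tuple[float, Optional[float]]:
    """Return (coverage, accuracy) of the suggestions shown at a slider setting.

    coverage: share of all suggestions at or above the threshold.
    accuracy: share of shown suggestions that are correct, or None if none are shown.
    """
    if not suggestions:
        return 0.0, None
    shown = [s for s in suggestions if s["confidence"] >= threshold]
    coverage = len(shown) / len(suggestions)
    accuracy = sum(s["correct"] for s in shown) / len(shown) if shown else None
    return coverage, accuracy

# Hypothetical prototype suggestions for illustration only.
suggestions = [
    {"confidence": 0.95, "correct": True},
    {"confidence": 0.80, "correct": True},
    {"confidence": 0.60, "correct": False},
    {"confidence": 0.40, "correct": False},
]
strict = slider_summary(suggestions, 0.7)
lenient = slider_summary(suggestions, 0.5)
```

Comparing the setting a participant chooses with the coverage and accuracy it actually produces gives a behavioral view of how much error they are willing to accept.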

Interview Questions

Answer Strategy

The interviewer is testing your ability to identify AI-specific risks (novelty, trust, edge cases) and translate them into a concrete methodology. Structure your answer around the three core pillars.

Sample Answer: "First, I'd isolate novelty effects by running a longitudinal diary study to see if usage patterns change over a month. Second, to test trust calibration, I'd design an experiment with a mix of perfect, mediocre, and terrible draft suggestions, measuring user override rates and satisfaction. Finally, for edge-case discovery, I'd conduct adversarial sessions where users intentionally send ambiguous or sensitive emails to see how the AI behaves and how they react to its failures."

Answer Strategy

This tests your stakeholder management, influence, and ability to frame data persuasively. The core competency is advocating for the user with evidence, not opinion.

Sample Answer: "In my previous role, the data science team was excited about a 5% accuracy boost in our recommendation model. However, my user study showed the new model's suggestions were less explainable, leading to a significant drop in user trust and purchase intent. I presented the findings by focusing on business outcomes: I showed video clips of users hesitating and abandoning carts, paired with the survey data linking confusion to the new model's opacity. I reframed the conversation from 'accuracy' to 'user-perceived accuracy and trust,' which are the actual drivers of revenue. We ultimately delayed the launch until the team added an explanation layer."