Interview Prep

AI eDiscovery Specialist Interview Questions

50 expert questions covering beginner fundamentals to advanced AI workflow scenarios. Each answer includes a hint for structured responses.

Beginner: 5 | Intermediate: 10 | Advanced: 10 | Scenario-Based: 10 | AI Workflow & Tools: 10 | Behavioral: 5

Beginner

5 questions
What a great answer covers:

A strong answer walks through the EDRM stages (information governance, identification, preservation, collection, processing, review, analysis, production, presentation) and explains that review is where most cost and effort concentrate.

What a great answer covers:

The answer should distinguish relevance (material to the case issues) from privilege (protected from disclosure, e.g., attorney-client privilege or work product doctrine).

What a great answer covers:

A good answer covers the obligation to preserve potentially relevant ESI when litigation is reasonably anticipated, and the consequences of spoliation if a hold fails.

What a great answer covers:

The answer should mention emails, instant messages (Slack, Teams), documents, social media, cloud storage, mobile data, databases, and metadata.

What a great answer covers:

A strong answer explains hash-based deduplication (MD5/SHA-1), custodian-level vs. global deduplication, and how it reduces review volume and cost.
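
A candidate could illustrate the custodian-level vs. global distinction with a short sketch like the one below. It is a minimal example, not a production method: the `(doc_id, custodian, bytes)` record shape and the `deduplicate` function are hypothetical, and it hashes raw bytes only, whereas review platforms often hash a normalized combination of fields for email.

```python
import hashlib

def content_hash(data: bytes) -> str:
    # SHA-1 digest of the raw bytes; MD5 would work the same way here.
    return hashlib.sha1(data).hexdigest()

def deduplicate(docs, scope="global"):
    """Return the doc_ids kept after hash-based deduplication.

    docs:  iterable of (doc_id, custodian, data) tuples (hypothetical shape).
    scope: "global" drops a duplicate no matter who held it;
           "custodian" drops it only within the same custodian's data.
    """
    seen = set()
    kept = []
    for doc_id, custodian, data in docs:
        digest = content_hash(data)
        key = digest if scope == "global" else (custodian, digest)
        if key in seen:
            continue  # an identical copy was already kept
        seen.add(key)
        kept.append(doc_id)
    return kept

docs = [
    ("D1", "alice", b"Q3 forecast attached."),
    ("D2", "bob", b"Q3 forecast attached."),
    ("D3", "alice", b"Q3 forecast attached."),
    ("D4", "bob", b"Different content."),
]
print(deduplicate(docs, "global"))     # ['D1', 'D4']
print(deduplicate(docs, "custodian"))  # ['D1', 'D2', 'D4']
```

Global deduplication removes more documents, while custodian-level deduplication preserves the fact that both custodians held a copy.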

Intermediate

10 questions
What a great answer covers:

The answer should contrast TAR 1.0 (train on a seed set, then predict and apply a score cutoff) with TAR 2.0 (continuous active learning, in which the model keeps retraining on reviewer decisions and review continues until the recall target is met).

What a great answer covers:

A strong answer discusses stratified random sampling, richness-based sampling, risks of cherry-picking obvious documents, and the impact of seed set quality on model performance.

What a great answer covers:

The answer should cover shingling or SimHash for near-duplicate detection, email threading algorithms that collapse conversation chains, and how both reduce redundant review.
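
The shingling idea can be sketched in a few lines. This is an illustrative example that compares word-level shingles with Jaccard similarity; the shingle size `k=3` and any cutoff for calling two documents near-duplicates are assumptions, and it does not implement SimHash or threading.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles for a text (case-insensitive)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

original = "please review the attached draft contract today"
revised = "please review the attached draft contract tomorrow"
unrelated = "lunch menu for the quarterly offsite is posted"

print(round(jaccard(original, revised), 2))  # 0.67
print(jaccard(original, unrelated))          # 0.0
```

Pairs scoring above a chosen similarity cutoff can be grouped so that a reviewer codes one representative and inspects only the differences in the rest.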

What a great answer covers:

A strong answer covers elusion testing (testing the unreviewed set for responsive documents), recall estimation, confidence intervals, and the acceptance threshold.
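
The arithmetic behind an elusion test can be made concrete. The sketch below assumes a simple random sample drawn from the unreviewed (null) set and uses a Wilson score interval for the elusion rate; the function names and the example counts are hypothetical, and a real protocol would fix the sample size and acceptance threshold in advance.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

def estimate_recall(found_responsive, null_set_size, sample_size, sample_responsive):
    """Estimate recall from an elusion sample of the unreviewed set."""
    elusion = sample_responsive / sample_size
    low, high = wilson_interval(sample_responsive, sample_size)

    def recall(rate):
        missed = rate * null_set_size  # responsive docs estimated left behind
        return found_responsive / (found_responsive + missed)

    return {
        "elusion_rate": elusion,
        "recall_point": recall(elusion),
        "recall_low": recall(high),   # more elusion means lower recall
        "recall_high": recall(low),
    }

# Hypothetical counts: 9,000 responsive found, 100,000 unreviewed,
# 1,500 sampled from the unreviewed set, 15 of those responsive.
result = estimate_recall(9000, 100000, 1500, 15)
print(round(result["recall_point"], 3))  # 0.9
```

Reporting the interval alongside the point estimate shows whether the recall target is met even at the pessimistic end of the range.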

What a great answer covers:

The answer should address GDPR restrictions on transferring personal data, standard contractual clauses, data minimization, redaction/anonymization strategies, and the Hague Evidence Convention.

What a great answer covers:

A good answer references Federal Rule of Civil Procedure 26(b)(1), cost-benefit balancing, and how AI-driven reductions in review cost can shift the proportionality calculation.

What a great answer covers:

The answer should cover the requirements of Federal Rule of Civil Procedure 26(b)(5)(A) (describing withheld documents without revealing the privileged content) and how LLMs can draft log entries from document metadata and content.

What a great answer covers:

A strong answer walks through ingestion, metadata extraction, text extraction, deduplication, date filtering, domain filtering, threading, and loading into a review platform.

What a great answer covers:

The answer should discuss precision/recall trade-offs, Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012), widely cited as the first judicial approval of TAR, and hybrid approaches combining both methods.

What a great answer covers:

A strong answer covers LDA, BERTopic, or similar approaches, how clustering reveals themes, and how this supports issue coding and deposition preparation.

Advanced

10 questions
What a great answer covers:

The answer should cover iterative training loops, active learning sampling strategies (uncertainty sampling, margin sampling), reviewer batch design, stop criteria, elusion testing, and defensible documentation.
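
Uncertainty sampling, one of the strategies named above, can be sketched as follows. The `scores` mapping (document ID to predicted probability of responsiveness) and the batch size are hypothetical; for a binary classifier, margin sampling produces the same ordering, because the margin between the two class probabilities is smallest where the score is closest to 0.5.

```python
def uncertainty_batch(scores, batch_size):
    """Pick the documents whose predicted probability is closest to 0.5.

    scores: dict mapping doc_id -> model probability of responsiveness.
    Ties are broken by doc_id so the batch is reproducible.
    """
    ranked = sorted(scores, key=lambda doc_id: (abs(scores[doc_id] - 0.5), doc_id))
    return ranked[:batch_size]

scores = {"A": 0.95, "B": 0.52, "C": 0.10, "D": 0.45, "E": 0.70}
print(uncertainty_batch(scores, 2))  # ['B', 'D']
```

In an iterative loop, the selected batch goes to reviewers, their coding decisions are added to the training set, the model is retrained, and the scores are recomputed before the next batch is drawn.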

What a great answer covers:

A strong answer covers domain-specific fine-tuning on legal corpora, handling class imbalance (privileged docs are rare), distinguishing attorney-client privilege from work product, and the higher stakes of false negatives.

What a great answer covers:

The answer should cover vector embeddings with OpenAI or HuggingFace, chunking strategies for legal documents, metadata filtering, retrieval ranking, and LLM-based classification with citations back to source documents.
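
One simple chunking strategy is a fixed-size word window with overlap, sketched below. The window and overlap sizes are arbitrary assumptions; legal documents are often better split on structural boundaries such as clauses or email messages, which this example does not attempt.

```python
def chunk_words(text, size=200, overlap=50):
    """Split text into overlapping word windows.

    size:    maximum words per chunk.
    overlap: words shared between consecutive chunks, so a sentence that
             straddles a boundary still appears whole in one chunk.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

text = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
print(chunk_words(text, size=4, overlap=2))
# ['w0 w1 w2 w3', 'w2 w3 w4 w5', 'w4 w5 w6 w7', 'w6 w7 w8 w9']
```

Each chunk would then be embedded and stored with its source document ID so that any classification can cite back to the original document.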

What a great answer covers:

A strong answer covers documenting the full TAR protocol, presenting precision/recall metrics, elusion test results, seed set methodology, QC sampling results, and citing case law supporting TAR defensibility.

What a great answer covers:

The answer should cover stratified performance evaluation by language/custodian, balanced sampling, multilingual models, bias auditing, and recalibration strategies.

What a great answer covers:

A strong answer addresses S3 storage tiers, spot instances for batch processing, Lambda for lightweight extraction, Textract pricing for OCR, model inference optimization, and lifecycle policies for archival.

What a great answer covers:

The answer should discuss threshold calibration, human review of borderline documents, buffer zone strategies, and how to document the decision rationale for defensibility.
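
A buffer zone around the classification threshold can be expressed as a small routing rule. The threshold of 0.5 and buffer of 0.1 below are placeholder values, not recommendations; in practice they would be calibrated against validation samples and recorded in the protocol.

```python
def route(score, threshold=0.5, buffer=0.1):
    """Route a document by model score.

    Scores well above the threshold are treated as predicted responsive,
    scores well below as predicted not responsive, and anything inside
    the buffer zone goes to a human reviewer.
    """
    if score >= threshold + buffer:
        return "predicted_responsive"
    if score <= threshold - buffer:
        return "predicted_not_responsive"
    return "human_review"

for score in (0.9, 0.55, 0.1):
    print(score, route(score))
# 0.9 predicted_responsive
# 0.55 human_review
# 0.1 predicted_not_responsive
```

Logging the threshold, the buffer width, and the count of documents in each route gives a record of the decision rationale.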

What a great answer covers:

A strong answer addresses data privacy risks, API data retention policies, the need for enterprise agreements, zero-retention endpoints, and bar association opinions on using AI with client confidences.

What a great answer covers:

The answer should cover named entity recognition with spaCy or AWS Comprehend, regex patterns for SSNs/financial data, confidence thresholds, human QC sampling, and redaction permanence verification.
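
The regex portion of such a workflow might look like the sketch below, which covers only hyphenated U.S. Social Security numbers. The mask string and function name are hypothetical, and the pattern will miss unformatted or space-separated numbers, which is one reason the NER step and human QC sampling are still needed.

```python
import re

# Hyphenated SSN: area not 000, 666, or 9xx; group not 00; serial not 0000.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

def redact_ssns(text, mask="[REDACTED-SSN]"):
    """Replace SSN-like strings and return (redacted_text, match_count)."""
    return SSN_PATTERN.subn(mask, text)

redacted, count = redact_ssns("Employee SSN 123-45-6789 on file; ref 000-12-3456.")
print(redacted)  # Employee SSN [REDACTED-SSN] on file; ref 000-12-3456.
print(count)     # 1
```

Replacing text in an extracted string is not the same as a permanent redaction: the produced image or PDF must also be checked to confirm the underlying text layer no longer contains the value.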

What a great answer covers:

A strong answer provides a specific scenario (e.g., coded language in financial fraud), explains the semantic search or embedding-based approach, and quantifies the improvement with metrics.

Scenario-Based

10 questions
What a great answer covers:

A strong answer covers prioritization by custodian relevance, parallel processing, TAR for early prioritization, defensible sampling, and phased production strategy.

What a great answer covers:

The answer should cover metadata analysis (creation vs. modification dates, author field anomalies), batch comparison tools, forensic preservation of findings, and reporting to counsel.

What a great answer covers:

A strong answer discusses analyzing error patterns, expanding seed set diversity, adjusting sampling strategy, reviewing false negatives for pattern recognition, and setting a recall target aligned with proportionality.

What a great answer covers:

The answer should cover multilingual models (XLM-R, mBERT), language-specific preprocessing, per-language performance benchmarking, translation for QC, and handling code-switching in documents.

What a great answer covers:

A strong answer covers SOC 2 compliance, encryption at rest and in transit, data residency requirements, access controls, and the option for on-premise or hybrid deployment architectures.

What a great answer covers:

The answer should address short-message challenges for NLP, conversation threading in channels vs. DMs, emoji/reaction analysis, temporal clustering, and the higher volume-to-relevance ratio.

What a great answer covers:

A strong answer covers production audit logs, privilege QA workflow improvements, second-pass privilege screening with AI, and updating the TAR model to flag privilege-risk documents.

What a great answer covers:

The answer should discuss domain shift analysis, feature importance examination, fine-tuning on new domain data, active learning to quickly adapt, and evaluating whether a fresh model outperforms transfer.

What a great answer covers:

A strong answer clarifies the boundary between eDiscovery and litigation analytics, discusses what's feasible (sentiment analysis, strength indicators) vs. what requires legal expertise, and manages expectations.

What a great answer covers:

The answer should cover Concordance/Relativity load file formats (DAT/OPT/LFP), field mapping, metadata validation, image numbering, OCR text file alignment, and hash verification of the production set.

AI Workflow & Tools

10 questions
What a great answer covers:

A strong answer covers data preprocessing, tokenization, fine-tuning a BERT-based model on coded documents, evaluation with legal-domain metrics, model versioning, and integration with the review platform.

What a great answer covers:

The answer should cover chain-of-thought prompting, sequential chains or LCEL pipelines, output parsing with Pydantic models, retry logic for robustness, and cost/latency considerations.

What a great answer covers:

A strong answer covers chunking strategies for long documents, embedding model selection, vector store options (Pinecone, Weaviate, FAISS), metadata filtering for date/custodian constraints, and relevance ranking.
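
Metadata filtering followed by relevance ranking can be sketched without any vector database. The example below assumes chunk embeddings have already been computed elsewhere; the record fields (`id`, `custodian`, `date`, `vector`) and the two-dimensional toy vectors are hypothetical, and a vector store such as those named above would replace the in-memory list at scale.

```python
import math
from datetime import date

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query_vec, chunks, custodian=None, start=None, end=None, top_k=3):
    """Filter chunks by custodian and date range, then rank by similarity."""
    candidates = [
        c for c in chunks
        if (custodian is None or c["custodian"] == custodian)
        and (start is None or c["date"] >= start)
        and (end is None or c["date"] <= end)
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return [c["id"] for c in ranked[:top_k]]

chunks = [
    {"id": "c1", "custodian": "alice", "date": date(2021, 3, 1), "vector": [1.0, 0.0]},
    {"id": "c2", "custodian": "alice", "date": date(2021, 6, 1), "vector": [0.8, 0.6]},
    {"id": "c3", "custodian": "bob", "date": date(2021, 3, 15), "vector": [1.0, 0.0]},
    {"id": "c4", "custodian": "alice", "date": date(2019, 1, 1), "vector": [1.0, 0.0]},
]
print(search([1.0, 0.0], chunks, custodian="alice", start=date(2021, 1, 1)))
# ['c1', 'c2']
```

Filtering before ranking keeps out-of-scope custodians and dates from consuming the top-k slots.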

What a great answer covers:

The answer should cover Textract's asynchronous API for batch OCR, Comprehend custom classifiers, handling multi-page documents, table extraction, and cost optimization with S3-based workflows.

What a great answer covers:

A strong answer covers Git for code and configs, DVC or MLflow for model versioning, Docker containers for environment reproducibility, and matter-specific configuration files.

What a great answer covers:

The answer should cover structured output prompts (JSON mode), confidence scoring, human QC sampling, hallucination mitigation by grounding each entry in the actual document through RAG, and formatting that complies with Federal Rule of Civil Procedure 26(b)(5).
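
Whatever model produces the draft entry, its JSON output should be validated before it reaches a privilege log. The sketch below shows only that validation step, with no model call; the field names and the allowed privilege bases are hypothetical and would follow the matter's own log format.

```python
import json

# Hypothetical log schema: field name -> expected JSON type.
REQUIRED_FIELDS = {
    "doc_id": str,
    "date": str,
    "author": str,
    "recipients": list,
    "privilege_basis": str,
    "description": str,
}
ALLOWED_BASES = {"attorney-client", "work product"}

def parse_log_entry(raw):
    """Parse and validate one model-drafted privilege log entry."""
    try:
        entry = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(entry, dict):
        raise ValueError("entry must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(entry.get(field), expected):
            raise ValueError(f"missing or mistyped field: {field}")
    if entry["privilege_basis"] not in ALLOWED_BASES:
        raise ValueError(f"unknown privilege basis: {entry['privilege_basis']}")
    return entry

raw = json.dumps({
    "doc_id": "DOC-0001",
    "date": "2021-03-01",
    "author": "A. Counsel",
    "recipients": ["B. Client"],
    "privilege_basis": "attorney-client",
    "description": "Email requesting legal advice regarding contract terms.",
})
print(parse_log_entry(raw)["doc_id"])  # DOC-0001
```

Entries that fail validation can be retried or routed to a human, and a sample of entries that pass should still be checked against the source document.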

What a great answer covers:

A strong answer compares Relativity's built-in Active Learning (priority queue, project settings) with custom scikit-learn implementations, discussing trade-offs in flexibility, defensibility documentation, and ease of use.

What a great answer covers:

The answer should cover pre-trained NER models, fine-tuning on legal entity types (judge names, case numbers, financial account numbers), handling false positives, and integrating NER results into a redaction workflow.

What a great answer covers:

A strong answer covers Apache Airflow or similar orchestration, format-specific extractors (PST, MBOX, SharePoint), metadata normalization, text extraction, deduplication, and loading into Elasticsearch or a review platform.

What a great answer covers:

The answer should cover tracking prediction distributions over time, comparing against baseline distributions, sampling for human validation, automated retraining triggers, and dashboard visualization with Grafana or similar tools.
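
One common way to compare a current prediction distribution against a baseline is the population stability index (PSI), sketched below. The use of PSI, the ten equal-width bins, and the 0.25 alert level are assumptions (0.25 is a widely used rule of thumb, not a standard); scores are assumed to be probabilities between 0 and 1.

```python
import math

def histogram(scores, bins=10):
    """Fraction of scores falling in each equal-width bin over [0, 1]."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return [c / len(scores) for c in counts]

def psi(baseline, current, bins=10, eps=1e-6):
    """Population stability index between two score samples.

    eps smooths empty bins so the logarithm is always defined.
    """
    b = histogram(baseline, bins)
    c = histogram(current, bins)
    return sum((ci - bi) * math.log((ci + eps) / (bi + eps)) for bi, ci in zip(b, c))

baseline = [0.1] * 50 + [0.9] * 50
shifted = [0.9] * 100

print(psi(baseline, baseline))       # 0.0
print(psi(baseline, shifted) > 0.25) # True
```

A value above the chosen alert level could trigger a human validation sample before any automated retraining, so that drift is confirmed against reviewer coding rather than assumed.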

Behavioral

5 questions
What a great answer covers:

A strong answer demonstrates attention to detail, proactive problem identification, clear communication to stakeholders, and a systematic approach to remediation.

What a great answer covers:

The answer should demonstrate the ability to translate technical concepts into practical legal implications, using analogies and visualizations rather than jargon.

What a great answer covers:

A strong answer shows pragmatic prioritization, creative use of sampling to validate quickly, transparent communication about trade-offs, and adherence to defensibility standards.

What a great answer covers:

The answer should mention specific sources (Sedona Conference, ILTA, Legaltech News, Relativity Fest), professional communities, and a habit of continuous learning.

What a great answer covers:

A strong answer demonstrates stakeholder management, clear communication of technical constraints, prioritization frameworks, and the ability to find solutions that satisfy multiple requirements.