Interview Prep
AI Semantic Search Engineer Interview Questions
50 expert questions covering beginner fundamentals to advanced AI workflow scenarios. Each answer includes a hint for structured responses.
Beginner
5 questions
A great answer contrasts BM25/TF-IDF with dense embeddings, explains synonymy and polysemy, and gives an example like searching 'affordable laptop for college' matching 'budget notebook for students'.
Covers dense numerical representations of text, cosine similarity as the comparison measure, and how query and document embeddings are compared in the same vector space.
Should define specialized storage for high-dimensional vectors with ANN indexing, and name Pinecone, Weaviate, Qdrant, Milvus, or pgvector.
Should explain document segmentation for embedding, discuss chunk size tradeoffs (too large loses specificity, too small loses context), and mention strategies like recursive or semantic chunking.
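The size tradeoff can be made concrete with a small sketch. The splitter below is a minimal illustration of recursive chunking, assuming character counts as the size measure and a hypothetical separator order (paragraph, sentence, word). Production pipelines usually count tokens and add overlap between chunks.

```python
def recursive_chunk(text, max_chars=200, separators=("\n\n", ". ", " ")):
    """Split text into chunks of at most max_chars, preferring coarse separators."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    for sep in separators:
        if sep not in text:
            continue
        # Greedily pack adjacent pieces until the next one would overflow.
        chunks, current = [], ""
        for part in text.split(sep):
            candidate = part if not current else current + sep + part
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = part
        if current:
            chunks.append(current)
        # Any piece that is still too long is split again with finer separators.
        result = []
        for chunk in chunks:
            result.extend(recursive_chunk(chunk, max_chars, separators))
        return result
    # No separator present at all: fall back to a hard character split.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

A smaller `max_chars` yields more specific chunks at the cost of surrounding context, which is the tradeoff the answer should discuss.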
Covers direction vs. magnitude, normalization benefits, and how cosine similarity focuses on semantic orientation rather than vector length.
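A short sketch can illustrate the direction-versus-magnitude point. The vectors here are made-up toy values, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

query = [1.0, 2.0, 3.0]
scaled = [10.0, 20.0, 30.0]  # same direction, ten times the magnitude
other = [3.0, -1.0, 0.5]     # different direction

same_direction = cosine_similarity(query, scaled)
different_direction = cosine_similarity(query, other)

# After normalization, a plain dot product equals the cosine similarity.
dot_of_normalized = sum(x * y for x, y in zip(normalize(query), normalize(other)))
```

Here `same_direction` comes out at 1 (up to floating-point error) even though the vectors differ in length, and `dot_of_normalized` matches `different_direction`. That equivalence is why many systems normalize embeddings once at index time and then use the cheaper dot product.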
Intermediate
10 questions
Should cover hierarchical navigable small world graphs, the M (connections per layer) and efConstruction/efSearch parameters, and how tuning them trades build time, query latency, and recall.
Covers BM25 + vector search, reciprocal rank fusion (RRF) or weighted score combination, and scenarios like rare proper nouns or exact-match queries where sparse methods excel.
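Reciprocal rank fusion can be sketched in a few lines. This is a minimal illustration, assuming each retriever returns an ordered list of document IDs; the constant k=60 is the commonly used default, and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # hypothetical sparse results
vector_ranking = ["doc_b", "doc_d", "doc_a"]  # hypothetical dense results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
```

Because RRF uses only ranks, it avoids having to reconcile BM25 scores and cosine similarities, which live on different scales.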
Should explain that bi-encoders encode independently (fast, used for initial retrieval) while cross-encoders attend jointly (slow but more accurate, used for re-ranking top-K results).
Covers MRR, NDCG, Recall@K, Precision@K, MAP, and ideally end-to-end metrics like answer accuracy in RAG. Should explain what each metric emphasizes.
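To make the differences between the metrics concrete, here is a minimal sketch for a single query, assuming binary relevance for Recall@K, Precision@K, and reciprocal rank, and graded gains for NDCG. The function names are illustrative; MRR is the mean of the reciprocal rank over all queries.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k that is relevant."""
    return len(set(retrieved[:k]) & set(relevant)) / k

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, gains, k):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    dcg = sum(gains.get(doc_id, 0) / math.log2(rank + 1)
              for rank, doc_id in enumerate(retrieved[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

Recall@K rewards finding everything, Precision@K penalizes noise in the top results, reciprocal rank looks only at the first hit, and NDCG rewards placing the most valuable documents earliest.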
Should discuss chunk size experiments, model selection considering latency and quality (e.g., text-embedding-3-small vs. large), HNSW parameters, and index metadata filtering.
Covers query embedding, retrieval, re-ranking, context injection into prompt, LLM generation, and ideally citation grounding and hallucination mitigation.
Should mention query expansion, fallback to keyword search, confidence thresholding, detecting low-retrieval-score queries, and logging for model retraining.
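Confidence thresholding with a keyword fallback might look like the sketch below. The function name, the `min_score` value, and the shape of the results (a list of `(doc_id, score)` pairs sorted best-first) are all assumptions for illustration; a real threshold would be tuned on logged queries.

```python
def search_with_fallback(query, vector_search, keyword_search, min_score=0.5):
    """Use vector results unless the best score is below min_score."""
    results = vector_search(query)
    if not results or results[0][1] < min_score:
        # Low-confidence retrieval: fall back to keyword search.
        # This is also the place to log the query for later analysis.
        return "keyword", keyword_search(query)
    return "vector", results
```

The returned label makes it easy to count how often the fallback fires, which is a useful signal for retraining.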
Covers using in-batch negatives vs. mined hard negatives, how hard negatives improve decision boundaries, and practical approaches like BM25 or cross-encoder mining.
Should discuss filtering by date, category, user permissions, and the tradeoffs of pre-filtering (narrows search space, may hurt recall) vs. post-filtering (filters after retrieval, may reduce result count).
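The result-count effect of post-filtering can be seen in a toy example. This sketch uses exact brute-force search over four made-up documents, so it shows only the shrinking result set; the recall cost of pre-filtering arises with approximate indexes and is not reproduced here.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical corpus: two-dimensional vectors with a category field.
DOCS = [
    {"id": "a", "category": "news", "vec": [1.0, 0.0]},
    {"id": "b", "category": "blog", "vec": [0.9, 0.1]},
    {"id": "c", "category": "blog", "vec": [0.8, 0.2]},
    {"id": "d", "category": "news", "vec": [0.1, 0.9]},
]

def top_k(query, candidates, k):
    """Exact nearest neighbours by dot product."""
    return sorted(candidates, key=lambda d: dot(query, d["vec"]), reverse=True)[:k]

def pre_filter_search(query, docs, predicate, k):
    """Apply the metadata predicate first, then rank the survivors."""
    return top_k(query, [d for d in docs if predicate(d)], k)

def post_filter_search(query, docs, predicate, k):
    """Rank everything first, then drop results that fail the predicate."""
    return [d for d in top_k(query, docs, k) if predicate(d)]
```

With the query `[1.0, 0.0]`, a "news only" predicate, and k=2, pre-filtering returns two documents while post-filtering returns one, because a blog post occupied a top-2 slot before the filter ran.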
Covers multilingual embedding models (e.g., multilingual-e5, BGE-M3), language detection, cross-lingual retrieval, and evaluation across languages.
Advanced
10 questions
Should cover isolating retrieval vs. generation failures, checking retrieval recall on gold sets, examining context window utilization, testing with oracle context, and implementing citation verification.
Should address domain-specific embedding fine-tuning, tiered retrieval (ANN coarse + re-ranker fine), sharded vector indexes, caching popular queries, and legal-specific evaluation metrics.
Should cover HNSW (high recall, memory-heavy), IVF-PQ (memory-efficient with product quantization, good for large corpora), ScaNN (anisotropic quantization, Google's offering), and latency-memory-recall tradeoffs.
Covers monitoring retrieval quality metrics over time, comparing embedding distributions (e.g., via MMD or centroid drift), A/B testing new models, and establishing retraining triggers.
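Centroid drift is the simpler of the two distribution checks and can be sketched as follows. This assumes two batches of embeddings (a reference window and a current window) and measures the cosine distance between their mean vectors; any alert threshold would be a tuning choice, not a standard value.

```python
import math

def centroid(vectors):
    """Component-wise mean of a batch of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def centroid_drift(reference, current):
    """Cosine distance between the centroids of two embedding batches."""
    return cosine_distance(centroid(reference), centroid(current))
```

A centroid comparison can miss changes in spread or shape that leave the mean untouched, which is one reason to pair it with a distribution-level test such as MMD.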
Should discuss click-through logging, implicit relevance signal extraction, using feedback for hard-negative mining, fine-tuning embeddings, and online evaluation with interleaving experiments.
Covers quantization (scalar, product, binary), index rebuilding with tuned HNSW parameters, query result caching, pre-filtering to reduce candidate sets, hardware considerations (GPU ANN), and tiered retrieval.
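Of the quantization options listed, scalar quantization is the easiest to illustrate. The sketch below maps each float to one of 256 levels within an assumed value range `[lo, hi]`; real systems typically learn that range per dimension from the data, and the function names here are illustrative.

```python
def scalar_quantize(vec, lo, hi):
    """Map each float in [lo, hi] to an integer code in 0..255 (values are clamped)."""
    scale = (hi - lo) / 255
    return [round((min(max(x, lo), hi) - lo) / scale) for x in vec]

def dequantize(codes, lo, hi):
    """Approximate reconstruction of the original floats from the codes."""
    scale = (hi - lo) / 255
    return [lo + c * scale for c in codes]
```

Storing one byte per dimension instead of a 32-bit float cuts memory roughly fourfold, at the price of a reconstruction error of at most half a quantization step for in-range values.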
Should explain combining metadata filtering (structured) with semantic similarity (unstructured), implementing compound queries, and designing a retrieval pipeline that handles both predicate types.
Covers train/test split methodology for retrieval, avoiding data leakage, using held-out query-document pairs, statistical significance testing, and overfitting to training distribution.
Should cover per-token API costs at scale, latency implications of network calls, model quality benchmarks (MTEB), data privacy, customization via fine-tuning, and operational complexity of self-hosting.
Covers namespace/partition strategies in vector databases, tenant-aware metadata filtering, separate vs. shared indexes, security guarantees, and cost-efficient resource sharing.
Scenario-Based
10 questions
Should address query decomposition (attribute extraction for price, features), hybrid retrieval with structured filters + semantic matching, and potentially training a query understanding component.
Covers checking context window utilization, prompt engineering for faithfulness, manually testing the LLM with the retrieved context, implementing chain-of-thought or citation enforcement, and setting up answer verification.
Should cover running parallel systems, A/B testing with business-relevant metrics (conversion, task completion), hybrid approach as a bridge, measuring improvement on hard queries, and phased rollout.
Covers self-hosted embedding models (no data leaving infrastructure), fine-tuning on de-identified medical text, HIPAA-compliant vector database deployment, and domain expert evaluation loops.
Should discuss switching to a multilingual embedding model (multilingual-e5-large, BGE-M3), evaluating on machine-translated query sets, cross-lingual retrieval without parallel data, and progressive language expansion.
Covers index refresh lag, real-time vs. batch embedding pipelines, incremental indexing strategies, cache invalidation, and monitoring freshness metrics.
Should cover multimodal embedding models (CLIP, SigLIP), unified vector space for text and images, cross-modal retrieval, and integration into the existing search pipeline.
Covers knowledge distillation from a large model to a small one, quantization (e.g., via ONNX or TensorRT), batch inference optimization, and tiered retrieval with re-ranking.
Should discuss multi-region replication, graceful degradation to keyword search fallback, health checks, circuit breakers, and disaster recovery runbooks.
Covers cost implications of massive context, latency of processing long contexts, retrieval precision vs. context stuffing, the needle-in-a-haystack problem at scale, and how search quality still matters for grounding.
AI Workflow & Tools
10 questions
Should reference EnsembleRetriever, BM25Retriever, VectorStoreRetriever, and CrossEncoderReranker from LangChain's retriever modules (exact import paths vary by version), and the chain construction with retrieval + reranking + LLM.
Covers InputExample format, triplet or contrastive loss, SentenceTransformer.fit(), evaluation with InformationRetrievalEvaluator, and saving/deploying the fine-tuned model.
Should cover running mteb run command, interpreting retrieval task scores, building custom BEIR-format datasets, comparing models on domain-relevant tasks, and tracking results in W&B.
Covers creating namespaces per tenant, using metadata dictionaries during upsert and query, combining filter expressions with vector similarity, and index management.
Should cover preparing evaluation datasets with ground truth, running faithfulness/relevancy/context precision metrics, interpreting scores, and iterating on the pipeline based on results.
Covers FastAPI endpoints for single and batch embedding, Redis or in-memory caching for repeated texts, Prometheus metrics for latency and throughput, and Docker/K8s deployment.
Should discuss SentenceSplitter, document metadata preservation, VectorStoreIndex.from_documents with persist, and incremental ingestion with docstore deduplication.
Covers the hybrid search API, alpha parameter (0 = pure BM25, 1 = pure vector), experimentation methodology for tuning alpha, and combining with filters.
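The effect of the alpha parameter can be illustrated with a simplified blending sketch. This is not the search engine's internal implementation; it assumes min-max normalization of each score set followed by a weighted sum, and the score dictionaries are made-up inputs.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant score set maps to all ones."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def blend(bm25_scores, vector_scores, alpha):
    """alpha = 0 uses only BM25 scores, alpha = 1 uses only vector scores."""
    bm25_n, vec_n = min_max(bm25_scores), min_max(vector_scores)
    fused = {
        doc_id: (1 - alpha) * bm25_n.get(doc_id, 0.0) + alpha * vec_n.get(doc_id, 0.0)
        for doc_id in set(bm25_n) | set(vec_n)
    }
    # Sort by fused score, breaking ties by document ID for determinism.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))
```

Sweeping alpha over a grid such as 0.0, 0.25, 0.5, 0.75, 1.0 and scoring each setting on a labeled query set is one straightforward way to run the tuning experiment.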
Should explain Matryoshka Representation Learning (MRL), truncating embedding dimensions (e.g., 256 vs. 3072), cost/quality tradeoffs, and where each dimensionality is appropriate.
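Truncating an MRL-trained embedding amounts to keeping the leading dimensions and re-normalizing, as in this minimal sketch. It only makes sense for models trained with Matryoshka Representation Learning; truncating an ordinary embedding this way generally degrades quality badly. The function name is illustrative.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head
```

Re-normalizing matters because cosine and dot-product comparisons assume unit-length vectors, and the truncated prefix is shorter than the original.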
Covers blue-green or canary deployment of new embedding models, shadow indexing new vectors alongside old, A/B comparison, monitoring retrieval metrics, and automated rollback triggers.
Behavioral
5 questions
Should demonstrate ability to use analogies, focus on business impact rather than technical details, and confirm understanding through follow-up questions.
Should show intellectual humility, systematic debugging approach, willingness to iterate, and a concrete lesson learned about retrieval system design.
Should reference impact analysis, user-facing metrics, stakeholder alignment, and a framework for making tradeoff decisions (e.g., ICE scoring or effort-impact matrix).
Should demonstrate tactful communication, data-driven approach (showing metrics rather than opinions), collaborative framing, and focus on the shared goal of system quality.
Should mention specific sources (arXiv, Twitter/X ML community, HuggingFace blog, conference papers), and give a concrete example of adopting a new technique or tool.