RAG Engineer
A RAG Engineer designs and builds Retrieval-Augmented Generation pipelines that ground large language model outputs in authoritati…
Skill Guide
The engineering discipline of designing, deploying, and optimizing systems that store high-dimensional vector embeddings and perform similarity searches, often combined with traditional metadata filtering, to enable semantic retrieval in applications like RAG, recommendation, and anomaly detection.
Scenario
Create a web app where users upload a query image and receive visually similar images from a pre-indexed gallery (e.g., a subset of ImageNet or your own photo collection).
Scenario
Enhance an e-commerce search where users can find products by both semantic description ('a comfortable chair for long gaming sessions') and filters ('brand: SteelSeries', 'price < 500').
Scenario
Build a Retrieval-Augmented Generation system for a technical knowledge base that ingests PDFs (text + figures), supports queries across both modalities, and uses a cross-encoder to rerank top results for the LLM.
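The retrieve-then-rerank step in the last scenario can be sketched as a two-stage function. This is a minimal illustration in plain Python, not a production pipeline: `retrieve_then_rerank` and `toy_overlap_score` are hypothetical names, the embeddings are assumed to be precomputed, and the word-overlap scorer is only a stand-in for a real cross-encoder, which would score each (query, passage) pair with a model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Doc = Tuple[str, Sequence[float]]  # (passage text, precomputed embedding)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(
    query_text: str,
    query_vec: Sequence[float],
    corpus: List[Doc],
    rerank_score: Callable[[str, str], float],
    first_stage_k: int = 20,
    final_k: int = 3,
) -> List[str]:
    # Stage 1: cheap vector similarity narrows the corpus to a candidate set.
    candidates = sorted(
        corpus, key=lambda doc: cosine(query_vec, doc[1]), reverse=True
    )[:first_stage_k]
    # Stage 2: a slower, more precise scorer reorders only those candidates.
    reranked = sorted(
        candidates, key=lambda doc: rerank_score(query_text, doc[0]), reverse=True
    )
    return [text for text, _ in reranked[:final_k]]


def toy_overlap_score(query: str, passage: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words) if query_words else 0.0
```

The design point is that the expensive scorer only ever sees `first_stage_k` passages, so its cost is bounded regardless of corpus size.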
Use a managed service such as Pinecone for rapid prototyping and for scaling without operational overhead. Choose an open-source engine (Weaviate, Qdrant, Milvus) for maximum control, on-prem deployment, or advanced hybrid search features. pgvector is a good fit when vector search extends an existing PostgreSQL workload.
Select models based on domain, language support, and cost. Sentence-Transformers offers a wide range of open-source models for fine-tuning. Use CLIP for multi-modal (text-image) tasks. Commercial APIs (OpenAI, Cohere) offer high quality with less maintenance but higher recurring cost and data egress concerns.
RAG orchestration frameworks provide abstractions for each stage of the pipeline: loading data, chunking text, calling embedding models, interacting with vector databases, and interfacing with LLMs. They accelerate development, but debugging and optimization still require understanding the underlying components.
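As an example of what sits underneath one of these abstractions, the chunking step can be sketched as a fixed-size word window with overlap. This is an illustrative baseline only: `chunk_words` is a hypothetical helper, the default sizes are arbitrary, and real frameworks typically offer token-aware or structure-aware splitters instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # otherwise the loop would emit a trailing chunk of pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Knowing that chunk boundaries are this mechanical helps when debugging retrieval misses: an answer split across two chunks with too little overlap may never be retrieved as a unit.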
Use ANN-Benchmarks to compare algorithmic performance. VectorDBBench compares real database solutions. MTEB benchmarks embedding model quality. RAGAS and DeepEval are for end-to-end RAG pipeline evaluation, measuring retrieval relevance, answer faithfulness, and other critical metrics.
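The core metric behind index-level comparisons is recall@k: the share of the true nearest neighbors that the approximate index actually returned. A minimal sketch follows, assuming you already have exact (brute-force) neighbor IDs as ground truth; the function names are illustrative and not taken from any of the tools above.

```python
from typing import List, Sequence


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k neighbors that appear in the approximate top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)


def mean_recall_at_k(
    approx_results: List[Sequence[int]],
    exact_results: List[Sequence[int]],
    k: int,
) -> float:
    """Average recall@k over a batch of queries (one ID list per query)."""
    if len(approx_results) != len(exact_results):
        raise ValueError("need one approximate and one exact result list per query")
    if not approx_results:
        return 0.0
    per_query = [
        recall_at_k(approx, exact, k)
        for approx, exact in zip(approx_results, exact_results)
    ]
    return sum(per_query) / len(per_query)
```

Note that this measures how faithfully the index approximates exact search. It says nothing about whether the retrieved passages are relevant to the user's question, which is what the end-to-end RAG metrics address.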
Answer Strategy
Demonstrate knowledge of trade-offs and benchmarking. Start by stating that HNSW is the likely default choice at this scale and latency requirement. Explain the key parameters: `M` (connections per node), which trades memory for recall; `efConstruction`, which controls build quality; and `efSearch`, which controls query-time quality. Emphasize the need to benchmark with actual data, tuning `efSearch` to hit the latency target while monitoring recall. Mention that IVF-PQ could be considered if memory is extremely constrained, typically at the cost of lower recall, and often higher latency once the search is tuned to recover that recall.
Sample Answer: "For 100M vectors at 768 dimensions with a 50ms p99 SLA, I would start with HNSW. I'd set `M` to 16-32 to balance memory and graph connectivity, and `efConstruction` to 100-200 for a high-quality build. The critical runtime parameter is `efSearch`, which I'd tune starting from 50, incrementally increasing it until recall@10 stabilizes above our target (e.g., 0.95) while consistently meeting the latency SLA. I'd use a subset of data for initial tuning and then validate on the full set."
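Two parts of this answer lend themselves to a quick sketch: the memory arithmetic behind the choice of `M`, and the `efSearch` sweep. Both functions below are hypothetical helpers written for illustration. The memory formula is a rough approximation that assumes float32 vectors, 4-byte neighbor IDs, and up to 2×`M` links per node on the bottom layer, ignoring upper layers and allocator overhead. The `measure` callback is assumed to run your own benchmark and return `(recall, p99_latency_ms)` for a given `efSearch`.

```python
from typing import Callable, Iterable, Optional, Tuple


def estimate_hnsw_bytes(
    num_vectors: int,
    dim: int,
    m: int,
    bytes_per_float: int = 4,
    bytes_per_link: int = 4,
) -> int:
    """Rough lower bound on HNSW index size in bytes (see assumptions above)."""
    vector_bytes = num_vectors * dim * bytes_per_float
    # Assumption: the bottom layer stores up to 2*M neighbor IDs per node.
    link_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + link_bytes


def pick_ef_search(
    measure: Callable[[int], Tuple[float, float]],
    candidates: Iterable[int],
    recall_target: float,
    latency_budget_ms: float,
) -> Optional[int]:
    """Return the smallest efSearch meeting both targets, or None if none does."""
    for ef in sorted(candidates):
        recall, p99_ms = measure(ef)
        if recall >= recall_target and p99_ms <= latency_budget_ms:
            return ef
    return None
```

Under these assumptions, `estimate_hnsw_bytes(100_000_000, 768, 16)` comes to 320,000,000,000 bytes: about 307 GB of raw vectors plus about 13 GB of graph links. That is worth stating in an interview, because it shows the vectors dominate memory and that the index will likely need sharding or quantization regardless of how `M` is tuned.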
Answer Strategy
This question tests problem-solving and real-world experience. Structure the answer as Context (what the application was), Problem (why vector search alone was insufficient), Solution (how you integrated filters or hybrid search), and Impact.
Sample Answer: "In a B2B recommendation engine, vector search for 'similar companies' returned matches based on industry keywords but ignored our users' need to filter by company size and geography, so a startup user would be shown large multinationals. I implemented a hybrid search strategy in Weaviate, combining vector similarity with pre-filtering on the structured metadata fields (`employee_count`, `country`). This let the core semantic ranking operate within the user's target segment, improving click-through rate by 35% because the results were now both semantically relevant and contextually appropriate."
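The pre-filtering idea in this answer can be illustrated without any database. The sketch below is plain Python and does not use Weaviate's API: `filtered_search` is a hypothetical function, the company records are invented toy data, and the field names `employee_count` and `country` are borrowed from the sample answer.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(
    query_vec: Sequence[float],
    records: List[Dict],
    predicate: Callable[[Dict], bool],
    k: int,
) -> List[str]:
    # Pre-filter: discard records outside the user's segment BEFORE ranking,
    # so all k results are guaranteed to satisfy the metadata constraints.
    eligible = [r for r in records if predicate(r)]
    ranked = sorted(
        eligible, key=lambda r: cosine(query_vec, r["vector"]), reverse=True
    )
    return [r["name"] for r in ranked[:k]]


# Invented toy data for illustration only.
companies = [
    {"name": "MegaCorp", "vector": [1.0, 0.0], "employee_count": 50000, "country": "US"},
    {"name": "TinyStart", "vector": [0.9, 0.1], "employee_count": 12, "country": "US"},
    {"name": "SmallDE", "vector": [0.95, 0.05], "employee_count": 20, "country": "DE"},
    {"name": "LocalShop", "vector": [0.1, 0.9], "employee_count": 8, "country": "US"},
]


def small_us_company(record: Dict) -> bool:
    return record["employee_count"] <= 50 and record["country"] == "US"
```

With the query vector `[1.0, 0.0]`, an unfiltered search ranks `MegaCorp` first, while `filtered_search(..., small_us_company, k=2)` never considers it. Filtering after ranking instead (post-filtering) can return fewer than `k` results when most of the top matches fail the predicate, which is one reason to prefer pre-filtering in this scenario.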