Prompt Engineer
Prompt Engineers design, test, and optimize natural-language instructions that control large language models (LLMs) and multimodal…
Skill Guide
Retrieval-Augmented Generation (RAG) is an architecture that dynamically retrieves relevant external knowledge from a vector database and injects it as context into a large language model (LLM) prompt to ground its generation in factual, up-to-date information.
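The retrieve-then-inject flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the function names, sample chunks, and prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=2):
    """Return the k highest-scoring (score, chunk) pairs for the query."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(query, retrieved):
    """Inject the retrieved chunks as numbered context ahead of the question."""
    context = "\n".join(f"[{i + 1}] {chunk}" for i, (_, chunk) in enumerate(retrieved))
    return (
        "Answer using only the context below. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Hypothetical document chunks, e.g. from a product manual.
chunks = [
    "The warranty covers parts and labor for two years.",
    "The device charges over USB-C in about ninety minutes.",
    "Returns are accepted within thirty days of purchase.",
]
top = retrieve("How long is the warranty?", chunks, k=2)
prompt = build_prompt("How long is the warranty?", top)
print(prompt)
```

The resulting `prompt` string is what would be sent to the LLM. In a real system the embedding model and the vector store replace `embed` and the list scan, but the shape of the pipeline stays the same.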
Scenario
Create a chatbot that can accurately answer questions about a single PDF document (e.g., your resume, a product manual) using only its content.
Scenario
Build a RAG system for a complex domain (e.g., legal contracts, medical research papers) where answer accuracy is critical and requires multiple source synthesis.
Scenario
Architect a platform that ingests from live databases, APIs, and documents, with strict access controls, real-time updates, and auditable answers.
Provides abstractions for the entire RAG pipeline (loaders, splitters, retrievers, chains). Use LangChain for flexibility and a large ecosystem, LlamaIndex for data-centric indexing and advanced retrieval patterns, and Haystack for production-ready pipelines with deep Elasticsearch integration.
Store and efficiently query embedding vectors. Use Pinecone for serverless, scalable ops. Use Qdrant/Weaviate for self-hosted, high-performance needs with advanced filtering. Use FAISS/Chroma for prototyping or embedded use cases where a separate DB is overhead.
Transform text into dense vector representations. Use hosted APIs (OpenAI, Cohere) for strong out-of-the-box quality without managing model infrastructure. Use local models (sentence-transformers, BGE) for cost control, data privacy, and full pipeline ownership. Model choice directly impacts retrieval quality and latency.
Measure and monitor RAG performance. RAGAS provides metrics for faithfulness, relevance, and context precision. LangSmith/Phoenix offer tracing, logging, and playgrounds to debug retrieval steps. Use these to move from 'it works' to 'it works reliably and measurably'.
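As a rough illustration of what a retrieval metric measures, the sketch below computes precision and recall over the top-k retrieved chunks for one query. It is a simplified stand-in and not the RAGAS implementation of context precision. The chunk IDs and the hand-labeled relevant set are hypothetical.

```python
def precision_recall_at_k(retrieved_ids, relevant_ids, k):
    """Precision and recall of the top-k retrieved chunk IDs against a labeled relevant set."""
    top_k = retrieved_ids[:k]
    relevant = set(relevant_ids)
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranked retrieval result and hand-labeled relevant chunks for one query.
p, r = precision_recall_at_k(["c1", "c4", "c2", "c9"], ["c1", "c2", "c3"], k=3)
print(f"precision@3={p:.2f} recall@3={r:.2f}")
```

Tracking numbers like these across a fixed set of test queries shows whether a change to chunking, embeddings, or re-ranking actually improved retrieval.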
Answer Strategy
The answer must decouple retrieval issues from generation issues. Strategy: 1) Check retrieval quality first: log the top-K chunks and score their relevance to the query (are the *right* chunks being pulled?). 2) If retrieval is good, analyze the prompt template and context injection: is the context formatted clearly, is there too much noise, are the instructions precise? 3) Examine the LLM's behavior: is it ignoring the context (hallucinating), summarizing poorly, or failing at synthesis? The fix is often better prompt engineering (e.g., chain-of-thought, explicit citation instructions) or a re-ranking/filtering step on the retrieved chunks.
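One way to make this triage concrete is to label each failing test case by the stage that failed. The sketch below assumes a small hand-built evaluation set in which each case records the retrieved chunk IDs, the chunk IDs known to contain the answer, and a human judgment of whether the final answer was correct. The field names and the `locate_failure` helper are hypothetical.

```python
from collections import Counter

def locate_failure(retrieved_ids, gold_ids, answer_correct):
    """Classify one test case as 'ok', 'retrieval', or 'generation'."""
    if answer_correct:
        return "ok"
    if not set(retrieved_ids) & set(gold_ids):
        return "retrieval"   # the right chunk never reached the prompt
    return "generation"      # the right chunk was present; the prompt or the model failed

# Hypothetical evaluation cases logged from a RAG pipeline.
cases = [
    {"retrieved_ids": ["c1", "c7"], "gold_ids": ["c1"], "answer_correct": True},
    {"retrieved_ids": ["c4", "c9"], "gold_ids": ["c2"], "answer_correct": False},
    {"retrieved_ids": ["c3", "c5"], "gold_ids": ["c5"], "answer_correct": False},
]
summary = Counter(locate_failure(**case) for case in cases)
print(dict(summary))
```

If most failures land in `retrieval`, work on chunking, embeddings, or re-ranking. If they land in `generation`, work on the prompt template and context formatting.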
Answer Strategy
This question tests system design thinking and understanding of trade-offs. Strategy: discuss a tiered approach. 1) For freshness, implement a streaming pipeline that processes document updates incrementally instead of re-embedding everything. 2) For cost, use a smaller, local embedding model for the bulk initial load and a high-quality API model for critical queries, keeping each model's vectors in its own index because embeddings from different models are not comparable. 3) For latency, pre-compute and cache embeddings for common query patterns. 4) Make the chunking strategy document-type aware: semantic chunking for narratives, fixed-size for code/tables. Metadata (source, timestamp) must be stored and used for filtering at retrieval time.
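The incremental-update idea in point 1 can be sketched with content hashes: only documents whose text has changed since the last run are re-embedded, and documents that have disappeared are removed from the index. This is a minimal sketch under assumptions: `stored_hashes` stands in for metadata kept alongside the vectors, and the function names and sample documents are hypothetical.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_updates(stored_hashes, documents):
    """Compare stored hashes with current documents.

    stored_hashes: doc_id -> hash recorded when the document was last embedded
    documents:     doc_id -> current text
    Returns (ids to embed or re-embed, ids to delete from the index).
    """
    to_embed = sorted(
        doc_id for doc_id, text in documents.items()
        if stored_hashes.get(doc_id) != content_hash(text)
    )
    to_delete = sorted(doc_id for doc_id in stored_hashes if doc_id not in documents)
    return to_embed, to_delete

# Hypothetical state: what was indexed last time vs. what exists now.
stored_hashes = {
    "a.md": content_hash("alpha"),
    "b.md": content_hash("beta"),
    "c.md": content_hash("gamma"),
}
documents = {"a.md": "alpha", "b.md": "beta, revised", "d.md": "delta"}
to_embed, to_delete = plan_updates(stored_hashes, documents)
print(to_embed, to_delete)
```

Here the unchanged document is skipped, the edited and new documents are queued for embedding, and the removed document is queued for deletion, so embedding cost scales with the size of the change rather than the size of the corpus.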