RAG Engineer
A RAG Engineer designs and builds Retrieval-Augmented Generation pipelines that ground large language model outputs in authoritati…
Skill Guide
A specialized software engineering discipline focused on designing non-blocking, concurrent systems that efficiently coordinate multiple data sources and APIs into resilient, high-throughput processing workflows.
Scenario
Build a CLI tool that concurrently fetches top headlines from 3 different news API endpoints (e.g., NewsAPI, The Guardian, NYTimes) and outputs a merged, de-duplicated list.
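A minimal sketch of the fetch-merge-de-duplicate core of this scenario follows. The three fetch functions are hypothetical stand-ins that return canned data instead of calling NewsAPI, The Guardian, or NYTimes, and using the normalized title as the de-duplication key is an assumption; a real tool might compare URLs or use fuzzy matching.

```python
import asyncio

# Hypothetical stand-ins for three news API clients. A real tool would
# issue HTTP requests here (for example with aiohttp or httpx).
async def fetch_source_a():
    await asyncio.sleep(0.01)
    return [
        {"title": "Markets rally", "url": "https://a.example/1"},
        {"title": "Storm warning issued", "url": "https://a.example/2"},
    ]

async def fetch_source_b():
    await asyncio.sleep(0.01)
    return [{"title": "markets rally ", "url": "https://b.example/9"}]

async def fetch_source_c():
    await asyncio.sleep(0.01)
    raise ConnectionError("source C unavailable")

def dedupe(headlines):
    """Keep the first headline seen for each normalized title."""
    seen = set()
    unique = []
    for item in headlines:
        key = item["title"].strip().lower()  # assumed de-duplication key
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique

async def top_headlines():
    # Run all three fetches concurrently; failures come back as values.
    results = await asyncio.gather(
        fetch_source_a(), fetch_source_b(), fetch_source_c(),
        return_exceptions=True,
    )
    merged = []
    for result in results:
        if isinstance(result, Exception):
            continue  # skip a failed source instead of failing the whole run
        merged.extend(result)
    return dedupe(merged)
```

Calling asyncio.run(top_headlines()) from the CLI entry point would yield the merged list, with the failed source skipped and the duplicate headline dropped.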
Scenario
Design a system that periodically checks prices for a list of products across multiple retailer APIs, calculates discounts, and sends an alert if a price drops below a threshold.
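One way the checking loop in this scenario might look is sketched below. The PRICES table and fetch_price function are hypothetical placeholders for real retailer APIs, alerts are simply printed, and the watchlist format (product mapped to list price and alert threshold) is an assumption made for illustration.

```python
import asyncio

# Hypothetical in-memory price data standing in for retailer APIs.
PRICES = {
    "retailer_a": {"laptop": 950.0, "phone": 610.0},
    "retailer_b": {"laptop": 899.0, "phone": 640.0},
}

async def fetch_price(retailer, product):
    await asyncio.sleep(0.01)  # simulated network latency
    return PRICES[retailer][product]

def discount_percent(list_price, current_price):
    return round((list_price - current_price) / list_price * 100, 1)

async def check_once(watchlist):
    """watchlist maps product -> (list_price, alert_threshold)."""
    alerts = []
    retailers = list(PRICES)
    for product, (list_price, threshold) in watchlist.items():
        # Query every retailer for this product concurrently.
        prices = await asyncio.gather(
            *(fetch_price(r, product) for r in retailers)
        )
        best_price, best_retailer = min(zip(prices, retailers))
        if best_price < threshold:
            alerts.append(
                (product, best_retailer, best_price,
                 discount_percent(list_price, best_price))
            )
    return alerts

async def run_periodically(watchlist, interval_seconds, rounds):
    # A fixed number of rounds keeps the sketch finite; a service
    # would typically loop until cancelled.
    for _ in range(rounds):
        for alert in await check_once(watchlist):
            print("ALERT", alert)
        await asyncio.sleep(interval_seconds)
```

In a production design the periodic loop would more likely be driven by a scheduler or workflow orchestrator, and the print call replaced by an email or messaging integration.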
Scenario
Build a microservice that consumes a stream of user events from Kafka, enriches each event by calling 2-3 external APIs (user profile, product catalog, fraud check) in parallel, handles partial failures, and publishes enriched events to another topic.
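The enrichment step of this scenario could be sketched as follows, leaving out the Kafka consumer and producer. The three lookup coroutines are hypothetical stand-ins for the external APIs, and recording failures in an enrichment_errors field is one assumed way to represent a partial failure; dropping the event or routing it to a dead-letter topic are alternatives.

```python
import asyncio

# Hypothetical stand-ins for the three external services.
async def get_profile(user_id):
    await asyncio.sleep(0.01)
    return {"tier": "gold"}

async def get_product(product_id):
    await asyncio.sleep(0.01)
    return {"name": "keyboard"}

async def fraud_check(event):
    await asyncio.sleep(0.01)
    raise TimeoutError("fraud service timed out")

async def enrich(event):
    names = ("profile", "product", "fraud")
    # All three lookups run in parallel; an exception in one does not
    # cancel the others because return_exceptions=True is set.
    results = await asyncio.gather(
        get_profile(event["user_id"]),
        get_product(event["product_id"]),
        fraud_check(event),
        return_exceptions=True,
    )
    enriched = dict(event, enrichment_errors=[])
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            enriched[name] = None  # keep the event, mark the gap
            enriched["enrichment_errors"].append(name)
        else:
            enriched[name] = result
    return enriched
```

Downstream consumers can then decide per field whether a missing enrichment is acceptable, rather than losing the whole event.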
asyncio is the standard-library foundation for event-loop-based concurrency. aiohttp provides an async HTTP client and server; httpx provides an HTTP client with both sync and async APIs. asyncpg is a high-performance async driver for PostgreSQL.
Modern workflow orchestrators for defining, scheduling, and monitoring complex data pipelines with dependencies, retries, and observability.
tenacity for configurable retries with backoff. circuitbreaker for implementing the circuit breaker pattern. gunicorn managing uvicorn workers is a common production setup for deploying ASGI (async) web services.
py-spy for low-overhead sampling profiling of running Python processes, including async services. Prometheus for metrics collection, Grafana for visualization. OpenTelemetry for distributed tracing across microservices.
Answer Strategy
Structure your answer around three pillars: concurrency, error handling, and resilience. Use asyncio.gather(..., return_exceptions=True) to run the calls concurrently; failures then come back as exception objects in the results list instead of aborting the whole batch. Implement a fallback for the failing call (e.g., return cached data or degrade gracefully) and use tenacity to retry it. Mention per-call timeouts and circuit breakers for long-term health.
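The strategy above can be illustrated with a short sketch. It uses a hand-rolled retry loop with exponential backoff in place of tenacity so that it runs on the standard library alone; the CACHE dictionary, the flaky_recommendations service, and all parameter values are hypothetical.

```python
import asyncio

CACHE = {"recommendations": ["cached-item"]}  # hypothetical fallback data
attempts = {"count": 0}  # records how often the flaky call was tried

async def flaky_recommendations():
    attempts["count"] += 1
    await asyncio.sleep(0.01)
    raise ConnectionError("upstream unavailable")

async def call_with_retry(make_call, retries=3, timeout=0.5, base_delay=0.01):
    """Retry a coroutine factory with a per-attempt timeout and backoff."""
    for attempt in range(retries):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == retries - 1:
                raise  # out of retries: let the caller decide on a fallback
            await asyncio.sleep(base_delay * 2 ** attempt)

async def load_page():
    async def profile():
        await asyncio.sleep(0.01)
        return {"name": "Ada"}

    profile_result, recs_result = await asyncio.gather(
        profile(),
        call_with_retry(flaky_recommendations),
        return_exceptions=True,
    )
    if isinstance(recs_result, Exception):
        recs_result = CACHE["recommendations"]  # graceful degradation
    return {"profile": profile_result, "recommendations": recs_result}
```

Here the healthy call is unaffected by the failing one, the failing call is attempted three times, and the response falls back to cached data. A circuit breaker would sit around call_with_retry to stop retrying a dependency that stays down.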
Answer Strategy
Demonstrate a methodical, data-driven approach. First, instrument the pipeline with metrics (time per stage). Use profiling tools (py-spy, cProfile) to identify the bottleneck. For async code, check for blocking calls inside coroutines, insufficient concurrency, or slow serialization. Then apply targeted fixes: run more I/O-bound work concurrently while capping in-flight tasks with asyncio.Semaphore, batch database writes, or switch to a faster library (e.g., orjson for JSON).
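As a rough illustration of the measurement and concurrency points, the sketch below times a simulated I/O-bound stage and caps the number of in-flight tasks with asyncio.Semaphore. The process function, the sleep duration, and the limit value are all assumptions; in a real pipeline the timing would feed a metrics system instead of being returned.

```python
import asyncio
import time

async def process(item):
    await asyncio.sleep(0.05)  # simulated I/O-bound stage
    return item * 2

async def run_bounded(items, limit):
    """Process items concurrently with at most `limit` tasks in flight."""
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def worker(item):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)  # track observed concurrency
            try:
                return await process(item)
            finally:
                in_flight -= 1

    start = time.perf_counter()
    results = await asyncio.gather(*(worker(i) for i in items))
    elapsed = time.perf_counter() - start  # wall-clock time for the stage
    return results, peak, elapsed
```

Running the same workload with different limit values and comparing elapsed gives a quick read on whether the stage is limited by concurrency or by something else, such as a blocking call.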