AI Plugin Developer
An AI Plugin Developer designs, builds, and maintains software extensions that integrate large language models and AI services int…
Skill Guide
A set of software design patterns and resilience engineering practices for handling the inherent non-determinism, latency, and failure modes of large language model (LLM) API calls.
Scenario
You are tasked with creating a Python class that wraps the OpenAI API. It must handle common errors (rate limits, timeouts, server errors) and return a default 'service unavailable' message if all retries fail.
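A minimal sketch of such a wrapper follows. It uses only the standard library and wraps any `prompt -> text` callable rather than a specific SDK. `ResilientLLMClient`, `TransientLLMError`, and `FALLBACK_MESSAGE` are hypothetical names chosen for illustration. A real wrapper would translate the SDK's rate-limit, timeout, and server-error exceptions into `TransientLLMError` inside `call_model`.

```python
import random
import time
from typing import Callable

# Assumed wording for the default reply; the scenario only asks for a
# 'service unavailable' message.
FALLBACK_MESSAGE = "Service unavailable. Please try again later."


class TransientLLMError(Exception):
    """Hypothetical stand-in for rate-limit, timeout, and server errors."""


class ResilientLLMClient:
    """Wraps a prompt -> text callable with bounded retries and a fallback."""

    def __init__(
        self,
        call_model: Callable[[str], str],
        max_attempts: int = 3,
        base_delay: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.call_model = call_model
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests do not actually wait

    def complete(self, prompt: str) -> str:
        for attempt in range(self.max_attempts):
            try:
                return self.call_model(prompt)
            except TransientLLMError:
                if attempt == self.max_attempts - 1:
                    break  # out of attempts, fall through to the default
                # Exponential backoff: the upper bound doubles per attempt,
                # and the actual wait is randomized within that bound.
                self.sleep(random.uniform(0, self.base_delay * 2 ** attempt))
        return FALLBACK_MESSAGE
```

Only `TransientLLMError` is caught, so programming errors and non-retryable failures still surface to the caller instead of being hidden behind the default message.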
Scenario
Your application uses a primary LLM (e.g., GPT-4) for high-quality outputs but needs to degrade gracefully. You must also prevent cascading failures if the provider's service becomes completely unavailable.
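One way to combine a fallback model with fail-fast behavior is sketched below. This is a hand-rolled, standard-library circuit breaker written for illustration, not the `pybreaker` API. `CircuitBreaker`, `CircuitOpenError`, and `complete_with_fallback` are hypothetical names, and the thresholds are arbitrary assumptions.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying the provider."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, func: Callable[[str], str], prompt: str) -> str:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("provider calls suspended")
            # Timeout elapsed: let one trial call through (half-open).
        try:
            result = func(prompt)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result


def complete_with_fallback(prompt, primary, secondary, breaker):
    """Try the primary model through the breaker, else use the secondary."""
    try:
        return breaker.call(primary, prompt)
    except Exception:
        # Covers both real provider errors and CircuitOpenError.
        return secondary(prompt)
```

While the circuit is open, the primary provider is not called at all, which is what stops a full outage from tying up request threads and cascading into the rest of the application.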
Scenario
You lead a team with a critical, multi-step LLM pipeline (e.g., for legal document analysis). You must prove its resilience before a major launch by systematically injecting faults.
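Systematic fault injection can start small, as in the hedged sketch below: each pipeline step is wrapped so that it fails with a configurable probability, and the harness measures how often a full run still completes. All names (`with_fault_injection`, `run_pipeline`, `measure_survival`, `InjectedFault`) are hypothetical, and the steps are assumed to be plain functions that pass a payload along.

```python
import random
from typing import Callable, List


class InjectedFault(Exception):
    """Marks a failure that the test harness raised on purpose."""


def with_fault_injection(
    step: Callable[[str], str], failure_rate: float, rng: random.Random
) -> Callable[[str], str]:
    """Return a version of `step` that fails with probability `failure_rate`."""

    def wrapped(payload: str) -> str:
        if rng.random() < failure_rate:
            name = getattr(step, "__name__", "step")
            raise InjectedFault(f"injected failure in {name}")
        return step(payload)

    return wrapped


def run_pipeline(steps: List[Callable[[str], str]], payload: str) -> str:
    """Feed the payload through each step in order."""
    for step in steps:
        payload = step(payload)
    return payload


def measure_survival(
    steps: List[Callable[[str], str]], payload: str, runs: int
) -> float:
    """Fraction of runs that finish without an injected fault escaping."""
    survived = 0
    for _ in range(runs):
        try:
            run_pipeline(steps, payload)
            survived += 1
        except InjectedFault:
            pass
    return survived / runs
```

Passing a seeded `random.Random` keeps a failing run reproducible. Wrapping the steps in the retry and fallback logic under test, then comparing survival rates at several failure rates, gives a concrete number to report before launch.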
`tenacity` is a widely used Python library for decorator-based retry logic. `pybreaker` implements the circuit breaker pattern so that calls fail fast while a provider is down. `OpenTelemetry` provides distributed tracing across LLM call chains. `Prometheus` (metrics) and `Grafana` (dashboards) are used to monitor error rates and latency against SLOs.
Cloud API gateways can handle initial retries and rate limiting at the edge. A service mesh like Istio enables advanced fault injection and retries at the infrastructure level. Serverless functions are common deployment targets that require careful timeout and error handling configuration.
Answer Strategy
The candidate must demonstrate a systematic, tiered approach beyond simple retries. A strong answer outlines specific layers: 1) Infrastructure Layer (circuit breaker, timeout controls), 2) Retry Layer (exponential backoff with jitter, retrying only specific error codes), 3) Fallback Layer (tiered model downgrade, cached responses, human-in-the-loop escalation), and 4) User Experience Layer (clear status communication, offline mode). The strategy should be justified by error type: a 401 is an authentication failure that will not succeed on retry, whereas a 429 is a rate limit that usually clears after a wait, so the first should fail immediately and the second should be retried.
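The error-type reasoning can be made concrete with a small decision function. This is a sketch under stated assumptions: the set of retryable status codes is a common convention rather than any provider's documented policy, and `Action`, `RETRYABLE_STATUS`, and `decide` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"        # wait, then call the same model again
    FALLBACK = "fallback"  # move to the next tier (smaller model, cache, human)
    FAIL = "fail"          # surface the error; retrying cannot help


# Assumption: timeouts, rate limits, and server-side errors are transient.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}


def decide(status_code: int, attempt: int, max_attempts: int) -> Action:
    """Choose the next step after a failed call.

    `attempt` is 1-based: the number of calls already made.
    """
    if status_code in RETRYABLE_STATUS:
        if attempt < max_attempts:
            return Action.RETRY
        return Action.FALLBACK
    # Other errors such as 400 or 401 indicate a problem with the request
    # or credentials, so neither retrying nor downgrading will fix them.
    return Action.FAIL
```

For example, `decide(401, 1, 3)` yields `Action.FAIL`, while `decide(429, 1, 3)` yields `Action.RETRY` and `decide(429, 3, 3)` yields `Action.FALLBACK`.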
Answer Strategy
Tests debugging skills and understanding of non-idempotent operations. The answer should identify two potential issues: 1) a lack of jitter causing a thundering herd (all clients retrying at the same moment), and 2) retrying requests that are not idempotent (the model processed the first request, but the client never received the response). The fix involves adding jitter to the backoff and implementing idempotency keys (e.g., a unique request ID sent to the API) to prevent duplicate work. Diagnosis would involve analyzing retry logs and tracing requests.
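Both fixes can be sketched together. The example below assumes a provider that deduplicates work by a client-supplied key. Whether a given LLM API honors such a key is an assumption to verify against its documentation, and `FakeProvider`, `full_jitter_delay`, and `send_with_retries` are hypothetical names.

```python
import random
import time
import uuid


def full_jitter_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Pick a random delay between 0 and min(cap, base * 2**attempt)."""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


class FakeProvider:
    """Hypothetical server that deduplicates work by idempotency key."""

    def __init__(self, drop_first_response=True):
        self.results = {}
        self.work_done = 0  # counts how many times the model actually ran
        self.drop_next = drop_first_response

    def handle(self, idempotency_key, prompt):
        if idempotency_key not in self.results:
            self.work_done += 1
            self.results[idempotency_key] = f"answer to {prompt}"
        if self.drop_next:
            # Simulate the failure from the scenario: the work was done,
            # but the response never reached the client.
            self.drop_next = False
            raise TimeoutError("response lost in transit")
        return self.results[idempotency_key]


def send_with_retries(provider, prompt, max_attempts=3, sleep=time.sleep):
    key = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(max_attempts):
        try:
            return provider.handle(key, prompt)
        except TimeoutError:
            if attempt < max_attempts - 1:
                sleep(full_jitter_delay(attempt))
    raise TimeoutError("all attempts failed")
```

Because the key is created before the loop, the retry after the lost response returns the stored result instead of running the model a second time, and the randomized delay spreads retries from many clients over time instead of synchronizing them.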