AI Streaming Data Engineer
An AI Streaming Data Engineer designs, builds, and maintains the real-time data pipelines that fuel modern AI systems, transformin…
Skill Guide
The deliberate structuring of data entities, their relationships, and physical storage layout to minimize read/write latency in distributed systems optimized for high-throughput, low-latency access patterns.
Scenario
You need to store user clickstream data (user_id, timestamp, event_type, page_url) to support the query: 'Get all events for a user in the last 24 hours, sorted by time.'
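A minimal sketch of how this query could shape the table, assuming a wide-column layout partitioned by user and clustered by timestamp. The table name `events_by_user`, the column name `event_ts`, and the class below are hypothetical; the Python code is an in-memory stand-in that only illustrates the access pattern, not a Cassandra client.

```python
import bisect
from collections import defaultdict

# Hypothetical CQL for the same layout (names are illustrative assumptions):
#   CREATE TABLE events_by_user (
#       user_id text, event_ts timestamp, event_type text, page_url text,
#       PRIMARY KEY ((user_id), event_ts)
#   );
#   SELECT * FROM events_by_user WHERE user_id = ? AND event_ts >= ?;

DAY_SECONDS = 24 * 60 * 60


class EventsByUser:
    """In-memory model: one partition per user_id, rows kept sorted by timestamp."""

    def __init__(self):
        # user_id -> sorted list of (timestamp, event_type, page_url)
        self._partitions = defaultdict(list)

    def insert(self, user_id, timestamp, event_type, page_url):
        # Keep each partition ordered on write so reads need no sort.
        bisect.insort(self._partitions[user_id], (timestamp, event_type, page_url))

    def last_24h(self, user_id, now):
        # Touch exactly one partition, then slice from the cutoff onward.
        rows = self._partitions.get(user_id, [])
        start = bisect.bisect_left(rows, (now - DAY_SECONDS,))
        return rows[start:]
```

Because rows are ordered at write time, the read is a single-partition range scan that returns events already sorted by time.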
Scenario
An online store's product detail page (PDP) loads slowly. Product data includes ID, name, price, inventory count, and 20 attributes. The goal is sub-10ms read latency.
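One common way to approach this scenario is a cache-aside read path in front of the primary store. The sketch below assumes that approach; `ProductCache`, the `loader` callback, and the 30-second TTL are hypothetical, and a plain dictionary stands in for an in-memory cache such as Redis.

```python
import time


class ProductCache:
    """Cache-aside sketch: serve from memory, fall back to the loader on a miss."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader          # hypothetical function: product_id -> product dict
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # product_id -> (expires_at, product)
        self.hits = 0
        self.misses = 0

    def get(self, product_id):
        now = self._clock()
        entry = self._entries.get(product_id)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read the source of truth and repopulate.
        self.misses += 1
        product = self._loader(product_id)
        self._entries[product_id] = (now + self._ttl, product)
        return product

    def invalidate(self, product_id):
        # Call after a price or inventory update so readers do not see stale data.
        self._entries.pop(product_id, None)
```

A short TTL bounds how stale fast-changing fields such as the inventory count can become, while explicit invalidation covers updates that must be visible immediately.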
Scenario
A fintech company needs to evaluate transactions in <50ms. The model must combine real-time transaction features (Redis) with historical user patterns (Cassandra) and batch risk scores (S3/Data Lake).
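The sketch below shows one way to assemble features from several stores inside a fixed latency budget: fetch them concurrently and fall back to a default when a source is too slow. The three fetcher arguments are hypothetical async callables standing in for the Redis, Cassandra, and precomputed risk-score lookups; the fallback values are assumptions.

```python
import asyncio


async def fetch_with_fallback(coro, timeout_s, fallback):
    """Await a lookup, returning the fallback if it exceeds the timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return fallback


async def assemble_features(fetch_realtime, fetch_history, fetch_risk,
                            user_id, budget_s=0.05):
    """Run all three lookups concurrently so total wait is bounded by budget_s."""
    realtime, history, risk = await asyncio.gather(
        fetch_with_fallback(fetch_realtime(user_id), budget_s, {}),
        fetch_with_fallback(fetch_history(user_id), budget_s, {}),
        fetch_with_fallback(fetch_risk(user_id), budget_s, {"risk_score": None}),
    )
    # Merge into one feature vector for the model.
    return {**realtime, **history, **risk}
```

Running the lookups in parallel means the slowest source, not the sum of all three, determines latency, and a missing feature degrades the prediction instead of failing the request.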
Select based on access pattern: use Cassandra for time-series and wide-column needs; Redis for ephemeral, in-memory caching and counters; DynamoDB for fully managed, auto-scaling key-value access.
CQL is essential for defining Cassandra tables with partition and clustering keys. Redis requires choosing the right data structure per operation (e.g., Sorted Sets for leaderboards). Single-table DynamoDB design forces you to think in terms of primary keys and global secondary indexes (GSIs).
Use JMeter or redis-benchmark for load testing under expected peak QPS. cassandra-stress simulates production-like workloads. Monitor P99 latency, cache hit ratios, and storage read IOPS with Prometheus/Grafana.
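To make the monitored metrics concrete, here is a small sketch of how P99 latency and cache hit ratio can be computed from raw samples. It uses the nearest-rank percentile definition, which is one of several conventions; monitoring systems may interpolate differently. The function names are illustrative.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def cache_hit_ratio(hits, misses):
    """Fraction of lookups served from cache; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0
```

For example, `percentile(latencies_ms, 99)` reports the latency that 99% of requests met or beat, which is usually a better SLO signal than the mean.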
Answer Strategy
Focus on query-driven design and denormalization. Explain the fan-out-on-write approach. Sample Answer: 'I'd create a `feed_by_user` table partitioned by `user_id` with a clustering key of `post_timestamp` DESC. When a user posts, I'd fan out the post to all followers' partitions in a batch. This pre-computes each user's feed for O(1) reads, trading write amplification for read latency.'
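The fan-out-on-write trade-off in the sample answer can be sketched in memory as follows. `FeedStore` and its methods are hypothetical; the dictionaries stand in for a `feed_by_user` table keyed by `user_id` with rows ordered newest first.

```python
from collections import defaultdict


class FeedStore:
    """Fan-out-on-write sketch: each post is copied into every follower's feed."""

    def __init__(self):
        self.followers = defaultdict(set)      # author_id -> set of follower ids
        self.feed_by_user = defaultdict(list)  # user_id -> [(post_timestamp, author_id, post_id)]

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def publish(self, author_id, post_id, post_timestamp):
        """Write one row per follower; returns the number of writes (the amplification)."""
        row = (post_timestamp, author_id, post_id)
        targets = self.followers.get(author_id, set())
        for follower_id in targets:
            feed = self.feed_by_user[follower_id]
            feed.append(row)
            feed.sort(reverse=True)  # keep newest first, like a DESC clustering order
        return len(targets)

    def read_feed(self, user_id, limit=10):
        """Read a precomputed feed from a single partition; no joins at read time."""
        return self.feed_by_user.get(user_id, [])[:limit]
```

The return value of `publish` makes the cost visible: an author with many followers pays one write per follower, which is why this pattern is often combined with a different strategy for very high-follower accounts.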
Answer Strategy
Tests architectural judgment and understanding of business requirements. Sample Answer: 'For a shopping cart service, I chose eventual consistency, using DynamoDB's eventually consistent reads for product inventory checks to meet 5ms SLOs. This tolerated slightly stale inventory data but kept checkout latency low. We resolved conflicts with retries guarded by an idempotency key on the order service.'
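The idempotency key mentioned in the sample answer can be sketched as below. `OrderService` and its fields are hypothetical, and a dictionary stands in for the durable store a real service would use to remember processed keys.

```python
class OrderService:
    """Idempotent order placement: a retried request with the same key creates no new order."""

    def __init__(self):
        self._results = {}       # idempotency_key -> stored result of the first attempt
        self.orders_created = 0

    def place_order(self, idempotency_key, cart):
        # A retry replays the stored result instead of creating a duplicate order.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.orders_created += 1
        result = {"order_id": f"order-{self.orders_created}", "items": dict(cart)}
        self._results[idempotency_key] = result
        return result
```

The client generates the key once per checkout attempt and reuses it on every retry, so a timeout followed by a resend is safe.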