AI Carrier Selection Specialist
An AI Carrier Selection Specialist leverages artificial intelligence and advanced analytics to optimize logistics carrier choices,…
Skill Guide
Real-time Data Processing is the continuous ingestion, transformation, and analysis of data streams with latency measured in milliseconds to seconds, enabling immediate action.
Scenario
Build a system to process a stream of website click events to compute the top 10 most visited pages in a sliding 5-minute window, updated every minute.
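The windowing logic in this scenario can be sketched in plain Python before committing to a streaming engine. This is a minimal, in-memory illustration, not a production design: it assumes click events arrive as (timestamp_in_seconds, page) tuples, and the names top_pages and sliding_top_pages are hypothetical. A real pipeline would use the engine's own sliding-window operator rather than rescanning the event list.

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5-minute window
SLIDE_SECONDS = 60    # results refreshed every minute


def top_pages(events, window_end, top_n=10):
    """Return the top_n (page, count) pairs for clicks in [window_end - 300, window_end)."""
    window_start = window_end - WINDOW_SECONDS
    counts = Counter(page for ts, page in events if window_start <= ts < window_end)
    return counts.most_common(top_n)


def sliding_top_pages(events, first_end, last_end, top_n=10):
    """Evaluate the window at every one-minute boundary from first_end to last_end."""
    return {
        end: top_pages(events, end, top_n)
        for end in range(first_end, last_end + 1, SLIDE_SECONDS)
    }
```

Each event contributes to up to five overlapping windows, which is why engines typically keep per-slide partial counts instead of recounting from scratch as this sketch does.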
Scenario
Design a streaming pipeline to flag potentially fraudulent credit card transactions by detecting a user's transaction velocity (e.g., >3 transactions in 2 minutes) across multiple merchants.
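The velocity rule from this scenario can be illustrated with per-user keyed state. The sketch below is a simplified, single-process stand-in for what a stream processor would keep in its state store. It assumes timestamps are in seconds and arrive in order per user, and it interprets "across multiple merchants" as more than one distinct merchant in the window; the class name VelocityDetector is hypothetical.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 120   # 2-minute look-back
MAX_TRANSACTIONS = 3   # flag when the count exceeds this


class VelocityDetector:
    """Tracks recent transactions per user and flags bursts across merchants."""

    def __init__(self):
        # user_id -> deque of (timestamp, merchant_id), oldest first
        self._recent = defaultdict(deque)

    def observe(self, user_id, merchant_id, timestamp):
        """Record one transaction and return True if it should be flagged."""
        window = self._recent[user_id]
        window.append((timestamp, merchant_id))
        # Evict transactions that fell out of the 2-minute window.
        while window and window[0][0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        merchants = {merchant for _, merchant in window}
        # Assumption: "multiple merchants" means at least two distinct merchants.
        return len(window) > MAX_TRANSACTIONS and len(merchants) > 1
```

Out-of-order events would need event-time handling (watermarks and timers) on top of this logic, which the sketch deliberately leaves out.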
Scenario
Architect a system for an e-commerce company that provides both real-time inventory dashboards (streaming) and nightly batch analytics for business reporting, ensuring data consistency between the two.
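One way to check consistency between the two paths is a periodic reconciliation job that compares the streaming totals with the nightly batch totals. The sketch below is only an illustration of that idea: it assumes both systems can export per-SKU totals as dictionaries, and the function name find_mismatches and the tolerance parameter are hypothetical.

```python
def find_mismatches(stream_totals, batch_totals, tolerance=0):
    """Return {sku: (stream_value, batch_value)} where the two paths disagree."""
    mismatches = {}
    # Compare every SKU seen by either path; a missing SKU counts as 0.
    for sku in stream_totals.keys() | batch_totals.keys():
        stream_value = stream_totals.get(sku, 0)
        batch_value = batch_totals.get(sku, 0)
        if abs(stream_value - batch_value) > tolerance:
            mismatches[sku] = (stream_value, batch_value)
    return mismatches
```

A non-empty result would typically trigger an alert or a replay of the affected keys from the source log.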
The backbone for data ingestion and buffering. Use Kafka for high-throughput, durable messaging; Pulsar for multi-tenancy and geo-replication; Kinesis for fully managed integration within the AWS ecosystem.
For stateful computation. Use Flink for low-latency, high-throughput stateful processing with advanced windowing; Kafka Streams for lightweight processing embedded as a library in the client application; Spark Structured Streaming for micro-batch processing that integrates with the Spark ecosystem.
For managing application state or serving results. Use RocksDB for large, embedded state in Flink jobs; Redis for sub-millisecond latency on pre-aggregated results; Druid for OLAP queries on real-time data slices.
Answer Strategy
Tests understanding of event time, watermarks, and windowing mechanics. Frame the answer around watermarks, which bound expected lateness, and allowed lateness, which handles stragglers. 'First, I would configure the system to use event time, not ingestion time. I would set a watermark, say 10 minutes behind the maximum observed event time, so that a window is computed once the watermark passes its end. To handle data arriving after the watermark, I would use allowed lateness (e.g., 1 hour) to keep the window state open and emit an updated result. For data arriving even later, I would route it to a side output for manual review or reprocessing.'
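The three outcomes described in this answer (on time, late but within allowed lateness, too late) can be sketched as a small routing function. This is a simplified model of the semantics rather than any engine's actual API: it uses integer seconds, a single global watermark, and the hypothetical names LatenessRouter and route. The 10-minute and 1-hour values are the examples from the answer above.

```python
from dataclasses import dataclass

WATERMARK_DELAY = 10 * 60    # watermark trails max event time by 10 minutes
ALLOWED_LATENESS = 60 * 60   # window state kept for 1 extra hour


@dataclass
class LatenessRouter:
    max_event_time: int = 0  # highest event time observed so far

    @property
    def watermark(self):
        return self.max_event_time - WATERMARK_DELAY

    def route(self, event_time, window_end):
        """Classify an event belonging to the window that ends at window_end."""
        self.max_event_time = max(self.max_event_time, event_time)
        if self.watermark < window_end:
            return "on_time"       # window has not fired yet
        if self.watermark < window_end + ALLOWED_LATENESS:
            return "late_update"   # window re-fires with an updated result
        return "side_output"       # state already dropped; send for review
```

Note that a late event never moves the watermark backward, because the watermark is derived from the maximum event time seen so far.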
Answer Strategy
Tests operational experience and problem-solving. Focus on a systematic approach: monitoring, diagnosis, and mitigation. 'In a Kafka Streams application, we saw consumer lag spike. I first checked throughput and processing-time metrics in Grafana, and traced the bottleneck to a code change that introduced a synchronous database lookup per record. The resolution was to batch the lookups or serve them from a local cache (for example, a GlobalKTable). As a longer-term fix, we increased the number of topic partitions and application instances to scale horizontally.'
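The mitigation in this answer, replacing one database round trip per record with a batched and cached lookup, can be sketched as follows. The example is written in Python for brevity even though Kafka Streams itself is a Java library. The function enrich_batch, the fetch_many callback, and the "key" and "profile" field names are all hypothetical.

```python
def enrich_batch(records, fetch_many, cache):
    """Attach lookup data to each record using one bulk fetch per batch.

    fetch_many(keys) is assumed to return {key: value} for the given keys.
    cache is a plain dict that persists between batches.
    """
    # Collect only the keys the cache cannot answer yet.
    missing = {record["key"] for record in records if record["key"] not in cache}
    if missing:
        cache.update(fetch_many(missing))  # one round trip instead of one per record
    return [{**record, "profile": cache.get(record["key"])} for record in records]
```

A real deployment would also need cache expiry or invalidation so that stale lookup values do not linger; that is omitted here.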