Skill Guide

Real-time and batch ML pipeline design (streaming features, model serving)

The architecture, implementation, and operation of automated systems that ingest, transform, and serve data for machine learning models in both low-latency real-time streams and high-throughput batch processes.

This skill is critical because it directly determines the operational reliability, latency, and business impact of ML models in production. It enables organizations to leverage both historical and live data for predictions, driving immediate and sustained value from ML investments.
1 Careers
1 Categories
9.1 Avg Demand
15% Avg AI Risk

How to Learn Real-time and batch ML pipeline design (streaming features, model serving)

1. Understand the core paradigms: batch (scheduled, bounded data) vs. streaming (unbounded, real-time). Learn foundational concepts like event time vs. processing time, exactly-once semantics, and windowing.
2. Master a batch processing framework like Apache Spark (PySpark).
3. Learn the basics of the feature store concept and a simple way to serve models, such as FastAPI or BentoML.

1. Transition from theory to practice by building a pipeline that processes a public streaming dataset (e.g., NYC taxi trips) using Apache Flink or Spark Structured Streaming. Focus on implementing late data handling and watermarking.
2. Integrate a feature store like Feast or Tecton to manage and serve features consistently for both batch training and real-time inference.
3. Deploy a model using a dedicated serving solution like Seldon Core, KServe, or MLflow, and learn to monitor for data drift and model performance decay.

1. Architect hybrid pipelines (Lambda or Kappa architecture) for complex business use cases requiring both speed and completeness (e.g., fraud detection, real-time recommendations). Design for cost efficiency and system resilience.
2. Master advanced operational patterns: feature backtesting, A/B testing in production pipelines, and automated model retraining loops.
3. Develop governance and strategy: establish SLAs for feature freshness and prediction latency, design cross-team ownership models, and mentor engineers on robust pipeline design patterns.
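
Event time, windowing, and late-data handling with watermarks, all named in the learning steps above, can be illustrated without any streaming engine. The following is a minimal pure-Python sketch, not the Flink or Spark API: the names `WINDOW_SECONDS`, `ALLOWED_LATENESS`, `window_start`, and `process` are assumptions made for illustration, and the watermark rule (highest event time seen minus a fixed lateness) is one simple choice among several.

```python
from collections import defaultdict

WINDOW_SECONDS = 60     # tumbling window size (assumed for illustration)
ALLOWED_LATENESS = 30   # how far the watermark trails the max event time (assumed)


def window_start(event_time: int) -> int:
    """Return the start of the tumbling window that contains event_time."""
    return event_time - (event_time % WINDOW_SECONDS)


def process(events):
    """Count events per event-time window, dropping events behind the watermark.

    `events` is an iterable of (event_time, key) pairs in arrival order,
    which may differ from event-time order.
    Returns (emitted, dropped), where emitted maps (window_start, key) -> count.
    """
    open_windows = defaultdict(int)   # windows still accepting events
    emitted = {}                      # finalized window results
    dropped = []                      # events that arrived after their window closed
    watermark = float("-inf")

    for event_time, key in events:
        # Watermark = highest event time seen so far minus the allowed lateness.
        watermark = max(watermark, event_time - ALLOWED_LATENESS)

        start = window_start(event_time)
        if start + WINDOW_SECONDS <= watermark:
            # The watermark has already passed the end of this window: late data.
            dropped.append((event_time, key))
        else:
            open_windows[(start, key)] += 1

        # Finalize every window whose end is at or before the watermark.
        for win in [w for w in open_windows if w[0] + WINDOW_SECONDS <= watermark]:
            emitted[win] = open_windows.pop(win)

    # End of input: flush whatever is still open.
    emitted.update(open_windows)
    return emitted, dropped
```

Windows are assigned by the timestamp carried in the event (event time), not by when the event is processed, so an out-of-order event still lands in the correct window as long as it arrives before the watermark closes that window.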

Practice Projects

Beginner
Project

Build a Simple Real-Time Feature Pipeline

Scenario

Create a system that consumes a live stream of simulated user click events (e.g., from a Kafka topic) and computes a rolling 5-minute count of clicks per user as a feature.

How to Execute
1. Set up a local Kafka instance and a Python producer that simulates click events with `user_id` and `timestamp`.
2. Use Apache Flink (PyFlink) or Spark Structured Streaming to consume the stream, define a 5-minute sliding window, and aggregate clicks per user.
3. Expose the latest computed feature value via a simple HTTP endpoint (e.g., using Flask) for a mock model to consume.
4. Log the latency from event ingestion to feature availability.
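
Before wiring up Kafka and a stream processor, it can help to pin down what the feature itself should return. The sketch below computes a rolling 5-minute click count per user in plain Python. `RollingClickCounter` is a hypothetical helper, not part of PyFlink or Spark, and it assumes timestamps are in seconds and arrive in order for each user.

```python
from collections import defaultdict, deque


class RollingClickCounter:
    """Counts clicks per user over the window (now - window_seconds, now]."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self._clicks = defaultdict(deque)   # user_id -> click timestamps, oldest first

    def add(self, user_id: str, timestamp: float) -> None:
        """Record one click. Assumes timestamps arrive in order per user."""
        self._clicks[user_id].append(timestamp)

    def count(self, user_id: str, now: float) -> int:
        """Return the number of clicks in the window ending at `now`."""
        clicks = self._clicks.get(user_id)
        if clicks is None:
            return 0
        cutoff = now - self.window_seconds
        while clicks and clicks[0] <= cutoff:
            clicks.popleft()                # evict clicks that left the window
        return len(clicks)
```

A reference implementation like this is also useful as a test oracle: replay the same simulated events through it and through the streaming job, then compare the counts.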
Intermediate
Project

End-to-End Feature Store Implementation for a Hybrid Pipeline

Scenario

You have historical daily sales data in a data warehouse (batch) and a real-time stream of web traffic. Build a pipeline to predict next-day product sales using features from both sources.

How to Execute
1. Use dbt or Spark to create batch features (e.g., 7-day rolling average sales) and register them in a feature store (e.g., Feast).
2. Build a streaming pipeline with Flink or Spark Structured Streaming to compute real-time features (e.g., current-hour page views) and publish them to the same feature store's online storage.
3. Train a regression model using the batch feature views.
4. Deploy a model server (e.g., Seldon) that, for each prediction request, fetches both the latest real-time feature and the relevant batch feature from the feature store to make a prediction.
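
The last step, assembling one feature vector from a batch feature and a real-time feature at request time, can be sketched with an in-memory stand-in for the online store. This is not the Feast API: `InMemoryOnlineStore`, the feature view names `sales_batch` and `traffic_stream`, and the feature names are all hypothetical and chosen only to mirror the scenario.

```python
from statistics import mean


def rolling_avg_sales(daily_sales, days=7):
    """Batch feature: mean of the most recent `days` daily totals."""
    recent = daily_sales[-days:]
    return mean(recent) if recent else 0.0


class InMemoryOnlineStore:
    """Hypothetical stand-in for a feature store's online storage."""

    def __init__(self):
        self._rows = {}   # (feature_view, entity_id) -> {feature_name: value}

    def write(self, feature_view, entity_id, values):
        self._rows.setdefault((feature_view, entity_id), {}).update(values)

    def read(self, feature_view, entity_id):
        return dict(self._rows.get((feature_view, entity_id), {}))


def build_feature_vector(store, product_id):
    """Fetch the batch and streaming features for one prediction request."""
    batch = store.read("sales_batch", product_id)
    stream = store.read("traffic_stream", product_id)
    # Missing features fall back to neutral defaults (an assumption of this sketch).
    return [batch.get("avg_sales_7d", 0.0), stream.get("page_views_current_hour", 0)]
```

In this sketch the batch job and the streaming job would each call `write` on their own schedule, while the model server only ever calls `build_feature_vector`, which keeps the serving path independent of how each feature was computed.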
Advanced
Project

Design a Fault-Tolerant, Low-Latency Feature Serving System

Scenario

Architect a system for a real-time bidding (RTB) ad platform that must compute and serve 50+ user and context features within 10 milliseconds P99 latency for 100k QPS.

How to Execute
1. Design a dual-path architecture: a primary path using a high-performance streaming engine (e.g., Apache Flink) to compute and update features, with output to a low-latency key-value store (e.g., Redis, DynamoDB).
2. Implement a secondary, slower batch path for feature computation validation and backfilling.
3. Build a feature serving microservice in a language like Go or Rust that reads from the KV store, implements a robust caching strategy, and handles failover to default features.
4. Establish rigorous monitoring on feature freshness, serving latency percentiles, and error rates, with automated rollback procedures.
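
The caching and failover logic of the serving microservice can be sketched independently of the language it is eventually written in. The example below uses Python only for readability; a production service at this latency budget would more likely follow the Go or Rust suggestion above. `FeatureServer`, `kv_get`, and the returned source labels are hypothetical names, and the policy shown (TTL cache, then stale cache, then defaults) is one possible design rather than a prescribed one.

```python
import time


class FeatureServer:
    """Reads features through a TTL cache and falls back to defaults on failure."""

    def __init__(self, kv_get, defaults, ttl_seconds=1.0, clock=time.monotonic):
        self._kv_get = kv_get        # callable: key -> dict of features, or None
        self._defaults = defaults    # features served when nothing else is available
        self._ttl = ttl_seconds
        self._clock = clock          # injectable clock, so tests can control time
        self._cache = {}             # key -> (expires_at, features)

    def get_features(self, key):
        """Return (features, source); source is cache, store, stale, or default."""
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and cached[0] > now:
            return cached[1], "cache"
        try:
            features = self._kv_get(key)
        except Exception:
            # Store unavailable: prefer a stale cached value over defaults.
            if cached is not None:
                return cached[1], "stale"
            return dict(self._defaults), "default"
        if features is None:
            # Unknown key: serve defaults rather than failing the request.
            return dict(self._defaults), "default"
        self._cache[key] = (now + self._ttl, features)
        return features, "store"
```

Returning the source alongside the features makes it straightforward to export a metric for how often defaults or stale values are served, which ties into the monitoring step.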

Tools & Frameworks

Stream & Batch Processing Engines

Apache Flink, Apache Spark (Structured Streaming), Apache Beam

Flink is widely used for low-latency, stateful stream processing. Spark is dominant for large-scale batch and micro-batch processing. Beam provides a unified programming model that can run on multiple backends (Flink, Spark, Dataflow). Choose Flink for sub-second latency requirements; choose Spark for massive batch ETL and when your team already has strong Spark experience in Scala or PySpark.

Feature Stores & Management

Feast, Tecton, Hopsworks

These systems solve the 'training-serving skew' problem by providing a central registry for feature definitions and managing the materialization of features into online stores for low-latency serving. Feast is open-source and modular; Tecton is a managed platform with sophisticated feature transformation orchestration; Hopsworks integrates tightly with its own data platform.
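
Training-serving skew usually arises when the same feature is implemented twice, once for training and once for serving. A minimal sketch of the idea behind a central registry follows, assuming a hypothetical `FEATURE_REGISTRY` and a click-through-rate feature; it does not reproduce the API of Feast, Tecton, or Hopsworks.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Single feature definition shared by the batch and online paths."""
    return clicks / impressions if impressions > 0 else 0.0


# Hypothetical registry: feature name -> transformation function.
FEATURE_REGISTRY = {"ctr": click_through_rate}


def build_training_rows(history):
    """Batch path: apply the registered transformation to historical rows."""
    transform = FEATURE_REGISTRY["ctr"]
    return [
        {"user_id": row["user_id"], "ctr": transform(row["clicks"], row["impressions"])}
        for row in history
    ]


def build_online_features(user_id, clicks, impressions):
    """Online path: apply the same transformation to live counters."""
    transform = FEATURE_REGISTRY["ctr"]
    return {"user_id": user_id, "ctr": transform(clicks, impressions)}
```

Because both paths look the transformation up in one place, an edge case such as zero impressions is handled identically at training time and at inference time.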

Model Serving & Deployment

Seldon Core, KServe (formerly KFServing), BentoML, TorchServe, Triton Inference Server

These tools standardize model packaging, deployment, and scaling on Kubernetes. Seldon and KServe offer advanced capabilities like canary rollouts, explainers, and outlier detection. BentoML focuses on developer experience for packaging models as production-ready services. Triton is optimized for high-performance GPU inference of deep learning models.
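
The traffic-splitting idea behind a canary rollout can be shown in a few lines. The sketch below is a generic illustration, not how Seldon Core or KServe implement it internally: `route` is a hypothetical function that hashes a request ID into one of 100 buckets so that the same ID is always sent to the same model version.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Return "canary" or "stable" for a request, deterministically."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100   # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Hashing rather than random sampling keeps routing sticky per request ID, which makes it easier to compare the two versions on a consistent population while `canary_percent` is gradually raised.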

Orchestration & Infrastructure

Kubernetes, Apache Airflow, Dagster, Terraform

Kubernetes is the foundational platform for containerized, scalable pipeline components. Airflow/Dagster orchestrate complex, dependency-aware workflows for batch pipelines and feature materialization. Terraform manages the underlying cloud infrastructure (compute, storage, networking) as code for reproducibility.
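
What "dependency-aware" means for an orchestrator can be illustrated with the standard library alone. The sketch below is not Airflow or Dagster code: it uses Python's `graphlib` (3.9+) to derive a valid run order from a hypothetical task graph whose task names are invented for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: task -> set of tasks it depends on.
PIPELINE = {
    "extract_sales": set(),
    "extract_traffic": set(),
    "compute_batch_features": {"extract_sales", "extract_traffic"},
    "materialize_online_store": {"compute_batch_features"},
    "train_model": {"compute_batch_features"},
}


def execution_order(graph):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(graph).static_order())
```

Real orchestrators add scheduling, retries, and parallel execution of independent tasks on top of this ordering, but the underlying model is the same directed acyclic graph.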

Interview Questions

Answer Strategy

The candidate must demonstrate deep architectural understanding, not just definitions. A strong answer will contrast the complexity of maintaining two codebases (Lambda) with the requirement for a highly sophisticated, replayable streaming engine (Kappa). The business scenario is key: e.g., 'I'd choose Lambda for a financial risk model where auditability and complete reprocessing from raw data are non-negotiable, despite the operational complexity. For a real-time content recommendation system where simplicity and low latency are paramount, Kappa with a robust stream processor like Flink is superior.'
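
The structural difference between the two architectures can be made concrete with a toy per-key event count. This is a deliberately simplified sketch: the function names are hypothetical, events are `(timestamp, key)` pairs, and the Lambda layers share one aggregation helper for brevity, whereas in practice the batch and speed layers are typically separate codebases.

```python
from collections import Counter


def count_by_key(events):
    """Aggregation used by both Lambda layers in this sketch."""
    return Counter(key for _, key in events)


def lambda_view(events, batch_cutoff):
    """Lambda: batch view up to the cutoff merged with a speed view after it."""
    batch_view = count_by_key(e for e in events if e[0] <= batch_cutoff)
    speed_view = count_by_key(e for e in events if e[0] > batch_cutoff)
    return batch_view + speed_view


def kappa_view(event_log):
    """Kappa: a single streaming job; reprocessing means replaying the log."""
    view = Counter()
    for _, key in event_log:   # one event at a time, as a stream processor would
        view[key] += 1
    return view
```

Both functions should agree on the same input. Lambda pays for that agreement by keeping two paths consistent, while Kappa pays for it by requiring a log that can be replayed in full.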

Answer Strategy

This tests operational rigor. The answer should follow a logical, layered approach: 1) Check monitoring dashboards for bottlenecks (ingestion rate, processing latency, sink write times). 2) Inspect the streaming job for data skew, slow external calls, or state size growth. 3) Validate the downstream feature store/serving system's performance and health. 4) Examine the source data for volume spikes or schema changes. A sample answer: 'I would first isolate the bottleneck layer by checking metrics for the source connector, the processing job, and the feature store. If the processing job's latency is high, I'd analyze the job's Flink/Spark UI for backpressure, skewed watermarks, or state backend issues. Simultaneously, I'd verify the feature store's read latency hasn't degraded.'
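
The first step of that sample answer, isolating the bottleneck layer, amounts to comparing each stage's tail latency against its budget. A minimal sketch follows, assuming per-stage latency samples in milliseconds have already been collected. The stage names, budgets, and the helpers `p99` and `find_bottleneck` are hypothetical.

```python
def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = -(-99 * len(ordered) // 100)   # ceil(0.99 * n) using integer arithmetic
    return ordered[rank - 1]


def find_bottleneck(stage_latencies_ms, budgets_ms):
    """Return (stage, ratio) for the stage whose p99 most exceeds its budget."""
    ratios = {
        stage: p99(samples) / budgets_ms[stage]
        for stage, samples in stage_latencies_ms.items()
    }
    worst = max(ratios, key=ratios.get)
    return worst, ratios[worst]
```

A ratio above 1.0 means the stage is over budget. Ranking stages by this ratio rather than by raw latency avoids blaming a stage that is slow in absolute terms but still within its allowance.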