Skill Guide

Deep learning for recommendation: two-tower models, transformers, sequential recommendation

A specialized sub-domain of machine learning that applies neural network architectures (specifically two-tower models for scalable retrieval, transformers for context-aware ranking, and sequential models for session-based prediction) to the core recommendation problems of candidate generation, ranking, and next-item prediction.

These architectures can substantially improve business metrics such as click-through rate (CTR), user engagement, and lifetime value by serving relevant, personalized recommendations at massive scale. They are the core engine behind monetization and user retention on e-commerce, streaming, and social media platforms.


How to Learn Deep learning for recommendation: two-tower models, transformers, sequential recommendation

1. Master the fundamentals of collaborative filtering (user-item matrix factorization) and content-based filtering to understand the baseline.
2. Grasp the core concept of embeddings (user and item vectors). Also learn the ranking and contrastive losses that are the foundation of modern models, such as pairwise BPR and triplet loss.
3. Implement a basic two-tower retrieval model with a framework like TensorFlow Recommenders (TFRS) on a public dataset (e.g., MovieLens) to solidify the idea of separate user and item encoders.
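To make the two losses concrete, here is a minimal pure-Python sketch. It assumes dot-product scoring for BPR and squared Euclidean distance for the triplet loss. All function names are illustrative; a real model would compute these losses on batched tensors inside a framework with autodiff.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Score a user-item pair as the inner product of their embeddings."""
    return sum(x * y for x, y in zip(a, b))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def bpr_loss(user, pos_item, neg_item) -> float:
    """BPR loss for one (user, observed item, unobserved item) triple.

    loss = -log(sigmoid(s_pos - s_neg)) = softplus(-(s_pos - s_neg)),
    so the loss shrinks as the observed item outscores the unobserved one.
    """
    margin = dot(user, pos_item) - dot(user, neg_item)
    return softplus(-margin)


def sq_dist(a, b) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Hinge triplet loss: the positive should be closer than the negative by `margin`."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

The BPR loss here is small when the positive item already outscores the negative one. The triplet loss is exactly zero once the margin is satisfied.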

1. Move beyond retrieval to ranking. Implement a Deep & Cross Network (DCN) or a Wide & Deep model to understand explicit feature-interaction modeling for CTR prediction, then add extra output heads to explore multi-task learning.
2. For sequential recommendation, implement GRU4Rec or SASRec on session-based data (e.g., RSC19) to learn how sequential patterns differ from static user profiles.
3. Critical mistake to avoid: ignoring the distinction between candidate generation and ranking. Never use a complex transformer to score the entire item catalog. Running it over millions of items per request is computationally infeasible. Use a fast two-tower model for retrieval and reserve the complex model for fine-ranking a small candidate set.
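The "explicit feature interaction" in DCN comes from its cross layers. Below is a minimal pure-Python sketch of the original DCN (v1) cross layer, x_{l+1} = x0 * (x_l · w_l) + b_l + x_l, with a logistic output on top. It omits the parallel deep (MLP) tower and any training; the weights are placeholders you would learn.

```python
import math


def cross_layer(x0, xl, w, b):
    """One DCN (v1) cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    The scalar projection (xl . w) multiplies the original input x0, so each
    layer adds one more degree of explicit feature crossing; `+ xl` is a residual.
    """
    s = sum(xi * wi for xi, wi in zip(xl, w))
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0, weights, biases):
    """Stack cross layers; every layer re-uses the original input x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


def predict_ctr(x0, weights, biases, w_out, b_out):
    """Logistic CTR head on the cross-network output (deep tower omitted)."""
    z = sum(a * c for a, c in zip(cross_network(x0, weights, biases), w_out)) + b_out
    return 1.0 / (1.0 + math.exp(-z))
```

With zero weights and biases, a cross layer returns its input unchanged because of the residual term. Each learned layer adds multiplicative interactions with `x0`.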

1. Architect end-to-end systems that integrate retrieval (two-tower), coarse-ranking, fine-ranking (transformers), and re-ranking (business rules, diversity) layers, optimizing for multiple objectives (engagement, revenue, diversity).
2. Master deploying these models at scale, understanding the trade-off between model complexity (e.g., a full transformer vs. a distilled version) and serving latency (p99).
3. Lead A/B testing strategy to measure the causal impact of model changes on long-term user behavior, not just short-term CTR, and mentor teams on interpreting these complex experiments.
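Latency trade-offs are usually judged on tail latency rather than averages. Here is a minimal sketch of computing p99 with the nearest-rank method and checking it against a budget. The latency samples and the 200 ms budget in the test are illustrative, not measurements.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_budget(samples, budget_ms, p=99.0):
    """True if the p-th percentile latency is within the budget."""
    return percentile(samples, p) <= budget_ms
```

For example, `percentile(list(range(1, 101)), 99)` returns 99. A model whose mean latency looks fine can still fail the p99 check if a few percent of requests are slow.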

Practice Projects

Beginner

Project

Build a Movie Retrieval System with Two-Tower Model

Scenario

You have the MovieLens-1M dataset. Your goal is to build a system that, given a user, retrieves the top 10 most relevant movies from a catalog of ~3,700.

How to Execute

1. Use TensorFlow Recommenders (TFRS) to define a `UserTower` (an MLP over user features such as age, gender, and viewing history) and a `MovieTower` (an MLP over movie features such as genre and title).
2. Train the model with the `tfrs.tasks.Retrieval` task, using dot-product scoring and in-batch negative sampling.
3. After training, index all movie embeddings with `tfrs.layers.factorized_top_k.BruteForce`. This layer performs exact (not approximate) top-K search, which is cheap for a ~3,700-movie catalog. For approximate nearest neighbor (ANN) retrieval on larger catalogs, swap in `tfrs.layers.factorized_top_k.ScaNN`.
4. Query the index with test users' embeddings to retrieve their top-K movies, and evaluate the results with metrics like Recall@K.
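The sketch below is framework-free and is not TFRS's API. It shows what steps 2 to 4 compute conceptually:
- an in-batch softmax loss, where each user's paired movie is the positive and the other movies in the batch are negatives;
- an exact brute-force top-K;
- one common definition of Recall@K.

Embeddings are plain Python lists standing in for tower outputs.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(user_embs, item_embs):
    """Retrieval loss with in-batch negatives.

    Row i pairs user i with positive item i; every other item in the batch
    acts as a negative. Returns the mean of -log softmax(logits_i)[i].
    """
    total = 0.0
    for i, u in enumerate(user_embs):
        logits = [dot(u, v) for v in item_embs]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(user_embs)


def brute_force_top_k(query, item_embs, k):
    """Exact top-k: score every item, ties broken by lower index."""
    scored = sorted(((dot(query, v), idx) for idx, v in enumerate(item_embs)),
                    key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:k]]


def recall_at_k(retrieved, relevant, k):
    """Fraction of the user's held-out relevant items found in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)
```

Because negatives come for free from the batch, larger batches give each user more negatives per step. Note that popular items appear as negatives more often than uniform sampling would produce.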

Intermediate

Project

Implement a Session-Based Recommender with Transformers

Scenario

You are working on an e-commerce site's 'Continue Shopping' feature. Users are anonymous; you only have their current clickstream session (sequence of product views). You need to predict the next item they will interact with.

How to Execute

1. Process clickstream data into fixed-length sequences (e.g., the last 20 items), padding shorter sequences.
2. Implement a Transformer-based architecture such as SASRec (Self-Attentive Sequential Recommendation) in PyTorch. SASRec applies one or two causal self-attention blocks to the sequence of item embeddings.
3. Train the model with a next-item prediction objective. Apply a causal attention mask so that each position attends only to earlier items, never future ones.
4. Evaluate with Hit Rate@10 and Mean Reciprocal Rank (MRR) on a held-out test set, comparing against a baseline like GRU4Rec.
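Below is a minimal pure-Python sketch of the data side of steps 1, 3, and 4. It builds left-padded input/target pairs for next-item prediction, builds a causal mask, and computes HR@10 and MRR truncated at 10. It assumes item ids start at 1 and reserves `PAD = 0` for padding; both choices are assumptions, not part of SASRec itself.

```python
PAD = 0  # assumption: item id 0 is reserved for padding


def make_training_example(session, max_len):
    """Turn one session (item ids >= 1) into SASRec-style input and target.

    The target at each position is the next item. Both sequences are truncated
    to the most recent max_len steps and left-padded with PAD; the training loss
    should ignore positions whose target is PAD.
    """
    inputs, targets = session[:-1][-max_len:], session[1:][-max_len:]
    pad = max_len - len(inputs)
    return [PAD] * pad + inputs, [PAD] * pad + targets


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def hit_rate_and_mrr(ranked_lists, true_items, k=10):
    """HR@k and MRR@k: the reciprocal rank counts only if the item is in the top k."""
    hits, rr = 0, 0.0
    for ranked, truth in zip(ranked_lists, true_items):
        top = ranked[:k]
        if truth in top:
            hits += 1
            rr += 1.0 / (top.index(truth) + 1)
    n = len(true_items)
    return hits / n, rr / n
```

In PyTorch, the boolean mask above corresponds to the lower-triangular pattern you would pass (inverted) as an attention mask.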

Advanced

Project

Design and Critique a Multi-Stage Recommendation Pipeline

Scenario

You are the lead ML engineer for a video streaming platform with 50M daily active users and a catalog of 100K videos. The current system is a single monolithic model with 200ms p99 latency. You must redesign it for scalability and performance.

How to Execute

1. Architect a 4-stage pipeline:
   a) Candidate Generation: a two-tower model with ANN serving via ScaNN or Faiss (10 ms budget).
   b) Coarse Ranking: a lightweight DCN that scores the ~500 retrieved candidates (20 ms).
   c) Fine Ranking: a large Transformer-based model with user context that scores the top 50 candidates (50 ms).
   d) Re-Ranking: rule-based business logic for diversity, freshness, and promotions.
   The three model stages total about 80 ms, leaving headroom against the monolith's 200 ms p99.
2. Model the data flow and feature stores each stage requires, emphasizing pre-computed embeddings and real-time feature joins.
3. Define a comprehensive offline/online evaluation strategy. Include an A/B test of the full pipeline against the monolith that measures latency, throughput, and business metrics like watch time and subscription retention.
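The funnel logic can be sketched in a few lines of Python. Each ranking stage scores only what the previous stage kept, and a final rule-based pass caps items per category for diversity. The scorers are hypothetical stand-ins for the DCN and Transformer models. The fine stage keeping 20 items and the per-category cap are illustrative assumptions not specified in the design.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    keep: int                      # candidates passed on to the next stage
    budget_ms: float               # per-stage latency budget
    score: Callable[[int], float]  # hypothetical model: item id -> score


def run_funnel(candidates: List[int], stages: List[Stage]) -> List[int]:
    """Each stage re-scores the surviving candidates, then truncates to `keep`."""
    for stage in stages:
        candidates = sorted(candidates, key=stage.score, reverse=True)[: stage.keep]
    return candidates


def total_budget_ms(stages: List[Stage], retrieval_ms: float = 10.0) -> float:
    """Retrieval budget plus the budgets of the ranking stages."""
    return retrieval_ms + sum(s.budget_ms for s in stages)


def rerank_with_category_cap(items, category_of, cap, n):
    """Re-ranking rule: keep model order but allow at most `cap` items per category."""
    counts: dict = {}
    out = []
    for item in items:
        c = category_of(item)
        if counts.get(c, 0) < cap:
            out.append(item)
            counts[c] = counts.get(c, 0) + 1
            if len(out) == n:
                break
    return out
```

The key property is that expensive models never see the full catalog. The fine ranker's cost scales with the 50 candidates it receives, not with the 100K videos.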

Tools & Frameworks

Deep Learning Frameworks

PyTorch, TensorFlow/Keras, TensorFlow Recommenders (TFRS)

PyTorch is the dominant framework for research and custom model implementation (SASRec, custom transformers). TensorFlow/TFRS provides high-level, production-ready abstractions specifically for building and serving two-tower and ranking models at scale.

Serving & Infrastructure

TensorFlow Serving (TFX), Triton Inference Server (NVIDIA), Google Vertex AI, AWS SageMaker

These platforms deploy models to production with low latency and high throughput, adding features such as A/B testing, model versioning, and GPU optimization. Triton is particularly useful for high-performance GPU serving of complex transformer models.

Approximate Nearest Neighbor (ANN) Libraries

Faiss (Facebook), ScaNN (Google), Annoy (Spotify)

Essential for the retrieval stage. These libraries build indexes over millions to billions of embeddings and answer approximate nearest-neighbor queries in milliseconds, which makes two-tower retrieval viable at large scale. Faiss is widely used for its performance, GPU support, and range of index types.
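To show where the speed comes from, here is a toy inverted-file (IVF) index in pure Python. IVF is one of the index families Faiss offers. Vectors are bucketed by their nearest centroid, and a query scans only the `nprobe` closest buckets, trading some recall for speed.

This is a teaching sketch, not Faiss's API:
- centroids are a random sample of the data, whereas real IVF indexes train them with k-means;
- distance is squared L2.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVFIndex:
    """Inverted-file index: each vector lives in the list of its nearest centroid."""

    def __init__(self, vectors, nlist, seed=0):
        rng = random.Random(seed)
        # Simplification: sampled centroids instead of k-means-trained ones.
        self.centroids = rng.sample(vectors, nlist)
        self.vectors = vectors
        self.lists = [[] for _ in range(nlist)]
        for idx, v in enumerate(vectors):
            self.lists[self._nearest_centroids(v, 1)[0]].append(idx)

    def _nearest_centroids(self, v, n):
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, nprobe):
        """Scan only the nprobe closest lists; nprobe == nlist is exact search."""
        candidates = [idx for c in self._nearest_centroids(query, nprobe)
                      for idx in self.lists[c]]
        candidates.sort(key=lambda idx: sq_dist(query, self.vectors[idx]))
        return candidates[:k]
```

With `nprobe` equal to the number of lists, the search reduces to exact brute force. Smaller `nprobe` values scan a fraction of the data and may miss some true neighbors.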

Data & Feature Engineering

Apache Beam / Google Dataflow, Feast (Feature Store), BigQuery / Apache Spark

Beam/Dataflow builds massive-scale pipelines that generate training data. Feast or a similar feature store keeps features consistent between training and real-time serving. BigQuery and Spark handle large-scale data preprocessing and feature computation.
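A core part of training/serving consistency is the point-in-time join. Each training event should receive the feature value that existed at that moment, which is what an online store would have served, and never a later value. Feature stores such as Feast perform this kind of join when building historical training data. The function below is a stdlib sketch of the idea, not Feast's API.

```python
import bisect
from collections import defaultdict


def point_in_time_join(events, feature_rows):
    """Attach to each event the latest feature value with timestamp <= event time.

    events: list of (entity_id, event_ts, label)
    feature_rows: list of (entity_id, feature_ts, value), in any order
    Returns (entity_id, event_ts, label, value_or_None) per event, so future
    feature values never leak into training examples.
    """
    ts_by_entity = defaultdict(list)
    val_by_entity = defaultdict(list)
    for entity, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        ts_by_entity[entity].append(ts)
        val_by_entity[entity].append(value)
    joined = []
    for entity, event_ts, label in events:
        timestamps = ts_by_entity.get(entity, [])
        pos = bisect.bisect_right(timestamps, event_ts)  # count of rows with ts <= event_ts
        value = val_by_entity[entity][pos - 1] if pos > 0 else None
        joined.append((entity, event_ts, label, value))
    return joined
```

Joining on the latest value regardless of time would quietly leak information the model cannot have at serving time. That leakage is a common source of offline/online gaps.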

Interview Questions

Answer Strategy

Structure your answer by separating the problem into retrieval (how to find candidates fast) and ranking (how to order them). Explain why you cannot use a complex model for retrieval. Sample Answer: 'First, for retrieval, I'd use a two-tower model. The item tower processes item features (image, text, category) to produce a dense embedding. For item-to-item recommendations the query tower can be minimal, often just the target item's own embedding. I'd pre-compute and index all 100M item embeddings with Faiss for low-millisecond approximate retrieval of the 1,000 most similar items. Then, for ranking, I'd apply a more expressive model such as a Transformer that incorporates the user's current context and session history to re-rank those 1,000 candidates, optimizing for a blend of similarity and predicted click-through probability. This two-stage approach balances quality against the computational constraints of serving at scale.'
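A toy version of this answer's two stages in pure Python follows. Retrieval uses the target item's embedding as the query and ranks other items by cosine similarity. Ranking then blends similarity with a predicted CTR. `predict_ctr` and the blend weight `alpha` are hypothetical placeholders for the ranking model and its tuned objective. In production, the retrieval step would run against a Faiss index over normalized embeddings rather than a Python loop.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def similar_items(target_id, item_embs, k):
    """Retrieval: the query 'tower' is just the target item's embedding."""
    q = item_embs[target_id]
    scored = [(cosine(q, v), i) for i, v in item_embs.items() if i != target_id]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]


def rerank(target_id, candidates, item_embs, predict_ctr, alpha=0.5):
    """Ranking: blend similarity with a (hypothetical) model's click probability."""
    q = item_embs[target_id]

    def score(i):
        return alpha * cosine(q, item_embs[i]) + (1 - alpha) * predict_ctr(i)

    return sorted(candidates, key=score, reverse=True)
```

The re-ranker can promote a less similar item if the model predicts it is much more likely to be clicked. That is the point of separating the two stages.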

Answer Strategy

This tests your understanding of offline-online discrepancy and business impact. The core competency is systems thinking and considering side effects. Sample Answer: 'The offline-online gap suggests several issues. First, the offline metric (hit rate) might not correlate with online business goals; the model may be optimizing for easy, low-value clicks. Second, the model could be causing a popularity-bias feedback loop, over-recommending already popular items, which hurts discovery and long-term catalog health. The revenue drop is a red flag for this. Third, there might be a latency issue: if the new model is slower, it can degrade the user experience and offset quality gains. My next steps would be to analyze the A/B test for shifts in recommendation diversity, investigate the p99 latency of the new model, and define a new primary online metric that balances engagement with business value.'
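One way to make "analyze the A/B test for shifts in recommendation diversity" concrete is to compute two exposure metrics on each arm's served recommendations. Catalog coverage is the share of items ever shown. The Gini coefficient of impressions is near 0 for even exposure and approaches 1 when a few items dominate. The sketch below is illustrative; which logs to use and how to segment them are assumptions to adapt.

```python
from collections import Counter


def catalog_coverage(rec_lists, catalog_size):
    """Fraction of the catalog that appears in at least one recommendation list."""
    shown = {item for recs in rec_lists for item in recs}
    return len(shown) / catalog_size


def exposure_gini(rec_lists, catalog):
    """Gini coefficient of impression counts across the whole catalog.

    Uses G = sum_i (2i - n - 1) * x_i / (n * sum(x)) over ascending-sorted
    counts x_1..x_n (i is 1-based); items never shown count as zero.
    """
    counts = Counter(item for recs in rec_lists for item in recs)
    values = sorted(counts.get(item, 0) for item in catalog)
    n, total = len(values), sum(values)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(values))
    return weighted / (n * total)
```

A treatment arm whose coverage falls while its Gini rises is concentrating exposure on fewer items. That pattern fits the popularity-bias explanation above.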