Skip to main content

Skill Guide

SQL and Python for blockchain data warehouses (Dune Analytics, Flipside Crypto, The Graph)

The technical capability to query, transform, and analyze on-chain and decoded blockchain data using SQL and Python on specialized data warehouse platforms like Dune Analytics, Flipside Crypto, and The Graph.

This skill enables data-driven protocol development, treasury management, and competitive analysis by providing direct, programmatic access to the immutable ledger of blockchain activity. It transforms raw transaction logs into actionable intelligence for product growth, tokenomics design, and risk assessment, directly impacting TVL, user retention, and capital efficiency.
1 Careers
1 Categories
8.5 Avg Demand
20% Avg AI Risk

How to Learn SQL and Python for blockchain data warehouses (Dune Analytics, Flipside Crypto, The Graph)

1. Master blockchain data schemas: Learn the core tables (e.g., `ethereum.transactions`, `ethereum.logs`, `ethereum.traces`) and their decoded counterparts (e.g., `uniswap_v3.ethereum.Pair_evt_Swap`). 2. Write foundational SQL: Focus on filtering by time, aggregating by address, and joining decoded event tables to transactions. 3. Use the Dune or Flipside UI: Practice writing queries directly in the web-based SQL editor and visualizing results before touching Python.
1. Move from querying to modeling: Create materialized views (Dune) or SQL queries that build reusable datasets (e.g., daily active users, protocol revenue). 2. Integrate Python via APIs: Use the `dune-client` or `flipsidecrypto` Python libraries to programmatically run queries, fetch results, and perform advanced analysis in a Jupyter notebook. 3. Avoid common pitfalls: Don't neglect time zone consistency (UTC), misinterpret token decimals (e.g., WETH has 18), or write inefficient cross-join queries that timeout.
1. Architect data pipelines: Design end-to-end pipelines that trigger Python scripts on schedule (e.g., GitHub Actions, AWS Lambda), query Dune/Flipside, transform data with pandas/polars, and load it into a secondary database or BI tool. 2. Optimize for cost and performance: Learn to partition queries by time, use Dune's spellbook models, and leverage Flipside's incremental models to minimize credit usage. 3. Develop protocol-specific expertise: Master the decoding logic and event signatures for a major DeFi primitive (e.g., AMMs, lending protocols, derivatives) to build authoritative dashboards and analytics.

Practice Projects

Beginner
Project

Build a DEX Daily Volume Dashboard

Scenario

Create a public Dune dashboard that tracks and visualizes the daily trading volume (in USD) for Uniswap V3 on Ethereum over the last 30 days.

How to Execute
1. Write a SQL query joining `uniswap_v3.ethereum.Pair_evt_Swap` with `prices.usd` to get USD volume. 2. Use `date_trunc('day', evt_block_time)` to group by day. 3. Aggregate `amountUSD` as the volume metric. 4. Create a Dune dashboard, add a line chart widget, and configure it to use your query.
Intermediate
Project

Automated Wallet Cohort Analysis

Scenario

Identify the top 100 wallets by initial deposit amount into Aave V3 on Ethereum, then track their subsequent health factor and liquidation risk over the following 90 days using Python and Flipside.

How to Execute
1. Use the Flipside Python library to write and execute a SQL query to find the initial deposit txns and rank wallets. 2. In a Jupyter notebook, use pandas to merge this cohort list with historical `aave_v3.ethereum.Pool_evt_UserHealthFactorUpdated` event data. 3. Calculate each wallet's average health factor and flag wallets that dropped below 1.0. 4. Generate a summary report and a scatter plot of deposit size vs. health factor volatility.
Advanced
Project

Cross-Chain MEV Bot Profitability Engine

Scenario

Build a system to detect, analyze, and rank MEV bots operating across Ethereum and Arbitrum by net profitability, accounting for gas costs, token swaps, and bridge activity.

How to Execute
1. Use Dune's `dex.trades` spellbook for normalized swap data and `ethereum/arbitrum.transactions` for gas. 2. Write complex SQL to identify bot addresses via heuristics (e.g., high-frequency, specific contract interactions). 3. Use Python to programmatically run queries for both chains, merge datasets on the bot address, and calculate PnL in USD using price feeds. 4. Schedule the pipeline via GitHub Actions to run daily, storing results in a SQLite database for historical tracking.

Tools & Frameworks

Software & Platforms

Dune AnalyticsFlipside CryptoThe Graph (Subgraphs)Python (pandas, polars)Jupyter Notebook

Dune/Flipside are primary SQL-based warehouses for raw/decoded blockchain data. The Graph provides indexed, queryable data via GraphQL for specific protocols. Python is used for advanced transformation, analysis, and automation outside the SQL editor.

Python Libraries & Clients

dune-clientflipsidecryptoweb3.pyrequests

`dune-client` and `flipsidecrypto` are the official SDKs for interacting with their APIs. `web3.py` can be used for direct node interaction or verifying data. `requests` is for interacting with The Graph's hosted service endpoints.

Mental Models & Methodologies

Data Modeling (Star Schema)Time-Series AnalysisCohort AnalysisHeuristic Address Labeling

Structuring queries with fact/dimension tables (star schema) improves performance and clarity. Time-series and cohort analyses are core blockchain analytics patterns. Heuristic labeling (e.g., 'likely bot') is critical for entity-level analysis.

Interview Questions

Answer Strategy

Test the candidate's ability to reason about complex on-chain data structures (UTXO) and translate a financial concept into a multi-step SQL query. The answer should outline: 1) Joining `bitcoin.transactions` and `bitcoin.inputs`/`bitcoin.outputs` to track UTXO movement. 2) Identifying the 'last spent' timestamp for each output. 3) Joining with a historical price table (`prices.usd`) on that specific timestamp. 4) Aggregating the (value * price) sum. Mentioning performance considerations like using `DENSE_RANK()` or `WINDOW` functions is a strong signal.

Answer Strategy

Test the candidate's debugging methodology and understanding of platform constraints. The answer should demonstrate a systematic approach: checking API status/limits, analyzing query performance (data growth, lack of time filters), and implementing solutions like query optimization, pagination, or switching to an incremental model.

Careers That Require SQL and Python for blockchain data warehouses (Dune Analytics, Flipside Crypto, The Graph)

1 career found