AI Customer Risk Analyst
An AI Customer Risk Analyst leverages artificial intelligence and advanced analytics to identify, quantify, and mitigate financial…
Skill Guide
SQL & Python for Data Wrangling is the systematic practice of using SQL for data extraction and transformation within databases, and Python (primarily with Pandas) for complex cleaning, reshaping, and integration of datasets to produce analysis-ready data.
Scenario
You are given a raw CSV of e-commerce orders containing missing customer IDs, inconsistent product category names, and mixed date formats. Your goal is to produce a clean, deduplicated dataset ready for a sales analysis dashboard.
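One way to approach this scenario is sketched below with Pandas. The column names, the sample rows, the category mapping, the list of date formats, and the choice to label missing customer IDs as "UNKNOWN" instead of dropping them are all assumptions made for illustration; a real export would need its own mapping and policy.

```python
import io
import pandas as pd

# Hypothetical raw export; columns and values are illustrative only.
RAW_CSV = """order_id,customer_id,category,order_date,amount
1001,C01,Electronics,2024-01-05,199.99
1002,,electronics ,05/01/2024,25.00
1003,C02,Home & Garden,2024/01/07,80.50
1003,C02,Home & Garden,2024/01/07,80.50
1004,C03,home and garden,"Jan 09, 2024",15.25
"""

# Assumed mapping from observed spellings to canonical category names.
CATEGORY_MAP = {
    "electronics": "Electronics",
    "home & garden": "Home & Garden",
    "home and garden": "Home & Garden",
}

# Assumed set of formats present in the file (slash dates treated as day-first).
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d", "%b %d, %Y"]


def parse_dates(series):
    """Try each known format in turn and keep the first successful parse."""
    parsed = pd.to_datetime(series, format=DATE_FORMATS[0], errors="coerce")
    for fmt in DATE_FORMATS[1:]:
        parsed = parsed.fillna(pd.to_datetime(series, format=fmt, errors="coerce"))
    return parsed


def clean_orders(raw):
    df = raw.copy()
    # Normalize category spellings; anything unmapped is labeled "Unknown".
    key = df["category"].str.strip().str.lower()
    df["category"] = key.map(CATEGORY_MAP).fillna("Unknown")
    df["order_date"] = parse_dates(df["order_date"].str.strip())
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    # Label missing customer IDs so the rows stay visible downstream.
    df["customer_id"] = df["customer_id"].fillna("UNKNOWN")
    # Keep one row per order_id.
    df = df.drop_duplicates(subset="order_id", keep="first")
    return df.reset_index(drop=True)


# Read everything as text first so no format is guessed silently.
clean = clean_orders(pd.read_csv(io.StringIO(RAW_CSV), dtype=str))
print(clean)
```

Listing the date formats explicitly keeps ambiguous values such as 05/01/2024 from being interpreted differently from row to row; values that match none of the formats become NaT and can be reviewed separately.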
Scenario
You have access to a database of raw user clickstream events (user_id, timestamp, page_url, event_type). The business requires a weekly report on conversion rates through a key product funnel (e.g., Home -> Product Page -> Add to Cart -> Purchase).
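A minimal sketch of the weekly funnel report follows, using an in-memory SQLite database so it runs anywhere. The event_type values (view_home, view_product, add_to_cart, purchase), the sample events, and the rule that a user counts at a step only if their first hit on that step comes at or after their first hit on the previous step are all assumptions; a production funnel definition may differ.

```python
import sqlite3

# Assumed event_type values standing in for the four funnel steps.
FUNNEL_SQL = """
WITH first_hits AS (
    SELECT
        strftime('%Y-W%W', timestamp) AS week,
        user_id,
        MIN(CASE WHEN event_type = 'view_home'    THEN timestamp END) AS t_home,
        MIN(CASE WHEN event_type = 'view_product' THEN timestamp END) AS t_product,
        MIN(CASE WHEN event_type = 'add_to_cart'  THEN timestamp END) AS t_cart,
        MIN(CASE WHEN event_type = 'purchase'     THEN timestamp END) AS t_purchase
    FROM events
    GROUP BY strftime('%Y-W%W', timestamp), user_id
)
SELECT
    week,
    COUNT(t_home) AS home_users,
    COUNT(CASE WHEN t_product >= t_home THEN 1 END) AS product_users,
    COUNT(CASE WHEN t_cart >= t_product AND t_product >= t_home THEN 1 END) AS cart_users,
    COUNT(CASE WHEN t_purchase >= t_cart AND t_cart >= t_product
                AND t_product >= t_home THEN 1 END) AS purchase_users
FROM first_hits
GROUP BY week
ORDER BY week
"""

# Hypothetical sample events: (user_id, timestamp, page_url, event_type).
EVENTS = [
    ("u1", "2024-01-01 10:00:00", "/", "view_home"),
    ("u1", "2024-01-01 10:01:00", "/p/1", "view_product"),
    ("u1", "2024-01-01 10:02:00", "/cart", "add_to_cart"),
    ("u1", "2024-01-01 10:03:00", "/checkout", "purchase"),
    ("u2", "2024-01-02 09:00:00", "/", "view_home"),
    ("u2", "2024-01-02 09:05:00", "/p/2", "view_product"),
    ("u3", "2024-01-02 11:00:00", "/", "view_home"),
    ("u4", "2024-01-03 12:00:00", "/p/1", "view_product"),  # skipped Home
    ("u4", "2024-01-03 12:01:00", "/cart", "add_to_cart"),
    ("u2", "2024-01-08 09:00:00", "/", "view_home"),
    ("u2", "2024-01-08 09:01:00", "/p/2", "view_product"),
    ("u2", "2024-01-08 09:02:00", "/cart", "add_to_cart"),
]


def weekly_funnel(conn):
    """Return one row per week: (week, home, product, cart, purchase)."""
    return conn.execute(FUNNEL_SQL).fetchall()


def step_rates(row):
    """Conversion rate of each step relative to the previous step."""
    counts = list(row[1:])
    return [
        (later / earlier if earlier else None)
        for earlier, later in zip(counts, counts[1:])
    ]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (user_id TEXT, timestamp TEXT, page_url TEXT, event_type TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)

rows = weekly_funnel(conn)
for row in rows:
    print(row, step_rates(row))
```

Comparing only the first hit per step is a simplification: a user who views a product before the home page and again afterwards would be excluded. Stricter sequencing would need row-level ordering, for example with window functions.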
Scenario
Your team repeatedly processes data from three different CRM sources with slightly different schemas. The ad-hoc scripts are becoming unmaintainable. You must build a reusable Python module.
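A reusable module for this scenario could centralize the schema differences in one mapping table and expose a single standardization function. The sketch below assumes three hypothetical sources (crm_a, crm_b, crm_c), invented column names, and a three-field canonical schema; none of these come from a real CRM.

```python
import pandas as pd

# Canonical schema every source is mapped onto (hypothetical field names).
CANONICAL_COLUMNS = ["customer_id", "email", "signup_date"]

# Per-source rename maps; source names and columns are assumptions.
SOURCE_SCHEMAS = {
    "crm_a": {"CustomerID": "customer_id", "Email": "email", "SignupDate": "signup_date"},
    "crm_b": {"cust_id": "customer_id", "email_address": "email", "created_at": "signup_date"},
    "crm_c": {"id": "customer_id", "mail": "email", "registered": "signup_date"},
}


def standardize(df, source):
    """Rename one source's columns to the canonical schema and clean the values."""
    if source not in SOURCE_SCHEMAS:
        raise ValueError(f"Unknown source: {source!r}")
    mapping = SOURCE_SCHEMAS[source]
    missing = set(mapping) - set(df.columns)
    if missing:
        raise ValueError(f"{source} is missing columns: {sorted(missing)}")
    out = df.rename(columns=mapping)[CANONICAL_COLUMNS].copy()
    out["customer_id"] = out["customer_id"].astype(str).str.strip()
    out["email"] = out["email"].str.strip().str.lower()
    out["signup_date"] = pd.to_datetime(out["signup_date"], errors="coerce")
    out["source"] = source  # keep lineage for later debugging
    return out


def combine(frames):
    """frames maps a source name to its raw DataFrame."""
    parts = [standardize(df, source) for source, df in frames.items()]
    return pd.concat(parts, ignore_index=True)


# Hypothetical one-row extracts from each source.
frames = {
    "crm_a": pd.DataFrame({"CustomerID": [1], "Email": [" Ann@Example.com"], "SignupDate": ["2024-01-05"]}),
    "crm_b": pd.DataFrame({"cust_id": ["B7"], "email_address": ["bob@example.com"], "created_at": ["2024-02-10"]}),
    "crm_c": pd.DataFrame({"id": ["C9"], "mail": ["CAT@EXAMPLE.COM "], "registered": ["2024-03-15"]}),
}
combined = combine(frames)
print(combined)
```

Adding a fourth source then means adding one entry to SOURCE_SCHEMAS, and the explicit missing-column check makes an upstream schema change fail loudly instead of producing silently empty fields.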
Pandas is the core library for in-memory data manipulation. SQLAlchemy provides a Pythonic interface to SQL databases. dbt handles version-controlled SQL transformations in a data warehouse. PySpark is used for wrangling datasets too large for Pandas to hold in memory. JupyterLab is a common interactive development environment for iterative wrangling.
Window functions (ROW_NUMBER, RANK, LAG/LEAD) express ranking and row-to-row comparisons in SQL without self-joins. Vectorization replaces slow row-by-row Python loops with whole-column operations. Idempotency means a script can be re-run and produce the same result, without duplicating or corrupting data. Understanding normalization helps design clean source schemas. Choosing ETL vs. ELT determines where the main transformation work occurs: before loading (for example, in Python) or after loading (in the database).
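To make the window-function point concrete, the sketch below runs ROW_NUMBER and LAG against an in-memory SQLite database. The orders table and its rows are hypothetical, and the example assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DROP TABLE IF EXISTS makes the setup safe to re-run (a small idempotency habit).
conn.executescript("""
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (order_id INTEGER, customer_id TEXT, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
    (1, 'C01', '2024-01-05', 100.0),
    (2, 'C01', '2024-01-20', 50.0),
    (3, 'C02', '2024-01-07', 80.0),
    (4, 'C01', '2024-02-01', 70.0);
""")

QUERY = """
SELECT
    customer_id,
    order_id,
    order_date,
    -- 1 = most recent order for that customer
    ROW_NUMBER() OVER (
        PARTITION BY customer_id ORDER BY order_date DESC
    ) AS recency_rank,
    -- date of the same customer's previous order, NULL for the first one
    LAG(order_date) OVER (
        PARTITION BY customer_id ORDER BY order_date
    ) AS previous_order_date
FROM orders
ORDER BY customer_id, order_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)

# Latest order per customer, with no self-join needed.
latest_order_ids = [order_id for _, order_id, _, rank, _ in rows if rank == 1]
print(latest_order_ids)
```

The same result with a self-join would require matching each order against every other order from the same customer; the PARTITION BY clause expresses that grouping directly.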