Skill Guide

Cloud data platform architecture (AWS, GCP, Azure) and IAM-aware data flow analysis

The design, implementation, and security analysis of multi-cloud or single-cloud data ecosystems (AWS, GCP, Azure), where every data flow and access point is governed by Identity and Access Management (IAM) policies to enforce least privilege, traceability, and compliance.

This skill directly reduces data breach risk and compliance violations by ensuring that data pipelines are architecturally sound and that access to sensitive data is explicitly controlled, audited, and justified. It transforms data platforms from cost centers into secure, scalable, and trustworthy business assets, enabling safe data monetization and advanced analytics.

1 Careers

1 Categories

8.7 Avg Demand

18% Avg AI Risk

How to Learn Cloud data platform architecture (AWS, GCP, Azure) and IAM-aware data flow analysis

1. **Core Cloud Concepts:** Master the fundamental services for storage (S3, Blob Storage, Cloud Storage), compute (EC2, VMs, Cloud Functions), and data warehousing (Redshift, BigQuery, Synapse) on one primary cloud.
2. **IAM Fundamentals:** Deeply understand users, roles, service accounts, and policies. Practice the principle of least privilege by creating roles that grant only the permissions needed for a specific task.
3. **Data Flow Diagramming:** Learn to map data movement using tools like Lucidchart or diagrams.net, clearly labeling each transfer protocol (API, SQS, Pub/Sub) and the IAM entity (role/user) executing it.
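The diagramming habit above can also be practiced as docs-as-code. The following is a minimal sketch, assuming Python 3.9+ and a hypothetical `Flow` record; the bucket, job, and role names are illustrative only. It renders each flow as a Mermaid edge labeled with the transfer mechanism and the IAM entity that executes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str
    target: str
    protocol: str   # e.g. "S3 API", "SQS", "Pub/Sub"
    principal: str  # IAM role, user, or service account executing the transfer


def node_id(name: str) -> str:
    """Mermaid node ids should be simple identifiers, so replace other characters."""
    return "".join(c if c.isalnum() else "_" for c in name)


def to_mermaid(flows: list[Flow]) -> str:
    """Render flows as a Mermaid flowchart with protocol and principal on each edge."""
    lines = ["flowchart LR"]
    for f in flows:
        label = f"{f.protocol} as {f.principal}"
        lines.append(
            f'    {node_id(f.source)}["{f.source}"] -->|"{label}"| '
            f'{node_id(f.target)}["{f.target}"]'
        )
    return "\n".join(lines)


# Hypothetical example flows; none of these names come from a real environment.
EXAMPLE_FLOWS = [
    Flow("source-bucket", "glue-job", "S3 API", "role/glue-ingest"),
    Flow("glue-job", "target-bucket", "S3 API", "role/glue-ingest"),
]
print(to_mermaid(EXAMPLE_FLOWS))
```

Keeping the principal on every edge makes a missing or shared identity visible during review, which is the point of the exercise.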

1. **Cross-Service & Cross-Account Patterns:** Architect solutions where data moves securely between accounts or projects (e.g., AWS S3 bucket policy granting access to another AWS account's role). Implement private connectivity (VPC Endpoints, Private Service Connect).
2. **IAM Policy Analysis:** Use native tools (IAM Access Analyzer, Policy Simulator) and third-party scanners to audit policies for overly permissive access (`*` resources, `*` actions).
3. **Common Pitfall:** Avoid static, long-lived credentials for service-to-service communication; implement workload identity federation or native managed identities instead.
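To make the wildcard audit concrete, here is a small sketch of the kind of check a scanner performs on an AWS-style JSON policy. It is not a replacement for IAM Access Analyzer: the function name `find_wildcards` is hypothetical, and the sketch ignores `NotAction`, `NotResource`, and conditions.

```python
import json


def _as_list(value):
    """IAM allows either a single string or a list for Action and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_wildcards(policy_json: str) -> list[str]:
    """Return findings for Allow statements with '*' actions or resources."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    findings = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        for action in _as_list(stmt.get("Action")):
            # Flags both "*" and service-wide grants such as "s3:*".
            if action == "*" or action.endswith(":*"):
                findings.append(f"statement {i}: wildcard action {action!r}")
        for resource in _as_list(stmt.get("Resource")):
            if resource == "*":
                findings.append(f"statement {i}: wildcard resource {resource!r}")
    return findings
```

Running this over exported policies in a scheduled job gives a rough first pass; findings still need human review, since some actions legitimately require `"Resource": "*"`.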

1. **Multi-Cloud & Hybrid Strategy:** Design and govern data flows that span AWS, GCP, and Azure, using consistent IAM abstraction layers (e.g., identity federation across clouds) and secure networking (Interconnect, ExpressRoute).
2. **Policy-as-Code (PaC):** Implement and enforce IAM and network policies programmatically using tools like Open Policy Agent (OPA), HashiCorp Sentinel, or AWS Config rules within CI/CD pipelines.
3. **Mentorship & Governance:** Lead the creation of organizational guardrails (landing zones, approved service lists) and train data engineers on secure data flow design patterns.
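An approved service list can be enforced in CI before deployment. OPA or Sentinel would normally express this rule in their own policy languages; the sketch below is a plain Python stand-in that reads the JSON produced by `terraform show -json` and inspects its `resource_changes` entries. The allow list and the function name are assumptions made for illustration.

```python
import json

# Hypothetical allow list: resource type prefixes the platform team has approved.
APPROVED_TYPE_PREFIXES = ("aws_s3_", "aws_glue_", "aws_iam_")


def unapproved_resources(plan_json: str, approved=APPROVED_TYPE_PREFIXES) -> list[str]:
    """Return addresses of planned resources whose type is not on the allow list."""
    plan = json.loads(plan_json)
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # Unchanged or deleted resources do not introduce new services.
        if actions in (["no-op"], ["delete"]):
            continue
        if not rc["type"].startswith(tuple(approved)):
            violations.append(rc["address"])
    return violations
```

A pipeline step could call this on the plan file and fail the build when the returned list is non-empty.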

Practice Projects

Beginner

Project

Secure Data Lake Ingestion Pipeline

Scenario

Build a pipeline that ingests JSON files from an S3 source bucket into a processed data lake partitioned by date in a target S3 bucket, using AWS Glue.

How to Execute

1. Create an IAM role for AWS Glue with permissions ONLY to read from the source bucket and write to the target bucket.
2. Define an AWS Glue crawler and job that use this role to transform and move the data.
3. Write a bucket policy (a resource-based policy) on the source bucket that explicitly allows the Glue role's ARN to perform `s3:GetObject`. Within a single account the role's identity policy is enough on its own; the bucket policy becomes mandatory when the role lives in a different account.
4. Document the end-to-end data flow, including the role assumption chain.
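Steps 1 and 3 can be sketched as two policy documents built in Python. This is an illustration under stated assumptions: the bucket names and role ARN are supplied by the caller, the function names are hypothetical, and a working Glue role also needs permissions for the Glue service itself and for logging, which are omitted here.

```python
import json


def glue_role_policy(source_bucket: str, target_bucket: str) -> dict:
    """Identity policy for the Glue role: read the source, write the target."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Listing is a bucket-level action, so the ARN has no "/*".
                "Sid": "ListSource",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{source_bucket}",
            },
            {
                "Sid": "ReadSource",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            },
            {
                "Sid": "WriteTarget",
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{target_bucket}/*",
            },
        ],
    }


def source_bucket_policy(source_bucket: str, glue_role_arn: str) -> dict:
    """Resource-based policy on the source bucket naming the Glue role explicitly."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGlueRoleRead",
                "Effect": "Allow",
                "Principal": {"AWS": glue_role_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{source_bucket}/*",
            }
        ],
    }


# Placeholder names for illustration only.
print(json.dumps(glue_role_policy("example-source", "example-target"), indent=2))
```

Generating both documents from the same inputs keeps the identity policy and the bucket policy in agreement, which simplifies the documentation in step 4.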

Intermediate

Case Study/Exercise

Multi-Team Data Warehouse Access Governance

Scenario

Three business units (Marketing, Finance, Analytics) need shared, read-only access to a Snowflake data warehouse hosted on Azure, but with strict row-level security (RLS) based on department tags. You must design the IAM and access model.

How to Execute

1. Design a Snowflake access control model using functional roles (`MARKETING_ANALYST`, `FINANCE_ANALYST`) that inherit from a base role (`DATA_READER`).
2. Implement dynamic data masking and row access policies (Snowflake's mechanism for RLS) that filter data based on the user's role.
3. Configure Azure AD (now Microsoft Entra ID) SSO integration for authentication, and map Azure AD groups to Snowflake roles, for example through SCIM, for automated provisioning/de-provisioning.
4. Create an audit query to report on who accessed what data and when, and simulate a user change request to test the process.
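The user change request in step 4 can be rehearsed offline before touching the warehouse. The sketch below is an assumption-laden model, not Snowflake's or Azure AD's provisioning logic: the group names, the mapping, and `plan_role_changes` are hypothetical. It computes which role grants and revokes a group change should produce, and leaves roles outside the mapping (such as `DATA_READER`) alone.

```python
# Hypothetical mapping from directory groups to Snowflake roles.
GROUP_TO_ROLE = {
    "grp-marketing-analysts": "MARKETING_ANALYST",
    "grp-finance-analysts": "FINANCE_ANALYST",
}


def plan_role_changes(
    user_groups: dict[str, set[str]],
    current_grants: dict[str, set[str]],
    mapping: dict[str, str] = GROUP_TO_ROLE,
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Return (grants, revokes) as lists of (user, role) pairs."""
    managed = set(mapping.values())  # only roles in the mapping are ever revoked
    grants, revokes = [], []
    for user in sorted(set(user_groups) | set(current_grants)):
        want = {mapping[g] for g in user_groups.get(user, set()) if g in mapping}
        have = current_grants.get(user, set()) & managed
        grants += [(user, role) for role in sorted(want - have)]
        revokes += [(user, role) for role in sorted(have - want)]
    return grants, revokes


# Example: ana moves from Marketing to Finance; bob has left every mapped group.
grants, revokes = plan_role_changes(
    {"ana": {"grp-finance-analysts"}},
    {"ana": {"MARKETING_ANALYST", "DATA_READER"}, "bob": {"FINANCE_ANALYST"}},
)
print(grants, revokes)
```

Comparing this expected diff with what the audit query reports after the change is one way to confirm that de-provisioning actually happened.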

Advanced

Project

Cross-Cloud Data Mesh Federated Query Governance

Scenario

A data product team in AWS (Athena) needs to securely and performantly query a curated dataset in GCP (BigQuery) without data replication, governed by a central data platform team's policies.

How to Execute

1. Architect a solution using a federated query path, such as BigQuery Omni (cross-cloud) or a third-party federated query engine like Starburst Galaxy. Note that BigQuery Omni runs BigQuery queries against data stored in AWS or Azure, so confirm that the direction of federation fits this scenario before committing to it.
2. Implement a cross-cloud identity: use GCP Workload Identity Federation so that an AWS IAM role can impersonate a GCP service account without long-lived keys.
3. Define fine-grained BigQuery column- and row-level security policies that are evaluated when the federated identity accesses the data.
4. Build a Terraform/Pulumi module that codifies the entire setup (IAM bindings, dataset permissions, network rules) and integrate it into the data product's CI/CD pipeline.
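For step 2, the AWS workload typically authenticates with an external-account credential configuration file instead of a service account key. The sketch below builds such a file as a Python dict. The field names follow the layout of the file that `gcloud iam workload-identity-pools create-cred-config` generates for AWS; treat them as something to verify against a generated file, and treat every argument value as a placeholder.

```python
import json


def build_aws_wif_config(
    project_number: str,
    pool_id: str,
    provider_id: str,
    service_account_email: str,
) -> dict:
    """Build an external-account credential config for an AWS workload."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return {
        "type": "external_account",
        "audience": audience,
        "subject_token_type": "urn:ietf:params:aws:token-type:aws4_request",
        "token_url": "https://sts.googleapis.com/v1/token",
        # The federated identity impersonates this service account.
        "service_account_impersonation_url": (
            "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
            f"{service_account_email}:generateAccessToken"
        ),
        "credential_source": {
            "environment_id": "aws1",
            "region_url": "http://169.254.169.254/latest/meta-data/placement/availability-zone",
            "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials",
            "regional_cred_verification_url": (
                "https://sts.{region}.amazonaws.com"
                "?Action=GetCallerIdentity&Version=2011-06-15"
            ),
        },
    }


# Placeholder identifiers for illustration only.
config = build_aws_wif_config(
    "123456789012", "aws-pool", "aws-provider",
    "bq-reader@example-project.iam.gserviceaccount.com",
)
print(json.dumps(config, indent=2))
```

The file contains no secret: it only tells the client library how to exchange the AWS role's signed identity for a short-lived GCP token, which is why it suits the no-static-credentials goal.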

Tools & Frameworks

Cloud-Native IAM & Security Tools

AWS IAM Access Analyzer & Policy Simulator; GCP IAM Recommender & Policy Troubleshooter; Azure RBAC & Conditional Access

Used for policy validation, simulation, and identifying overly permissive access. Critical for daily operations and policy refinement.

Infrastructure as Code (IaC) & Policy as Code (PaC)

Terraform/Pulumi; Open Policy Agent (OPA); HashiCorp Sentinel

Terraform/Pulumi codifies cloud infrastructure including IAM roles and policies. OPA/Sentinel enforces custom security and compliance rules on IaC plans before deployment.

Data Governance & Cataloging

Apache Ranger/Apache Atlas (for Hadoop ecosystem); Collibra/Alation; Cloud-Native Catalogs (AWS Glue Data Catalog, GCP Data Catalog, Microsoft Purview, formerly Azure Purview)

Ranger/Atlas manage fine-grained access control and metadata for on-prem or cloud data lakes. Commercial catalogs and cloud-native services provide enterprise-grade data discovery, classification, and policy management.

Diagramming & Documentation

Lucidchart/draw.io; C4 Model; Mermaid.js (for docs-as-code)

Essential for creating clear, standardized architecture diagrams that map data flows and associated IAM roles, which is a core deliverable of this skill.

Interview Questions

Answer Strategy

Tests the candidate's understanding of public exposure risk, root cause analysis, and incident response. **Answer Strategy:**
1. State the risk: the wildcard principal makes the bucket readable by anyone, including unauthenticated users, not only other AWS accounts.
2. Root cause: likely a misconfiguration, possibly copied from a tutorial or a misapplied Terraform template.
3. Remediation: a) Immediately remove the wildcard policy. b) Audit all access logs for the bucket for signs of data exfiltration. c) Implement AWS Config rules or IAM Access Analyzer to detect future misconfigurations. d) Review the change control process that allowed this.
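A candidate may be asked to show how such a misconfiguration could be detected. One hedged sketch, assuming the policy is already parsed into a dict and using a hypothetical function name, flags `Allow` statements with a wildcard principal and no condition. Statements that do carry a `Condition` are skipped here and would still need manual review.

```python
def public_allow_statements(policy: dict) -> list[int]:
    """Return indexes of Allow statements open to any principal without a condition."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    hits = []
    for i, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # The wildcard can appear as "*" or as {"AWS": "*"} (string or list).
        aws = principal.get("AWS") if isinstance(principal, dict) else None
        aws_list = aws if isinstance(aws, list) else [aws]
        is_wildcard = principal == "*" or "*" in aws_list
        if is_wildcard and not stmt.get("Condition"):
            hits.append(i)
    return hits
```

In practice AWS Config rules and IAM Access Analyzer cover this case with far more nuance; the sketch only demonstrates the reasoning behind the detection.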

Answer Strategy

Tests architectural thinking and stakeholder management. **Competency Tested:** Designing for scalability and least privilege. **Sample Response:** 'For a pipeline feeding PII to our marketing platform, I implemented a role-based access control (RBAC) model with data steward roles. I created a central service account with read access to the raw data lake, but applied column-level masking in our transformation layer (e.g., dbt). Each consuming team had a distinct role (e.g., `marketing_analyst`) that granted access only to anonymized or necessary columns. I automated role provisioning via Azure AD group sync, reducing access request tickets by 70% while maintaining strict least privilege.'