AI Data Lineage Analyst
An AI Data Lineage Analyst maps, monitors, and audits the complete lifecycle of data as it flows through AI and machine learning p…
Skill Guide
The design, implementation, and security analysis of multi-cloud or single-cloud data ecosystems (AWS, GCP, Azure), where every data flow and access point is governed by Identity and Access Management (IAM) policies to enforce least privilege, traceability, and compliance.
Scenario
Build a pipeline that ingests JSON files from an S3 source bucket into a processed data lake partitioned by date in a target S3 bucket, using AWS Glue.
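This sketch covers two IAM-relevant pieces of such a pipeline. The first is a Hive-style date partition prefix for objects written to the target bucket. The second is a least-privilege S3 policy for the Glue job's execution role: list and read on the source, write on the target. Several details are assumptions: the bucket names, the `events` table name, and the `year=/month=/day=` layout. A working Glue role also needs permissions omitted here, such as CloudWatch Logs, Glue Data Catalog access, and `s3:DeleteObject` if the job overwrites partitions.

```python
import json
from datetime import datetime, timezone

# Hypothetical bucket names, for illustration only.
SOURCE_BUCKET = "example-raw-json"
TARGET_BUCKET = "example-processed-lake"


def partition_prefix(table: str, event_date: datetime) -> str:
    """Hive-style date partition prefix under the target bucket,
    e.g. 'events/year=2024/month=03/day=07/'."""
    return f"{table}/year={event_date:%Y}/month={event_date:%m}/day={event_date:%d}/"


def glue_job_s3_policy(source_bucket: str, target_bucket: str) -> dict:
    """Least-privilege S3 statements for the Glue job's execution role:
    list/read on the source bucket, write on the target bucket, nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListSource",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{source_bucket}"],
            },
            {
                "Sid": "ReadSourceObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{source_bucket}/*"],
            },
            {
                "Sid": "WriteTargetObjects",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{target_bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    day = datetime(2024, 3, 7, tzinfo=timezone.utc)
    print(f"s3://{TARGET_BUCKET}/{partition_prefix('events', day)}")
    print(json.dumps(glue_job_s3_policy(SOURCE_BUCKET, TARGET_BUCKET), indent=2))
```

Scoping each action to its own ARN keeps a compromised job from writing back into the raw source or reading other data in the lake. This makes the role's reach easy to trace in lineage reviews.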
Scenario
Three business units (Marketing, Finance, Analytics) need shared, read-only access to a Snowflake data warehouse hosted on Azure, but with strict row-level security (RLS) based on department tags. You must design the IAM and access model.
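In Snowflake this is commonly done with a row access policy that checks `CURRENT_ROLE()` against a role-to-department mapping table. The in-memory Python model below reproduces only that decision logic, so the access model can be reasoned about and unit-tested before the SQL policy is written. The role names, department values, and `Row` shape are hypothetical. A real deployment would define the policy in Snowflake and attach it to the shared table.

```python
from dataclasses import dataclass

# Hypothetical mapping table: role -> department tags whose rows it may read.
# A role can be granted several departments by adding to its set.
ROLE_DEPARTMENTS = {
    "MARKETING_READER": {"marketing"},
    "FINANCE_READER": {"finance"},
    "ANALYTICS_READER": {"analytics"},
}


@dataclass(frozen=True)
class Row:
    order_id: int
    department: str
    amount: float


def row_visible(role: str, row: Row) -> bool:
    """Policy predicate: a row is returned only if the caller's role is
    mapped to the row's department tag. Unknown roles see nothing."""
    return row.department in ROLE_DEPARTMENTS.get(role, set())


def query_as(role: str, rows: list[Row]) -> list[Row]:
    """Read-only view: filters rows for the role, never modifies them."""
    return [r for r in rows if row_visible(role, r)]
```

Keeping the mapping in data rather than in the predicate means onboarding a new business unit is a row insert, not a policy rewrite.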
Scenario
A data product team in AWS (Athena) needs to securely and performantly query a curated dataset in GCP (BigQuery) without data replication, governed by a central data platform team's policies.
Used for policy validation, simulation, and identifying overly permissive access. Critical for daily operations and policy refinement.
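Policy simulation can also be approximated offline. The following is a simplified model of IAM's evaluation order: an explicit Deny overrides any Allow, and a request with no matching Allow is implicitly denied. It supports `*` and `?` wildcards in Action and Resource, with actions matched case-insensitively. It is not the AWS policy simulator. It ignores Principal, NotAction/NotResource, permission boundaries, and SCPs, and it treats every Condition block as satisfied, so it errs toward over-reporting access.

```python
import re


def _as_list(value):
    """IAM allows a single string or a list in Action/Resource/Statement."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _glob_match(pattern: str, value: str, case_sensitive: bool = True) -> bool:
    """Match IAM-style wildcards: '*' = any run of characters, '?' = one character."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.fullmatch(regex, value, flags) is not None


def evaluate(policies: list[dict], action: str, resource: str) -> str:
    """Return 'ExplicitDeny', 'Allow', or 'ImplicitDeny' for one request.
    Conditions are ignored (treated as always true)."""
    allowed = False
    for policy in policies:
        for stmt in _as_list(policy.get("Statement")):
            if not any(_glob_match(a, action, case_sensitive=False)
                       for a in _as_list(stmt.get("Action"))):
                continue
            if not any(_glob_match(r, resource) for r in _as_list(stmt.get("Resource"))):
                continue
            if stmt.get("Effect") == "Deny":
                return "ExplicitDeny"  # an explicit deny always wins
            if stmt.get("Effect") == "Allow":
                allowed = True
    return "Allow" if allowed else "ImplicitDeny"
```

Running sensitive actions such as `s3:DeleteBucket` or `iam:PassRole` through `evaluate` against a role's policies is a quick way to spot grants that are broader than intended.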
Terraform/Pulumi codifies cloud infrastructure including IAM roles and policies. OPA/Sentinel enforces custom security and compliance rules on IaC plans before deployment.
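OPA and Sentinel express such rules in Rego and in Sentinel's own policy language. As a plain substitute, the Python sketch below applies a comparable pre-deployment check to the JSON plan produced by `terraform show -json`. It flags Allow statements with wildcard actions, wildcard resources, or a public principal. Two simplifying assumptions apply: only the three AWS resource types listed are inspected, and policies whose content is unknown until apply are skipped.

```python
import json

# Resource types whose `policy` attribute holds a JSON policy document.
POLICY_RESOURCE_TYPES = {"aws_iam_policy", "aws_iam_role_policy", "aws_s3_bucket_policy"}


def _as_list(value):
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_findings(plan: dict) -> list[str]:
    """Scan planned resource changes for overly permissive Allow statements."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") not in POLICY_RESOURCE_TYPES:
            continue
        after = rc.get("change", {}).get("after") or {}
        doc = after.get("policy")
        if not doc:
            continue  # resource is being deleted, or policy is unknown until apply
        for stmt in _as_list(json.loads(doc).get("Statement")):
            if stmt.get("Effect") != "Allow":
                continue
            addr = rc.get("address", "?")
            actions = _as_list(stmt.get("Action"))
            if "*" in actions or any(a.endswith(":*") for a in actions):
                findings.append(f"{addr}: wildcard action {actions}")
            if "*" in _as_list(stmt.get("Resource")):
                findings.append(f"{addr}: wildcard resource")
            principal = stmt.get("Principal")
            if principal == "*" or (
                isinstance(principal, dict) and "*" in _as_list(principal.get("AWS"))
            ):
                findings.append(f"{addr}: public principal")
    return findings


def check_plan_file(path: str) -> int:
    """Print findings for a plan JSON file; return 1 if any, else 0 (CI gate)."""
    with open(path, encoding="utf-8") as fh:
        findings = wildcard_findings(json.load(fh))
    for finding in findings:
        print(finding)
    return 1 if findings else 0
```

In a pipeline, this would run after `terraform plan -out=tfplan` and `terraform show -json tfplan > plan.json`, failing the build when `check_plan_file("plan.json")` returns 1.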
Ranger/Atlas manage fine-grained access control and metadata for on-prem or cloud data lakes. Commercial catalogs and cloud-native services provide enterprise-grade data discovery, classification, and policy management.
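Apache Atlas can propagate a classification, such as a PII tag, from a dataset to the entities derived from it along recorded lineage. This is how a model trained on customer data ends up carrying the same tag as its source. The plain-Python sketch below models that idea as a breadth-first walk over a hypothetical lineage graph. It is not the Atlas API. Real deployments may also need to stop propagation where a transformation genuinely removes the sensitive fields.

```python
from collections import deque

# Hypothetical lineage: upstream dataset -> datasets derived from it.
LINEAGE = {
    "raw.customers": ["staging.customers_clean"],
    "staging.customers_clean": ["mart.customer_ltv", "ml.churn_features"],
    "ml.churn_features": ["ml.churn_model_v3"],
    "raw.web_events": ["ml.churn_features"],
}


def propagate_classification(lineage: dict[str, list[str]], sources: set[str]) -> set[str]:
    """Return every dataset reachable downstream of the tagged sources
    (the sources included); each one inherits the classification."""
    tagged = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in tagged:  # visit each dataset once, even with shared parents
                tagged.add(child)
                queue.append(child)
    return tagged
```

Calling `propagate_classification(LINEAGE, {"raw.customers"})` tags the staging table, the mart, the feature set, and the churn model. It leaves `raw.web_events` untagged because that dataset only feeds into the tagged path and does not derive from it.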
Essential for creating clear, standardized architecture diagrams that map data flows and associated IAM roles, which is a core deliverable of this skill.
Answer Strategy
Tests the candidate's understanding of public exposure risk, root cause analysis, and incident response. **Answer Strategy:**
1. State the risk: the bucket is publicly readable. A wildcard principal grants access to anyone, including unauthenticated requests, not only principals in other AWS accounts.
2. Root cause: most likely a misconfiguration, possibly copied from a tutorial or introduced by a misapplied Terraform template.
3. Remediation:
   a) Immediately remove the wildcard policy statement.
   b) Audit the bucket's access logs (S3 server access logs or CloudTrail data events) for signs of data exfiltration.
   c) Implement AWS Config rules or IAM Access Analyzer to detect future misconfigurations.
   d) Review the change control process that allowed the policy to be deployed.
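For the log audit in step 3, if S3 server access logging was enabled on the bucket, a first pass can isolate successful unauthenticated object reads. The log format records these with `-` as the requester. The sketch below parses only the leading fields of each record and totals bytes sent per remote IP. It assumes the standard server access log layout. CloudTrail data events would need a different parser.

```python
import re
from collections import defaultdict

# Leading fields of an S3 server access log record; trailing fields are ignored.
LOG_PATTERN = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request_uri>[^"]*)" (?P<status>\S+) (?P<error_code>\S+) (?P<bytes_sent>\S+)'
)


def anonymous_reads(lines) -> dict[str, int]:
    """Sum bytes served per remote IP for successful unauthenticated GETs."""
    totals: dict[str, int] = defaultdict(int)
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip malformed or truncated records
        if (m["operation"] == "REST.GET.OBJECT"
                and m["requester"] == "-"  # '-' marks an unauthenticated request
                and m["status"] == "200"):
            sent = m["bytes_sent"]
            totals[m["remote_ip"]] += int(sent) if sent.isdigit() else 0
    return dict(totals)
```

Any non-empty result means data left the bucket while it was public. The affected keys and IPs then feed the incident report and any required breach notification review.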
Answer Strategy
Tests architectural thinking and stakeholder management. **Competency Tested:** Designing for scalability and least privilege. **Sample Response:** 'For a pipeline feeding PII to our marketing platform, I implemented a role-based access control (RBAC) model with data steward roles. I created a central service account with read access to the raw data lake, but applied column-level masking in our transformation layer (e.g., dbt). Each consuming team had a distinct role (e.g., `marketing_analyst`) that only granted access to anonymized or necessary columns. I automated role provisioning via Azure AD group sync, reducing access request tickets by 70% while maintaining strict least privilege.'
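In dbt, masking of this kind would normally live in SQL models. The Python sketch below shows the same idea in a language-neutral way. Each role has a set of columns it may read in clear text, and every other column is replaced with a keyed-hash pseudonym so that joins and counts still work. The role names, column sets, and `MASKING_KEY` are hypothetical. In practice the key would come from a secrets manager.

```python
import hashlib
import hmac

# Hypothetical secret; load from a secrets manager in practice, never hard-code.
MASKING_KEY = b"replace-with-secret-from-vault"

# Hypothetical policy: columns each role may read in clear text.
CLEAR_COLUMNS = {
    "marketing_analyst": {"customer_id", "segment", "country"},
    "data_steward": {"customer_id", "email", "phone", "segment", "country"},
}


def mask(value: object) -> str:
    """Deterministic keyed pseudonym (HMAC-SHA256): equal inputs give equal
    outputs, so masked columns can still be joined or counted."""
    digest = hmac.new(MASKING_KEY, str(value).encode("utf-8"), hashlib.sha256)
    return "masked_" + digest.hexdigest()[:16]


def project_for_role(role: str, row: dict) -> dict:
    """Return the row as the given role would see it. Unknown roles see
    every column masked (deny by default)."""
    clear = CLEAR_COLUMNS.get(role, set())
    return {col: (val if col in clear else mask(val)) for col, val in row.items()}
```

A keyed hash is used instead of a plain hash because an unsalted digest of low-entropy values such as emails or phone numbers can be reversed by brute force.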