AI Data Catalog Specialist
An AI Data Catalog Specialist designs, curates, and governs metadata-rich data catalogs that power AI and ML initiatives across th…
Skill Guide
The systematic process of identifying, categorizing, and labeling data assets based on their sensitivity and regulatory requirements, with a specific focus on detecting Personally Identifiable Information (PII) and other regulated data types.
Scenario
You are given a CSV file ('customer_data_sample.csv') with 500 rows of mock data containing mixed columns: 'full_name', 'street_address', 'notes', 'transaction_id'. Some entries in 'notes' contain inadvertently copied SSNs or medical info.
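A first pass over the 'notes' column can be sketched with the standard library alone. This is a minimal illustration, not a production detector: the SSN regex assumes the dashed US format (123-45-6789), the medical keyword list is hypothetical, and the inline sample rows stand in for 'customer_data_sample.csv'.

```python
import csv
import io
import re

# Assumption: SSNs appear in the dashed US format, e.g. 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical keyword list; a real project would use a curated vocabulary.
MEDICAL_KEYWORDS = ("diagnosis", "prescription", "diabetes", "medication")


def classify_note(text):
    """Return the set of sensitivity labels detected in one free-text note."""
    labels = set()
    if SSN_PATTERN.search(text):
        labels.add("SSN")
    lowered = text.lower()
    if any(keyword in lowered for keyword in MEDICAL_KEYWORDS):
        labels.add("MEDICAL")
    return labels


def scan_notes(csv_file):
    """Map file line number -> labels for rows whose 'notes' look sensitive."""
    findings = {}
    reader = csv.DictReader(csv_file)
    # The header is line 1, so data rows start at 2 (assumes no multi-line fields).
    for line_number, row in enumerate(reader, start=2):
        labels = classify_note(row.get("notes") or "")
        if labels:
            findings[line_number] = labels
    return findings


# Made-up sample rows using the column names from the scenario.
SAMPLE = """full_name,street_address,notes,transaction_id
Ann Lee,1 Main St,Called about refund,T-001
Bob Ray,2 Oak Ave,Customer SSN 123-45-6789 on file,T-002
Cy Dow,3 Elm Rd,Mentioned diabetes medication,T-003
"""

findings = scan_notes(io.StringIO(SAMPLE))
```

For the real file, `open('customer_data_sample.csv', newline='')` could replace the `io.StringIO` wrapper. Flagged rows would still need human review, since regex and keyword matching produce both false positives and misses.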
Scenario
Your team's data lake in AWS S3 is growing uncontrollably. You need to design a semi-automated process to scan new JSON files in a specific bucket daily, classify them, and apply tags for data discovery and access control.
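The classify-then-tag step of such a process might look like the sketch below. It covers only the local logic: the field-name hints and tier names are hypothetical, and the scheduling and S3 calls are left out. With boto3, the dictionary returned by `build_tagging` has the shape expected by the `Tagging` argument of the S3 client's `put_object_tagging` call.

```python
import json

# Hypothetical mapping from field names to classification tiers.
SENSITIVE_FIELD_HINTS = {
    "ssn": "restricted",
    "account_number": "restricted",
    "email": "confidential",
    "full_name": "confidential",
}
# Tiers ordered from least to most sensitive (assumed naming).
TIER_ORDER = ["public", "confidential", "restricted"]


def iter_keys(node):
    """Yield every object key in a parsed JSON document, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from iter_keys(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_keys(item)


def classify_json(raw):
    """Return the highest tier implied by the field names in a JSON string."""
    tier = "public"
    for key in iter_keys(json.loads(raw)):
        hint = SENSITIVE_FIELD_HINTS.get(key.lower())
        if hint and TIER_ORDER.index(hint) > TIER_ORDER.index(tier):
            tier = hint
    return tier


def build_tagging(tier):
    """Build an S3-style tag set carrying the classification tier."""
    return {"TagSet": [{"Key": "classification", "Value": tier}]}


example = '{"user": {"email": "a@example.com", "orders": [{"ssn": "redacted"}]}}'
example_tagging = build_tagging(classify_json(example))
```

Classifying by field name alone is cheap but misses sensitive values stored under generic keys, so a daily job would likely combine it with content scanning (for example a DLP API) before the tags drive access control.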
Scenario
A multinational bank discovers during an audit that its 15-year-old core banking mainframe system has never had its data classified. Critical customer data (account numbers, social security numbers, transaction histories) is stored in flat files with no encryption or access logs, in violation of the integrity and confidentiality principle in GDPR Article 5.
Enterprise platforms for automated data discovery, classification, and policy enforcement across cloud and on-prem environments. Use Purview or BigID for scanning massive data estates; use cloud-native DLP APIs for programmatic scanning within pipelines.
For building custom, lightweight, or embedded detection logic. Microsoft Presidio is a widely used open-source option for PII detection in text and images. Use these tools where an enterprise platform is overkill or for initial prototyping.
Frameworks that provide the 'why' and 'what' behind classification rules. Use DAMA-DMBOK to structure your data governance program. Use GDPR Article 30 (records of processing activities) as a template for the data inventory that classification must support.
Answer Strategy
Structure the answer around a standard data governance lifecycle: Discovery, Definition, Implementation, and Enforcement. Be specific about tools and handoffs. Sample Answer: 'First, I'd collaborate with business owners to define classification tiers and what constitutes PII based on applicable regulations. Then, I'd use a scanning tool like Purview to profile sample data from all source systems to understand the data landscape. The third step is to implement automated tagging at ingestion, integrating with the ETL/ELT pipeline, possibly using API calls to a DLP service. Finally, I'd enforce the policy by tying classification tags to access controls (RBAC) in the warehouse and establishing a quarterly review process with data stewards.'
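The enforcement step in the sample answer, tying classification tags to RBAC, can be illustrated with a small check. The tier names, role names, and clearance assignments below are assumptions for illustration only; in practice the rule would be expressed in the warehouse's own policy mechanism rather than in application code.

```python
# Assumed classification tiers, ranked from least to most sensitive.
TIER_RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}

# Hypothetical roles and the highest tier each one may read.
ROLE_CLEARANCE = {
    "analyst": "internal",
    "data_steward": "confidential",
    "privacy_officer": "restricted",
}


def can_read(role, column_tag):
    """Allow access only if the role's clearance covers the column's tag."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False  # unknown roles get no access (least privilege)
    return TIER_RANK[clearance] >= TIER_RANK[column_tag]


# Example column tags as they might be assigned at ingestion.
column_tags = {"transaction_id": "internal", "full_name": "confidential", "ssn": "restricted"}
analyst_view = [col for col, tag in column_tags.items() if can_read("analyst", tag)]
```

Denying unknown roles by default keeps the check aligned with the principle of least privilege, and the quarterly steward review would then amount to auditing the two mapping tables.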
Answer Strategy
Tests crisis management, prioritization, and understanding of the principle of least privilege. Sample Answer: 'My immediate priority is risk containment. Step 1: I would immediately engage the DBA or system owner to restrict the overly broad access permissions to the original, intended team. Step 2: I would initiate a preliminary assessment to determine the exposure window and the types of PII involved. Step 3: Based on that assessment, I would follow the company's incident response protocol, which may involve notifying the Data Protection Officer (DPO) and potentially the relevant regulator if a breach threshold is met. Concurrently, I would start the formal tagging and classification process for that database to prevent recurrence.'