AI Dark Data Analyst
An AI Dark Data Analyst specializes in discovering, cataloging, and extracting actionable intelligence from the 55-90% of enterpri…
Skill Guide
This skill is the systematic process of discovering, organizing, and governing enterprise data assets: defining their context (metadata) and classifying unmanaged 'dark data' into actionable taxonomies to reduce risk and unlock value.
Scenario
You have downloaded a public dataset (e.g., from Kaggle) about retail sales. The column names are cryptic (e.g., 'col_1', 'dt').
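One way to start on this scenario is to profile each column and draft a data dictionary from what the values look like. The sketch below is a minimal illustration using only the Python standard library. The sample rows, the column 'col_3', and the function names are hypothetical, and the type inference is deliberately crude (integer, float, ISO date, or string).

```python
from datetime import date


def infer_type(values):
    """Guess a column type from its non-empty string values."""
    def all_parse(parser):
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    if all_parse(date.fromisoformat):
        return "date"
    return "string"


def profile_columns(rows):
    """Build a draft data dictionary: one profile per column."""
    profiles = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        raw = [row.get(col, "") for row in rows]
        non_empty = [v for v in raw if v not in ("", None)]
        profiles[col] = {
            "inferred_type": infer_type(non_empty),
            "null_count": len(raw) - len(non_empty),
            "distinct_count": len(set(non_empty)),
            "sample": sorted(set(non_empty))[:3],
        }
    return profiles


# Hypothetical rows standing in for the downloaded CSV (values kept as strings,
# as csv.DictReader would return them).
rows = [
    {"col_1": "19.99", "dt": "2023-01-05", "col_3": "North"},
    {"col_1": "5.00", "dt": "2023-01-06", "col_3": "South"},
    {"col_1": "", "dt": "2023-01-06", "col_3": "North"},
]
draft = profile_columns(rows)
```

A profile like this only suggests meanings (for example, that 'dt' holds dates). The descriptive names and business definitions still need to be confirmed against the dataset's documentation.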
Scenario
Your company is decommissioning a legacy server containing 10TB of unclassified log files, reports, and backups from the last decade. Legal requires a data retention policy.
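A first automated pass over such a server could total bytes by file category and age, which gives Legal something concrete to react to. The following is a rough sketch, not a retention policy: the extension-to-category mapping and the 7-year cutoff are placeholder assumptions, and a real scan would also need content inspection for sensitive data.

```python
import os
import time
from collections import Counter

# Hypothetical mapping from file extension to a coarse category.
CATEGORY_BY_EXT = {
    ".log": "log",
    ".csv": "report", ".pdf": "report", ".xlsx": "report",
    ".bak": "backup", ".zip": "backup", ".tar": "backup", ".gz": "backup",
}

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def classify(path):
    """Assign a category from the file extension alone."""
    ext = os.path.splitext(path)[1].lower()
    return CATEGORY_BY_EXT.get(ext, "unclassified")


def age_bucket(mtime, now, cutoff_years=7):
    """Bucket a file by last-modified age; the cutoff is an assumed placeholder."""
    age_years = (now - mtime) / SECONDS_PER_YEAR
    return "past_cutoff" if age_years > cutoff_years else "within_cutoff"


def inventory(root, now=None):
    """Walk `root` and total bytes per (category, age bucket)."""
    now = time.time() if now is None else now
    totals = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                stat = os.stat(path)
            except OSError:
                continue  # skip unreadable or vanished files
            key = (classify(path), age_bucket(stat.st_mtime, now))
            totals[key] += stat.st_size
    return totals
```

Calling `inventory("/path/to/mount")` on the legacy volume (the path is illustrative) returns a Counter such as bytes of logs past the cutoff, which can then be reviewed with Legal before any deletion rule is written.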
Scenario
Your company has acquired a competitor. Both companies have separate data catalogs, glossaries, and conflicting definitions for core entities like 'Customer' and 'Revenue'.
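Before any facilitation session, it can help to list where the two glossaries actually disagree. The sketch below assumes each glossary can be exported as a simple term-to-definition mapping. The sample definitions are invented for illustration, and the comparison is plain text matching, so it flags candidates for discussion and does not judge which definition is right.

```python
def normalize(term):
    """Lowercase and collapse whitespace so 'Customer ' matches 'customer'."""
    return " ".join(term.lower().split())


def compare_glossaries(a, b):
    """Report shared terms with differing definitions and terms unique to each side."""
    a_norm = {normalize(t): d for t, d in a.items()}
    b_norm = {normalize(t): d for t, d in b.items()}
    shared = a_norm.keys() & b_norm.keys()
    conflicts = {
        t: (a_norm[t], b_norm[t])
        for t in sorted(shared)
        if normalize(a_norm[t]) != normalize(b_norm[t])
    }
    return {
        "conflicts": conflicts,
        "only_in_a": sorted(a_norm.keys() - b_norm.keys()),
        "only_in_b": sorted(b_norm.keys() - a_norm.keys()),
    }


# Hypothetical glossary exports from the two companies.
acquirer = {
    "Customer": "Any account with at least one paid order.",
    "Revenue": "Invoiced amount net of returns.",
    "Churn": "No purchase in 12 months.",
}
acquired = {
    "customer": "Any registered user, paying or not.",
    "Revenue": "Invoiced amount net of returns.",
    "Lead": "A contact with no registered account.",
}
result = compare_glossaries(acquirer, acquired)
```

Here 'Customer' surfaces as a conflict while 'Revenue' does not, which gives the stewards from both sides a short, concrete agenda.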
Use Collibra or Alation for enterprise governance with strong business glossary features. Use Apache Atlas for Hadoop/Spark ecosystem integration. Use native cloud catalogs (Google Cloud Dataplex, AWS Glue Data Catalog) for cloud-native data lake governance.
Apply DAMA-DMBOK for a comprehensive governance structure. Use Data Mesh's 'Data as a Product' concept to assign domain ownership to catalog entries. Apply the FAIR principles (Findable, Accessible, Interoperable, Reusable) to evaluate and improve the maturity of your catalog entries.
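FAIR does not prescribe a scoring formula, but a simple checklist can make the evaluation repeatable. The sketch below is one possible interpretation: the metadata field names and their mapping to each principle are assumptions chosen for illustration, with 'owner' included to reflect domain ownership of the entry.

```python
# Hypothetical mapping of FAIR principles to required catalog fields.
FAIR_CHECKS = {
    "findable": ("identifier", "description"),
    "accessible": ("access_policy",),
    "interoperable": ("schema_format",),
    "reusable": ("license", "owner"),
}


def fair_report(entry):
    """Return, per principle, the required fields that are missing or empty."""
    return {
        principle: [f for f in fields if not entry.get(f)]
        for principle, fields in FAIR_CHECKS.items()
    }


def fair_score(entry):
    """Fraction of principles whose required fields are all present."""
    report = fair_report(entry)
    passed = sum(1 for missing in report.values() if not missing)
    return passed / len(report)


# Hypothetical catalog entry with no access policy, license, or owner yet.
entry = {
    "identifier": "sales.orders_daily",
    "description": "Daily order totals per store.",
    "schema_format": "parquet",
}
gaps = fair_report(entry)
score = fair_score(entry)
```

The per-principle list of missing fields is usually more useful than the score itself, since it tells the owning domain exactly what to fill in next.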
Answer Strategy
Structure the answer in three phases: Discovery, Triage, and Operationalization. Avoid proposing to catalog everything manually. Emphasize automated scanning, stakeholder alignment, and actionable policies.
Answer Strategy
The interviewer is testing stakeholder management, influence without authority, and the ability to create shared understanding. Use the STAR (Situation, Task, Action, Result) method. Focus on facilitation techniques, not just technology.