Skill Guide

OSINT and dark-web intelligence gathering focused on AI/ML assets

The systematic process of using publicly available data and anonymous network intelligence to identify, monitor, and assess threats, leaks, and opportunities related to proprietary artificial intelligence and machine learning assets.

Organizations leverage this skill to proactively defend against intellectual property theft, model poisoning, and adversarial attacks, directly preserving competitive advantage and reducing incident response costs. It transforms security from a reactive cost center into a strategic intelligence function that informs both defensive postures and offensive research and development priorities.

1 Careers

1 Categories

9.1 Avg Demand

15% Avg AI Risk

How to Learn OSINT and dark-web intelligence gathering focused on AI/ML assets

Start with foundational OSINT toolkits (Maltego, SpiderFoot, Shodan) for surface-web reconnaissance, then learn the structure and access protocols of the dark web (Tor, I2P) and the basic anatomy of an AI/ML asset (model weights, training data hashes, API endpoints).

Develop proficiency in scraping and analyzing dark web marketplaces and forums for leaked credentials, code repositories, and pre-trained models. Practice using machine learning techniques (e.g., NLP for forum analysis, clustering for vendor identification) to automate intelligence gathering. Common mistake: relying solely on manual browsing without developing query automation or data normalization pipelines.
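A data normalization pipeline of the kind mentioned above can start very small. The following is a minimal sketch, not a production design: it assumes two hypothetical record shapes (a forum post with `user`/`body` fields and a marketplace listing with `vendor`/`description` fields) and maps both onto one schema so that later analysis sees consistent data. The `Finding` class and every field name are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Finding:
    """Common schema for collected records (hypothetical, for illustration)."""
    source: str
    author: str
    text: str
    seen_at: datetime


def parse_timestamp(value) -> datetime:
    """Accept epoch seconds or an ISO 8601 string; return an aware UTC datetime."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def normalize(raw: dict, source: str) -> Finding:
    """Map source-specific field names onto the common schema."""
    author = raw.get("user") or raw.get("vendor") or "unknown"
    text = raw.get("body") or raw.get("description") or ""
    return Finding(
        source=source,
        author=author.strip().lower(),   # case and whitespace differ across sites
        text=" ".join(text.split()),     # collapse runs of whitespace and newlines
        seen_at=parse_timestamp(raw.get("timestamp", 0)),
    )


def deduplicate(findings):
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(findings))
```

Because `Finding` is a frozen dataclass, normalized records are hashable, which is what lets `deduplicate` collapse the same post collected twice.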

Master the integration of OSINT/dark-web feeds into Security Orchestration, Automation, and Response (SOAR) platforms for real-time threat detection. Architect deception technologies (honeypots mimicking ML model endpoints) to gather adversarial intelligence. Mentor teams on legally compliant intelligence sharing, using handling conventions such as the Traffic Light Protocol (TLP).
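When findings flow through automation, TLP labels can be checked before anything is forwarded. The sketch below assumes the five TLP 2.0 labels and a simplified, hypothetical set of audience scopes (`individual`, `organization`, `clients`, `community`, `public`). It is an illustration of the idea only and does not replace the need-to-know judgment that TLP also requires.

```python
# Audience scopes from narrowest to widest (hypothetical names for this sketch).
SCOPES = ["individual", "organization", "clients", "community", "public"]

# Widest scope each TLP 2.0 label permits, as an index into SCOPES.
WIDEST_SCOPE = {
    "TLP:RED": 0,           # named recipients only
    "TLP:AMBER+STRICT": 1,  # the recipient's own organization
    "TLP:AMBER": 2,         # the organization and its clients
    "TLP:GREEN": 3,         # the wider community
    "TLP:CLEAR": 4,         # no restriction on disclosure
}


def may_share(label: str, audience: str) -> bool:
    """Return True if the audience falls within the label's widest permitted scope."""
    return SCOPES.index(audience) <= WIDEST_SCOPE[label]


# Example: an AMBER+STRICT report must not be forwarded to clients.
print(may_share("TLP:AMBER+STRICT", "clients"))
```

An unknown label or audience raises an exception rather than defaulting to "allowed", which is the safer failure mode for a sharing gate.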

Practice Projects

Beginner

Project

Asset Exposure Mapping for a Public ML Model

Scenario

A company has released a public sentiment analysis API. You must identify any unauthorized clones, API key leaks, or discussions about its vulnerabilities on forums and paste sites.

How to Execute

1. Search Shodan and Censys for exposed endpoints mimicking the API.
2. Scrape Pastebin, GitHub Gists, and code repositories for leaked API keys or model training scripts.
3. Monitor key dark-web forums (e.g., Dread) using Tor and keyword alerts for the model's name.
4. Compile a report listing exposure points with risk ratings.
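The key-leak check in step 2 can be sketched as a pattern match over text you have already collected. This example assumes a hypothetical key format (`sk_sent_` followed by 32 lowercase hex characters); a real API will have its own format, and the pattern should be adapted to it. Matches are redacted before reporting so the report itself does not become a second leak.

```python
import re
from typing import List

# Hypothetical key format, for illustration only.
KEY_PATTERN = re.compile(r"\bsk_sent_[0-9a-f]{32}\b")


def find_leaked_keys(text: str) -> List[str]:
    """Return the unique keys found in a blob of paste or repository text."""
    return sorted(set(KEY_PATTERN.findall(text)))


def redact(key: str) -> str:
    """Keep just enough of the key to identify it in an internal key inventory."""
    return key[:12] + "..." + key[-4:]


paste = 'API_KEY = "sk_sent_' + "0123456789abcdef" * 2 + '"  # do not commit'
for key in find_leaked_keys(paste):
    print(redact(key))
```

The trailing `\b` keeps the pattern from matching a longer hex string that merely starts with a valid-looking key.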

Intermediate

Project

Dark-Web Vendor Profiling for Stolen ML Data

Scenario

Intelligence suggests a threat actor is selling a dataset purportedly scraped from your company's internal R&D repositories. You need to verify the claim, assess the data's validity, and identify the vendor's operational patterns.

How to Execute

1. Navigate to relevant dark-web marketplaces (using verified URLs) and identify the listing.
2. Use blockchain analysis tools (Chainalysis) to trace the vendor's cryptocurrency wallet for transaction patterns.
3. Perform textual analysis on the vendor's marketing copy and communication style to create a linguistic fingerprint.
4. Correlate the claimed data samples with internal metadata to confirm breach scope.
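One simple way to approximate the linguistic fingerprint in step 3 is to compare character trigram frequencies between two texts. The sketch below uses cosine similarity over trigram counts. This is a swapped-in baseline technique chosen for brevity, not an established attribution method, and a high score is a lead to investigate rather than proof of common authorship.

```python
import math
from collections import Counter


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams after lowercasing and whitespace cleanup."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two count vectors; 0.0 if either is empty."""
    dot = sum(a[gram] * b[gram] for gram in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


listing = "Fresh dataset, verified quality, escrow accepted, fast delivery!!"
old_post = "Verified quality dataset. Escrow accepted, fast delivery!!"
score = cosine_similarity(trigram_profile(listing), trigram_profile(old_post))
print(round(score, 3))
```

Character trigrams capture punctuation habits and spelling quirks as well as vocabulary, which is why they are a common starting point for short, noisy texts.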

Advanced

Case Study/Exercise

Incident Response: Contaminated Open-Source Model

Scenario

A widely used open-source ML library is found to have a malicious dependency that exfiltrates model architectures. Your organization has integrated it into several production systems. You must contain the threat, trace the attack vector, and implement a proactive monitoring strategy.

How to Execute

1. Isolate affected systems and audit dependency chains using SBOM (Software Bill of Materials) tools.
2. Conduct a deep dive into the library's commit history and contributor accounts to identify the point of compromise.
3. Use network traffic analysis to identify any data exfiltration to known command-and-control servers.
4. Develop a YARA rule and threat intelligence feed to monitor for the malicious code pattern across your development ecosystem and dark-web code dumps.
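The dependency audit in step 1 can be sketched as a lookup of SBOM components against a list of known-bad versions. This example assumes a CycloneDX-style JSON document with a top-level `components` array whose entries carry `name` and `version`. The package names and versions are hypothetical, and nested components are not traversed in this sketch.

```python
import json
from typing import Dict, List, Set, Tuple


def affected_components(sbom_json: str, known_bad: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (name, version) pairs from the SBOM that appear in the known-bad list."""
    sbom = json.loads(sbom_json)
    hits = []
    for component in sbom.get("components", []):
        name = component.get("name")
        version = component.get("version")
        if version in known_bad.get(name, set()):
            hits.append((name, version))
    return hits


# Hypothetical SBOM and advisory data, for illustration only.
sbom_text = json.dumps({
    "bomFormat": "CycloneDX",
    "components": [
        {"name": "example-ml-helper", "version": "2.3.1"},
        {"name": "example-tensor-utils", "version": "0.9.0"},
    ],
})
advisory = {"example-ml-helper": {"2.3.0", "2.3.1"}}
print(affected_components(sbom_text, advisory))
```

Running the same check over the SBOM of every production system gives a quick first list of what to isolate, before the slower commit-history and traffic analysis begins.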

Tools & Frameworks

Software & Platforms

Maltego, SpiderFoot, Shodan/Censys, Tor Browser, OnionScan

Core tools for automated data collection, link analysis, and dark-web infrastructure scanning. Maltego excels at visualizing relationships; SpiderFoot automates broad OSINT scans; Shodan/Censys map internet-connected assets.

Analysis & Automation Frameworks

The OSINT Framework (osintframework.com), MITRE ATLAS (Adversarial Threat Landscape for AI Systems), OWL (Web Ontology Language) for threat modeling

Structured methodologies for organizing intelligence tasks and mapping threats. MITRE ATLAS is critical for defining specific attack vectors against ML pipelines, providing a common language for defense.

Data Processing & ML Tools

Python (BeautifulSoup, Scrapy, Pandas), Neo4j for graph databases, NLP libraries (spaCy, NLTK)

Essential for building custom scrapers, normalizing disparate data sources, performing sentiment analysis on forum chatter, and mapping complex relationships between actors, assets, and vulnerabilities.

Interview Questions

Answer Strategy

The interviewer is testing your structured analytical thinking and knowledge of verification techniques. Use the Intelligence Cycle (Direction, Collection, Processing, Analysis, Dissemination). Sample Answer: 'I would start with Collection by obtaining a sample of the advertised model. In the Processing phase, I would use tools like `diff` or custom scripts to compare its architecture and weight hashes against our internal version control. For Analysis, I would examine the vendor's forum history and blockchain transactions to establish credibility and potential links. Finally, I would Disseminate findings in a report that includes IOCs and recommended containment actions, following TLP guidelines for sharing.'
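The weight-hash comparison described in the sample answer might look like the following sketch. It hashes a file-like object in chunks and looks the digest up in an internal manifest. The manifest layout and version names are assumptions, and `io.BytesIO` stands in for an opened weights file so the example runs without any files on disk.

```python
import hashlib
import io
from typing import Dict, List


def sha256_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large weight files never load fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matching_versions(sample_digest: str, manifest: Dict[str, str]) -> List[str]:
    """Return the internal versions whose recorded digest equals the sample's digest."""
    return sorted(version for version, digest in manifest.items() if digest == sample_digest)


# Hypothetical internal manifest: version name -> SHA-256 of the released weights.
internal = {
    "v1.0": sha256_stream(io.BytesIO(b"weights-v1.0")),
    "v1.1": sha256_stream(io.BytesIO(b"weights-v1.1")),
}
sample = sha256_stream(io.BytesIO(b"weights-v1.1"))  # stand-in for the advertised file
print(matching_versions(sample, internal))
```

A hash match only confirms a byte-identical copy. Weights that were re-serialized, quantized, or fine-tuned will not match, so an empty result does not rule out theft and should lead to the architecture comparison instead.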

Answer Strategy

This tests professional ethics and practical experience. Focus on frameworks and proactive measures. Sample Answer: 'In a previous role investigating leaked credentials, I strictly adhered to the organization's ROE (Rules of Engagement) and consulted legal counsel before any interaction. I used only passive reconnaissance techniques, never attempted unauthorized access, and all data was sanitized and stored in compliant systems. I documented every step to maintain a clear chain of custody, ensuring our actions could withstand legal scrutiny.'