Skill Guide

OSINT collection and source triage across surface web, dark web, and academic channels

The systematic process of identifying, collecting, and validating intelligence from publicly accessible data across three channels: the indexed web (surface web), overlay networks such as Tor that standard search engines do not index (dark web), and scholarly repositories (academic channels). Collection is followed by prioritizing sources according to reliability, relevance, and timeliness.

Organizations value this skill because it lets them spot threats, competitor activity, and strategic opportunities early using only non-classified data, which speeds up risk mitigation and decision-making. It turns raw, unstructured public data into actionable intelligence for security, due diligence, and market analysis.

Careers: 1 | Categories: 1 | Avg Demand: 8.7 | Avg AI Risk: 30%

How to Learn OSINT collection and source triage across surface web, dark web, and academic channels

Focus on mastering the core OSINT lifecycle (Planning, Collection, Processing, Analysis, Dissemination) and defining source typology (primary vs. secondary, active vs. passive). Begin with the Intelligence Gathering section of the PTES Technical Guidelines, which covers OSINT, and practice structured Google Dorking using operators like `site:`, `filetype:`, and `intitle:`. Understand the legal and ethical boundaries of collection in your jurisdiction before you collect anything.
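Structured dorking is easier to repeat and document when queries are composed rather than typed by hand. The following is a minimal sketch, assuming a hypothetical helper named `build_dork`; it only builds query strings from the operators named above and does not submit them to any search engine.

```python
def build_dork(site=None, filetype=None, intitle=None, terms=()):
    """Compose a search query string from common search operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        # Quote multi-word titles so the operator applies to the whole phrase.
        value = f'"{intitle}"' if " " in intitle else intitle
        parts.append(f"intitle:{value}")
    parts.extend(terms)
    return " ".join(parts)


# Illustrative queries against the placeholder domain example.com.
queries = [
    build_dork(site="example.com", filetype="pdf"),
    build_dork(site="example.com", intitle="index of"),
    build_dork(site="example.com", filetype="xlsx", terms=["confidential"]),
]
for query in queries:
    print(query)
```

Keeping the generated queries in a list makes it straightforward to log exactly what was searched, which helps when a collection effort has to be justified or repeated later.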

Move beyond basic search to domain-specific collection: use Maltego for visual link analysis, learn dark web navigation with Tor Browser and safety protocols, and explore academic databases like Google Scholar, Scopus, and JSTOR with advanced citation searches. Avoid the mistake of collecting without a clear intelligence requirement (IR) statement, which leads to data sprawl and wasted analysis time.

Master source triage by implementing a structured grading system like the Admiralty Code (A1-F6) or the SIRI (Source Intelligence and Reliability Index) framework. Architect scalable collection pipelines using APIs (Shodan, Censys, Hunter.io) and develop automated monitoring with tools like SpiderFoot or custom Python scripts using libraries like `requests` and `BeautifulSoup`. Mentor junior analysts on cognitive biases in source evaluation and establish organizational OSINT policies.
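A custom monitoring script often reduces to comparing the current snapshot of a page with what was seen before. The sketch below illustrates that idea using only the standard library's `html.parser` as a stand-in for `BeautifulSoup`, so that it runs without third-party packages. The names `LinkCollector`, `extract_links`, and `new_links` are hypothetical, and fetching the page (for example with `requests`) is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from anchor tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.add(urljoin(self.base_url, href))


def extract_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def new_links(previous, html, base_url):
    """Return links present in this snapshot but absent from the last one."""
    return extract_links(html, base_url) - set(previous)


# Static HTML stands in for a fetched page so the example runs offline.
SNAPSHOT = '<a href="/press/1">one</a> <a href="https://example.com/press/2">two</a>'
seen = {"https://example.com/press/1"}
print(sorted(new_links(seen, SNAPSHOT, "https://example.com/")))
```

In a real pipeline the `seen` set would be persisted between runs, and any new link would be graded before it is passed on to analysis.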

Practice Projects

Beginner

Project

Corporate Digital Footprint Assessment

Scenario

A mid-sized company hires you to audit its public-facing digital exposure before a security audit.

How to Execute

1. Define the scope: target domain (e.g., `example.com`) and key assets (employee emails, subdomains, public documents).
2. Use surface web tools: Google Dorks for sensitive files (`site:example.com filetype:pdf`), `crt.sh` for subdomain enumeration, and Hunter.io for email pattern discovery.
3. Document findings in a structured report, categorizing each finding by source, data type, and potential risk level.
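The subdomain enumeration and reporting steps can be sketched as follows. This assumes the certificate transparency results have already been downloaded as JSON and that each entry carries a `name_value` field with newline-separated hostnames, which is how `crt.sh` JSON output is commonly structured; verify that against a live response before relying on it. The function name and the finding fields are hypothetical.

```python
import json


def subdomains_from_crtsh(raw_json, domain):
    """Extract unique hostnames under `domain` from crt.sh-style JSON."""
    names = set()
    for entry in json.loads(raw_json):
        # One certificate can list several names, newline-separated.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # Drop the wildcard label.
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hand-written sample data standing in for a downloaded response.
SAMPLE = json.dumps([
    {"name_value": "www.example.com\nexample.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "mail.example.org"},
])

hosts = subdomains_from_crtsh(SAMPLE, "example.com")

# Record each finding by source, data type, and risk level, as in step 3.
findings = [
    {"source": "crt.sh", "data_type": "subdomain", "value": host, "risk": "unrated"}
    for host in hosts
]
for finding in findings:
    print(finding)
```

The risk level starts as `unrated` on purpose: assigning it is an analyst judgment made after reviewing what each host actually exposes.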

Intermediate

Case Study/Exercise

Dark Web Vendor Monitoring for Brand Protection

Scenario

A consumer electronics brand suspects its intellectual property (e.g., firmware, device schematics) is being sold on dark web marketplaces.

How to Execute

1. Acquire access to a dark web monitoring platform (e.g., DarkOwl, Recorded Future) or use manual Tor browser methods with a burner OS (Tails).
2. Define search parameters: brand name, product model numbers, relevant keywords in forum threads.
3. Triangulate findings by checking if any discovered leaks match known public technical specifications or patent filings (academic channels).
4. Prioritize sources: a verified .onion marketplace with escrow is higher priority than an anonymous forum post. Deliver a prioritized alert list to the security team.
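The keyword matching and prioritization steps above can be sketched as a small ranking function. Everything here is illustrative: the brand name `AcmePhone`, the model `X200`, the source type labels, and the numeric priorities are hypothetical, and the input is assumed to be text already exported from a monitoring platform.

```python
from dataclasses import dataclass

# Hypothetical priorities; a higher value means review first.
SOURCE_PRIORITY = {"escrow_marketplace": 3, "forum_post": 1}


@dataclass
class Hit:
    source_type: str
    text: str


def prioritized_alerts(hits, keywords):
    """Return (hit, matched_keywords) pairs, highest priority first."""
    lowered = [keyword.lower() for keyword in keywords]
    alerts = []
    for hit in hits:
        text = hit.text.lower()
        matched = [keyword for keyword in lowered if keyword in text]
        if matched:
            priority = SOURCE_PRIORITY.get(hit.source_type, 0)
            alerts.append((priority, len(matched), hit, matched))
    # Rank by source priority first, then by number of matched keywords.
    alerts.sort(key=lambda alert: (alert[0], alert[1]), reverse=True)
    return [(hit, matched) for _, _, hit, matched in alerts]


hits = [
    Hit("forum_post", "Selling AcmePhone X200 firmware dump"),
    Hit("escrow_marketplace", "X200 schematics, escrow accepted"),
    Hit("forum_post", "unrelated chatter"),
]
for hit, matched in prioritized_alerts(hits, ["AcmePhone", "X200", "firmware"]):
    print(hit.source_type, matched)
```

Ranking by source type before keyword count reflects the prioritization rule in step 4; a team that weighs content more heavily than venue would swap the two sort keys.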

Advanced

Project

Multi-Source Threat Intelligence Fusion for a Financial Institution

Scenario

As a threat intelligence lead, you must build a persistent collection system to monitor for credential dumps, phishing infrastructure, and emerging fraud tactics targeting the institution.

How to Execute

1. Architect the collection: Use APIs from Recorded Future, Flashpoint, and PhishTank for automated dark web and surface web feeds. Integrate academic threat reports via RSS from arXiv (cs.CR section) and USENIX Security proceedings.
2. Implement source triage: Assign a composite score to each source using a weighted matrix (e.g., reliability 30%, timeliness 30%, relevance 40%). Sources scoring below a threshold are deprioritized.
3. Automate processing: Write Python scripts to normalize data into a common format (STIX objects, exchanged over TAXII where possible) and deduplicate indicators.
4. Produce actionable intelligence: Create a weekly fusion report for the CISO that correlates dark web chatter with observed phishing campaigns and academic research on similar attack patterns.
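The weighted matrix in step 2 and the deduplication in step 3 can be sketched in a few lines. The weights come from the example above; the 0 to 1 rating scale, the `THRESHOLD` value of 0.60, the source names, and the sample ratings are assumptions made for illustration.

```python
# Weights from the example matrix: reliability 30%, timeliness 30%, relevance 40%.
WEIGHTS = {"reliability": 0.30, "timeliness": 0.30, "relevance": 0.40}
THRESHOLD = 0.60  # Hypothetical cut-off; tune to your own requirements.


def composite_score(ratings):
    """Weighted sum of ratings, each expected in the range 0 to 1."""
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def triage(sources):
    """Split sources into kept and deprioritized by composite score."""
    scored = {name: round(composite_score(r), 3) for name, r in sources.items()}
    kept = {n: s for n, s in scored.items() if s >= THRESHOLD}
    deprioritized = {n: s for n, s in scored.items() if s < THRESHOLD}
    return kept, deprioritized


def dedupe_indicators(indicators):
    """Drop repeats, preserving first-seen order.

    Lowercasing is a simplification that suits domains and hex hashes;
    URL paths can be case-sensitive and would need gentler handling.
    """
    seen = set()
    unique = []
    for indicator in indicators:
        key = indicator.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


sources = {
    "vendor_feed": {"reliability": 0.9, "timeliness": 0.8, "relevance": 0.7},
    "anonymous_forum": {"reliability": 0.3, "timeliness": 0.9, "relevance": 0.5},
}
kept, deprioritized = triage(sources)
print("kept:", kept)
print("deprioritized:", deprioritized)
print(dedupe_indicators(["Evil.example.com", "evil.example.com ", "198.51.100.7"]))
```

With these sample ratings the vendor feed scores 0.79 and the anonymous forum 0.56, so only the forum falls below the cut-off. Deprioritized sources are kept in a separate bucket rather than discarded, since a low-scoring source can still corroborate a finding later.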

Tools & Frameworks

Software & Platforms

Maltego; Shodan/Censys; Google Dorks & advanced search operators; Tor Browser with Tails OS; SpiderFoot; TheHarvester

Use Maltego for complex relationship mapping across surface and dark web data. Shodan/Censys are for technical infrastructure intelligence (servers, IoT). Google Dorks are the foundational surface web collection tool. Tor/Tails are essential for secure, anonymous dark web access. SpiderFoot and TheHarvester automate initial collection and enumeration phases.

Mental Models & Methodologies

The OSINT Lifecycle (Planning -> Collection -> Processing -> Analysis -> Dissemination); Admiralty Code (Source Reliability A-F, Information Credibility 1-6); SIRI Framework; The Intelligence Cycle

Apply the OSINT Lifecycle as your overarching project management framework. Use the Admiralty Code or SIRI to objectively triage and grade every piece of collected information before it enters analysis, preventing 'garbage in, garbage out'.
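An Admiralty Code grade pairs a source reliability letter with an information credibility digit. The lookup below is a minimal sketch using the commonly published wording for each level; exact phrasing varies slightly between doctrine documents, and the function name `describe_grade` is hypothetical.

```python
RELIABILITY = {
    "A": "Completely reliable",
    "B": "Usually reliable",
    "C": "Fairly reliable",
    "D": "Not usually reliable",
    "E": "Unreliable",
    "F": "Reliability cannot be judged",
}

CREDIBILITY = {
    "1": "Confirmed by other sources",
    "2": "Probably true",
    "3": "Possibly true",
    "4": "Doubtful",
    "5": "Improbable",
    "6": "Truth cannot be judged",
}


def describe_grade(grade):
    """Expand a two-character grade such as 'B2' into plain language."""
    grade = grade.strip().upper()
    if len(grade) != 2 or grade[0] not in RELIABILITY or grade[1] not in CREDIBILITY:
        raise ValueError(f"not a valid Admiralty Code grade: {grade!r}")
    source = RELIABILITY[grade[0]].lower()
    information = CREDIBILITY[grade[1]].lower()
    return f"{grade}: source {source}, information {information}"


print(describe_grade("B2"))
print(describe_grade("f6"))
```

Note that F and 6 mean "cannot be judged" rather than "worst": a new, unknown source starts at F6 and moves up or down the scale as evidence accumulates.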

Data & Academic Channels

Google Scholar & Scopus; arXiv (cs.CR); USENIX Security & IEEE S&P Conference Proceedings; Paste Sites (Pastebin, Ghostbin); GitHub/GitLab (for leaked code or credentials)

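Academic channels provide high-reliability information on emerging threats, vulnerabilities, and methodologies, most of it peer-reviewed. arXiv is the exception: it hosts preprints that have not necessarily been peer-reviewed, so grade them accordingly. Paste sites and code repositories are critical for monitoring accidental data leaks and tracking threat actor toolkits.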
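Monitoring pastes and repositories for accidental leaks usually starts with pattern matching over text that has already been collected. The sketch below assumes two illustrative patterns: email addresses at the organization's domain, and strings shaped like AWS access key IDs (the `AKIA` prefix followed by 16 uppercase alphanumeric characters). The function name `scan_text` is hypothetical, and both patterns are heuristics that will produce false positives and misses.

```python
import re

# Strings shaped like long-term AWS access key IDs.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def email_pattern(domain):
    """Build a case-insensitive matcher for addresses at `domain`."""
    return re.compile(
        r"[A-Za-z0-9._%+-]+@" + re.escape(domain) + r"\b",
        re.IGNORECASE,
    )


def scan_text(text, domain):
    """Return unique candidate leaks found in a block of collected text."""
    emails = {match.lower() for match in email_pattern(domain).findall(text)}
    key_ids = set(AWS_KEY_ID.findall(text))
    return {"emails": sorted(emails), "aws_key_ids": sorted(key_ids)}


# The key below is the placeholder value used in AWS documentation.
SAMPLE = (
    "contact: Alice@example.com\n"
    "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
    "bob@other.org\n"
)
print(scan_text(SAMPLE, "example.com"))
```

A match is only a lead: confirm it against internal records before treating it as a leak, and handle any real credentials you encounter according to your organization's policy.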

Interview Questions

Answer Strategy

Structure your answer using the OSINT Lifecycle. Emphasize defining a precise IR first. For collection, mention surface (LinkedIn, corporate website for org chart), dark (hacking forums for mentions of the company or stolen credentials), and academic (research on insider threat indicators). For triage, explicitly state you would use a grading system like the Admiralty Code, prioritizing sources with direct evidence over hearsay.

Sample answer: 'First, I would draft a precise IR statement focusing on data exfiltration indicators. My collection plan would span: surface web for the employee's professional footprint and any public code commits; dark web forums and markets for corporate credential dumps; and academic sources for validated behavioral indicators. I would triage every source using the Admiralty Code, assigning a grade (e.g., B2 for a "usually reliable" source with "probably true" information). A dark web post selling company data would be graded lower than a direct paste site leak until verified against internal assets. This ensures the analysis is built on the most reliable evidence.'

Answer Strategy

This tests your critical thinking and source evaluation process. Focus on methodology over intuition. Describe a specific scenario (e.g., a data breach claim) and the channels involved. Explain your step-by-step verification process.

Sample answer: 'I encountered conflicting breach claims on a dark web forum and a surface paste site. I applied a multi-point verification framework: 1. Provenance: The forum poster was anonymous (low reliability), while the paste site data was a direct SQL dump. 2. Corroboration: I cross-referenced the dump's email addresses against the Have I Been Pwned API. 3. Context: I checked whether the purported victim company had any recent vulnerability disclosures in academic or security blogs. The paste site dump passed two checks; the forum post passed none. I trusted the paste site data but flagged both with their respective reliability grades for the final report.'