Skill Guide

Clinical data modeling with FHIR, HL7, and OMOP Common Data Model

The discipline of structuring and harmonizing clinical, genomic, and operational healthcare data using interoperability standards (HL7, FHIR) and a research-oriented common data model (OMOP CDM) to enable exchange, analysis, and secondary use.

This skill helps healthcare organizations break down data silos, meet regulatory interoperability mandates (e.g., the CMS Interoperability and Patient Access final rule), and accelerate clinical research by transforming disparate, messy data into a unified, analysis-ready asset. Mastery can reduce data integration costs by as much as 30-50% and can cut time-to-insight for real-world evidence generation from months to weeks.

1 Careers

1 Categories

9.2 Avg Demand

20% Avg AI Risk

How to Learn Clinical data modeling with FHIR, HL7, and OMOP Common Data Model

1. Master core HL7 v2 message structure (segments like MSH, PID, OBX) and FHIR RESTful API concepts (Resources, Endpoints).
2. Understand OMOP CDM v5.4 table schema (PERSON, CONDITION_OCCURRENCE, DRUG_EXPOSURE) and its standardized vocabularies (SNOMED CT, RxNorm, LOINC).
3. Build foundational SQL skills, specifically targeting joins and aggregations across OMOP tables.
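As a minimal sketch of the kind of join and aggregation meant here, the snippet below uses Python's built-in sqlite3 module with a heavily trimmed subset of PERSON and CONDITION_OCCURRENCE columns. The rows are made up for illustration, and a real OMOP instance would normally live in PostgreSQL or SQL Server with the full v5.4 schema.

```python
import sqlite3

# In-memory database holding only a few OMOP columns (illustrative subset).
cdm = sqlite3.connect(":memory:")
cdm.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER NOT NULL,
    year_of_birth INTEGER NOT NULL
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    condition_concept_id INTEGER NOT NULL,
    condition_start_date TEXT NOT NULL
);
""")

# Made-up rows. 8507 = MALE, 8532 = FEMALE,
# 201826 = Type 2 diabetes mellitus, 316866 = Hypertensive disorder.
cdm.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8507, 1960), (2, 8532, 1975), (3, 8532, 1988)],
)
cdm.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, 201826, "2020-01-15"),
        (11, 1, 201826, "2021-03-02"),
        (12, 2, 201826, "2019-07-30"),
        (13, 3, 316866, "2022-11-05"),
    ],
)

# Join PERSON to CONDITION_OCCURRENCE, then count distinct patients and
# total records per gender for one condition concept.
diabetes_by_gender = cdm.execute(
    """
    SELECT p.gender_concept_id,
           COUNT(DISTINCT p.person_id) AS n_persons,
           COUNT(*) AS n_records
    FROM person p
    JOIN condition_occurrence co ON co.person_id = p.person_id
    WHERE co.condition_concept_id = ?
    GROUP BY p.gender_concept_id
    ORDER BY p.gender_concept_id
    """,
    (201826,),
).fetchall()
```

Counting distinct persons separately from records matters because one patient usually has many condition rows.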

1. Implement end-to-end ETL (Extract, Transform, Load) pipelines: Parse HL7v2 messages or CDA documents, map to FHIR Resources (e.g., Patient, Condition), then transform to OMOP CDM tables.
2. Use tools like HAPI FHIR Server for validation and bulk data operations.
3. Avoid common pitfalls: Misaligning source codes to OMOP vocabularies, ignoring temporal relationships in clinical events, or mishandling protected health information (PHI) during transformation.

1. Architect federated or multi-site data networks using OMOP + FHIR (e.g., OHDSI's OMOP-to-FHIR mapping).
2. Design and govern enterprise-level semantic layers that maintain provenance across FHIR and OMOP representations.
3. Lead data quality governance programs, establishing metrics for completeness and plausibility, and mentor teams on the US Core FHIR implementation guide and the USCDI data classes it implements.

Practice Projects

Beginner

Project

HL7v2 ADT Feed to OMOP PERSON Table Mapper

Scenario

You have a stream of HL7v2 Admit-Discharge-Transfer (ADT) messages. Your task is to extract patient demographics and create a script to populate the OMOP PERSON table.

How to Execute

1. Set up a local HL7v2 message parser (e.g., python-hl7 or hl7apy in Python, or the HAPI HL7v2 library in Java).
2. Define mapping rules from PID segment fields (PID-3 Patient Identifier, PID-7 DOB, PID-8 Sex) to OMOP PERSON columns (person_source_value, birth_datetime and year_of_birth, gender_source_value). PID-5 Name is PHI with no corresponding PERSON column and should not be loaded.
3. Use a pre-loaded OMOP vocabulary table to resolve gender_source_value to gender_concept_id.
4. Validate output by running standard OMOP data quality checks (e.g., via Achilles or the Data Quality Dashboard).
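The mapping step can be sketched without any HL7 library by splitting the PID segment on its delimiters. This is an illustration under stated assumptions, not a production parser: it assumes the default `|`, `^`, and `~` delimiters, takes the first PID-3 identifier as person_source_value (an assumption of this sketch, since PERSON has no name column), and hard-codes the two gender concepts (8507 MALE, 8532 FEMALE) instead of querying the vocabulary tables. The function name `pid_to_person` and the sample message are hypothetical.

```python
from datetime import datetime

# Standard OMOP gender concepts; anything else maps to 0 (no matching concept).
GENDER_CONCEPT_IDS = {"M": 8507, "F": 8532}


def pid_to_person(message: str, person_id: int) -> dict:
    """Build a partial OMOP PERSON row from the PID segment of an HL7v2 message."""
    # HL7v2 segments are separated by carriage returns; tolerate newlines too.
    segments = [s for s in message.replace("\n", "\r").split("\r") if s]
    pid = next(s for s in segments if s.startswith("PID|")).split("|")

    def field(n: int) -> str:
        # For non-MSH segments, PID-n is simply the n-th pipe-delimited item.
        return pid[n] if n < len(pid) else ""

    # PID-3 may repeat (~) and has components (^); keep the first ID number.
    source_id = field(3).split("~")[0].split("^")[0]
    dob_text = field(7)[:8]  # keep YYYYMMDD, ignore any time part
    dob = datetime.strptime(dob_text, "%Y%m%d") if len(dob_text) == 8 else None
    sex = field(8).strip().upper()

    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPT_IDS.get(sex, 0),
        "year_of_birth": dob.year if dob else None,
        "month_of_birth": dob.month if dob else None,
        "day_of_birth": dob.day if dob else None,
        "birth_datetime": dob.isoformat() if dob else None,
        "race_concept_id": 0,       # not derived in this sketch
        "ethnicity_concept_id": 0,  # not derived in this sketch
        "person_source_value": source_id,
        "gender_source_value": sex,
    }


# Made-up ADT message for illustration only.
sample_message = "\r".join([
    r"MSH|^~\&|ADT|HOSP|OMOP|RES|202401011200||ADT^A01|MSG0001|P|2.5",
    "PID|1||123456^^^HOSP^MR||DOE^JANE||19800215|F",
])
person_row = pid_to_person(sample_message, person_id=1)
```

The patient name in PID-5 is deliberately never read, which keeps direct identifiers out of the research table.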

Intermediate

Project

Build a FHIR-to-OMOP Condition ETL for Chronic Disease Research

Scenario

A hospital's EHR exposes FHIR R4 Condition resources via an API. You need to build a scalable pipeline to transform this data into the OMOP CONDITION_OCCURRENCE table for a diabetes cohort study.

How to Execute

1. Write a FHIR bulk data extractor using the $export operation.
2. Develop a transformation layer in Python (pandas, PySpark) or Java to map FHIR Condition.code (SNOMED CT codings) to condition_concept_id in OMOP.
3. Populate condition_start_date and condition_start_datetime from Condition.onsetDateTime, and decide how to handle resources that carry another onset[x] type or no onset at all.
4. Load into OMOP and run cohort characterization queries using OHDSI's WebAPI or ATLAS to validate the diabetes population count.
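The transformation step might look roughly like the sketch below, which maps one FHIR R4 Condition (as a plain dict, such as one line of NDJSON from a bulk export) to a CONDITION_OCCURRENCE row. Several things here are assumptions made for illustration: the one-entry `SNOMED_TO_CONCEPT_ID` dictionary stands in for a real lookup against the OMOP vocabulary tables, the `person_ids` argument is a hypothetical map from FHIR patient references to person_id values, and type concept 32817 ("EHR") is assumed for every record. Resources without a full onsetDateTime are skipped.

```python
from datetime import date
from typing import Optional

SNOMED_SYSTEM = "http://snomed.info/sct"

# Stand-in for a vocabulary lookup (hypothetical, one entry only):
# SNOMED CT 44054006 (type 2 diabetes mellitus) -> OMOP concept 201826.
SNOMED_TO_CONCEPT_ID = {"44054006": 201826}

# Assumption: all records originate from an EHR (type concept 32817).
EHR_TYPE_CONCEPT_ID = 32817


def condition_to_omop(resource: dict, person_ids: dict) -> Optional[dict]:
    """Map a FHIR Condition to a partial CONDITION_OCCURRENCE row, or None."""
    codings = resource.get("code", {}).get("coding", [])
    snomed = next((c for c in codings if c.get("system") == SNOMED_SYSTEM), None)
    source_code = snomed.get("code", "") if snomed else ""

    onset = resource.get("onsetDateTime")
    if not onset:
        return None  # other onset[x] types are out of scope for this sketch
    try:
        start = date.fromisoformat(onset[:10])
    except ValueError:
        return None  # partial dates such as "2021" are not handled here

    person_id = person_ids.get(resource.get("subject", {}).get("reference"))
    if person_id is None:
        return None  # patient was not loaded into PERSON

    return {
        "person_id": person_id,
        # 0 marks a source code with no mapped standard concept.
        "condition_concept_id": SNOMED_TO_CONCEPT_ID.get(source_code, 0),
        "condition_start_date": start.isoformat(),
        "condition_start_datetime": onset,
        "condition_type_concept_id": EHR_TYPE_CONCEPT_ID,
        "condition_source_value": source_code,
    }


# Made-up resource for illustration.
example_condition = {
    "resourceType": "Condition",
    "id": "c1",
    "subject": {"reference": "Patient/p1"},
    "code": {"coding": [{"system": SNOMED_SYSTEM, "code": "44054006"}]},
    "onsetDateTime": "2021-06-01T08:30:00+00:00",
}
condition_row = condition_to_omop(example_condition, {"Patient/p1": 42})
```

Keeping the original code in condition_source_value preserves traceability even when the concept lookup falls back to 0.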

Advanced

Project

Design a Multi-Institutional FHIR-OMOP Hybrid Data Network Architecture

Scenario

Three health systems need to share data for a federated COVID-19 outcomes study. Each uses different EHRs (Epic, Cerner, custom), but all can expose FHIR APIs. You must design the architecture to support both real-time FHIR queries and batch OMOP analytics.

How to Execute

1. Architect a central OMOP CDM instance as the analytics backbone, with site-specific ETL agents.
2. Implement a FHIR façade service (e.g., using IBM FHIR Server or Smile CDR) over the OMOP CDM to enable real-time FHIR queries against standardized data.
3. Design a data quality and provenance layer to track transformations from source FHIR resources to OMOP.
4. Deploy a federated analysis workflow (e.g., OHDSI study packages executed locally at each site, with the Data Quality Dashboard and CohortDiagnostics used to vet each site's data) so that distributed analyses run without moving patient-level data.
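One possible shape for the provenance layer in step 3 is a small record written alongside every OMOP row, linking it back to the source FHIR resource by reference and content hash. The `ProvenanceRecord` fields and the `make_provenance` helper below are hypothetical names chosen for this sketch, not part of OMOP or FHIR, and a real design might instead emit FHIR Provenance resources.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    site_id: str           # which health system produced the row
    source_reference: str  # e.g. "Condition/c1"
    source_sha256: str     # hash of the source resource content
    target_table: str      # OMOP table that received the row
    target_id: int         # primary key of the OMOP row
    etl_version: str       # version of the transformation rules applied


def make_provenance(site_id: str, resource: dict, target_table: str,
                    target_id: int, etl_version: str) -> ProvenanceRecord:
    # Serialize with sorted keys so the hash does not depend on key order.
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    reference = f"{resource['resourceType']}/{resource['id']}"
    return ProvenanceRecord(site_id, reference, digest,
                            target_table, target_id, etl_version)


# Made-up example: the same resource with keys in a different order.
resource_a = {"resourceType": "Condition", "id": "c1", "onsetDateTime": "2021-06-01"}
resource_b = {"onsetDateTime": "2021-06-01", "id": "c1", "resourceType": "Condition"}
prov_a = make_provenance("site-1", resource_a, "condition_occurrence", 10, "1.0.0")
prov_b = make_provenance("site-1", resource_b, "condition_occurrence", 10, "1.0.0")
```

Storing a hash rather than the resource itself lets a site prove which source version produced a row without copying patient-level data into the provenance store.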

Tools & Frameworks

Software & Platforms

HAPI FHIR (JPA Server & CLI); OHDSI OMOP CDM & Tools (ATLAS, Achilles, Usagi); SQL (PostgreSQL/SQL Server); Apache Spark / Python (pandas, fhir.resources)

HAPI FHIR is a widely used open-source Java implementation for building and testing FHIR servers. OHDSI tools are the backbone for OMOP vocabulary mapping (Usagi), data characterization (Achilles), and cohort analysis (ATLAS). SQL is non-negotiable for OMOP data manipulation. Spark/Python are used for large-scale ETL pipelines.

Standards & Implementation Guides

HL7 FHIR R4/R5; US Core FHIR Implementation Guide (aligned with USCDI); OMOP CDM v5.4 Specification; OHDSI ETL Conventions

FHIR R4 is the current baseline; US Core defines must-support profiles for US interoperability. The OMOP CDM spec is the target model; ETL conventions ensure consistent, quality data loading.

Vocabulary & Terminology Tools

OHDSI Vocabulary Tables (CONCEPT, CONCEPT_RELATIONSHIP); ATHENA Vocabulary Download; NLM Value Set Authority Center (VSAC)

ATHENA provides the standardized vocabularies (SNOMED, RxNorm, LOINC) required for OMOP mapping. VSAC is critical for managing FHIR value sets used in profiles like US Core.
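To make the role of the vocabulary tables concrete, here is a minimal sketch of resolving a source code to its standard concept through the 'Maps to' relationship in CONCEPT_RELATIONSHIP. It uses sqlite3 with a trimmed column set and two hand-entered rows. The source concept_id 900001 is a placeholder invented for this example, while 201826 is the standard concept for type 2 diabetes mellitus. The helper name `map_to_standard` is hypothetical.

```python
import sqlite3

vocab = sqlite3.connect(":memory:")
vocab.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    vocabulary_id TEXT,
    concept_code TEXT,
    standard_concept TEXT          -- 'S' marks a standard concept
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT,
    invalid_reason TEXT            -- NULL while the relationship is valid
);
""")
vocab.executemany("INSERT INTO concept VALUES (?, ?, ?, ?, ?)", [
    # 900001 is a made-up placeholder id for the ICD10CM source concept.
    (900001, "Type 2 diabetes mellitus without complications", "ICD10CM", "E11.9", None),
    (201826, "Type 2 diabetes mellitus", "SNOMED", "44054006", "S"),
])
vocab.execute(
    "INSERT INTO concept_relationship VALUES (?, ?, ?, ?)",
    (900001, 201826, "Maps to", None),
)


def map_to_standard(conn: sqlite3.Connection, vocabulary_id: str, code: str) -> list:
    """Return standard concept_ids that a source code maps to (may be empty)."""
    rows = conn.execute(
        """
        SELECT target.concept_id
        FROM concept src
        JOIN concept_relationship cr
          ON cr.concept_id_1 = src.concept_id
         AND cr.relationship_id = 'Maps to'
         AND cr.invalid_reason IS NULL
        JOIN concept target
          ON target.concept_id = cr.concept_id_2
         AND target.standard_concept = 'S'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?
        ORDER BY target.concept_id
        """,
        (vocabulary_id, code),
    ).fetchall()
    return [r[0] for r in rows]


mapped = map_to_standard(vocab, "ICD10CM", "E11.9")
unmapped = map_to_standard(vocab, "ICD10CM", "NOT-A-CODE")
```

An empty result is the signal to store concept_id 0 and queue the code for manual mapping, for example in Usagi.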

Interview Questions

Answer Strategy

Structure your answer using the ETL framework: Extract (FHIR endpoint), Transform (code mapping, datetime logic), Load (OMOP table insertion). Emphasize your vocabulary mapping strategy: follow the 'Maps to' relationships in the OMOP CONCEPT_RELATIONSHIP table to get from source codes to OMOP standard concepts. For free text, describe a multi-step process: 1) use Epic's built-in NLP (if available) to extract codes, 2) map the remaining text to UMLS concepts restricted to SNOMED CT with a tool such as NLM's MetaMap, 3) flag unresolved text for manual review. Mention data quality checks, such as the share of rows whose concept_id fields are populated with a non-zero concept.
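A completeness check on concept_id fields can be as simple as the sketch below, which reports the share of rows whose concept is missing or 0 and collects their source values for manual review. The function names and sample rows are made up for illustration.

```python
def unmapped_rate(rows: list, concept_field: str) -> float:
    """Fraction of rows whose concept field is missing or 0."""
    if not rows:
        return 0.0
    unmapped = sum(1 for row in rows if not row.get(concept_field))
    return unmapped / len(rows)


def review_queue(rows: list, concept_field: str, source_field: str) -> list:
    """Distinct source values that still need a manual mapping decision."""
    return sorted({row[source_field] for row in rows if not row.get(concept_field)})


# Made-up CONDITION_OCCURRENCE rows (trimmed to two columns).
loaded_rows = [
    {"condition_concept_id": 201826, "condition_source_value": "44054006"},
    {"condition_concept_id": 316866, "condition_source_value": "38341003"},
    {"condition_concept_id": 0, "condition_source_value": "sugar problems"},
    {"condition_concept_id": 201826, "condition_source_value": "44054006"},
]
rate = unmapped_rate(loaded_rows, "condition_concept_id")
to_review = review_queue(loaded_rows, "condition_concept_id", "condition_source_value")
```

Tracking this rate per table and per load gives a simple trend to quote when asked how mapping quality is monitored.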

Answer Strategy

The interviewer is testing your end-to-end problem-solving and understanding of data lineage in clinical modeling. Your answer must demonstrate a methodical, data-driven approach. Sample response: 'First, I would use OHDSI's Achilles and the Data Quality Dashboard to run data quality checks, focusing on the CONDITION_OCCURRENCE table for hypertension concept IDs (e.g., 316866 "Hypertensive disorder", SNOMED CT 38341003, and its descendants). I'd review the source-to-concept mappings produced in Usagi to verify no codes were dropped. Next, I'd compare the ETL logic for condition start dates against the research protocol's index date definition. Finally, I would pull a sample of 10 discordant patient records and trace their journey from raw HL7/FHIR source through each ETL step to identify the transformation rule failure.'