
Skill Guide

Data visualization for genomic results (Circos plots, IGV, Manhattan plots)

The technical capability to transform high-dimensional genomic variant and expression data into interpretable visual formats (circular plots, linear genome-browser views, and genome-wide scatter plots) in order to identify structural variants, inspect sequence alignments, and locate disease-associated loci.

This skill bridges the gap between raw bioinformatic output and biological insight, enabling faster hypothesis generation in R&D and more persuasive evidence in clinical or translational settings. It directly reduces the time-to-decision in target discovery and variant interpretation, impacting pipeline efficiency and regulatory submission quality.
1 Careers
1 Categories
9.2 Avg Demand
15% Avg AI Risk

How to Learn Data visualization for genomic results (Circos plots, IGV, Manhattan plots)

1. **Core Data Structures**: Master the VCF, BED, and BAM/SAM file formats.
2. **Basic Plotting**: Learn to generate a Manhattan plot with the `qqman` package in R, for example from PLINK association output.
3. **Conceptual Foundation**: Understand genomic coordinates and reference builds (hg19/GRCh37 versus hg38/GRCh38) and common plot elements (axes, tracks, histograms).

1. **Integrative Genomics Viewer (IGV)**: Progress from loading single BAM files to creating multi-track sessions with annotations, variant calls, and expression data.
2. **Customization**: Use R (circlize or RCircos; karyoploteR for linear karyotype views) or Python (pyCircos) to build a basic Circos plot linking SNPs to genes.
3. **Common Pitfall**: Avoid reading visual density as biological significance without proper statistical correction (e.g., multiple-testing-corrected p-value thresholds on a Manhattan plot).

1. **Automation & Pipelines**: Script visualization generation (e.g., using R Markdown or Snakemake) for reproducible reports.
2. **Strategic Storytelling**: Design a multi-omics Circos plot integrating GWAS, expression QTL, and chromatin interaction data to present to a cross-functional team.
3. **Mentorship**: Establish a lab-wide protocol for IGV session management and visualization standards.
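
As one way to script the automation step, the sketch below writes an IGV batch script that loads tracks and snapshots a list of regions. It is a minimal illustration, not a lab standard: the file names, region, and output directory are hypothetical, and it assumes the standard IGV batch commands (`new`, `genome`, `load`, `snapshotDirectory`, `goto`, `snapshot`, `exit`).

```python
def build_igv_batch(genome, tracks, regions, snapshot_dir):
    """Return the text of an IGV batch script that snapshots each region.

    regions is a list of (name, chrom, start, end) tuples using 1-based,
    inclusive coordinates, as IGV locus strings expect.
    """
    lines = ["new", f"genome {genome}"]
    lines += [f"load {track}" for track in tracks]
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for name, chrom, start, end in regions:
        lines.append(f"goto {chrom}:{start}-{end}")
        lines.append(f"snapshot {name}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"


# Hypothetical inputs, for illustration only.
script = build_igv_batch(
    genome="hg38",
    tracks=["tumor.bam", "normal.bam", "cnv_calls.bed"],
    regions=[("chr7_candidate", "chr7", 55_000_000, 55_500_000)],
    snapshot_dir="igv_snapshots",
)
print(script)
```

Writing `script` to a file and passing it to IGV's batch mode keeps snapshots reproducible across samples; a workflow tool such as Snakemake could call the same function once per sample.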

Practice Projects

Beginner
Project

Generate a Publication-Quality Manhattan Plot from Public GWAS Data

Scenario

You have downloaded summary statistics from a published GWAS on Type 2 Diabetes (e.g., from the GWAS Catalog). Your task is to visualize the genome-wide association results.

How to Execute
1. Download the summary statistics file (TSV/CSV).
2. Use the `qqman` package in R: `manhattan(gwas_data, chr='CHR', bp='BP', p='P', snp='SNP')`.
3. Add suggestive and genome-wide significance lines using the `suggestiveline` and `genomewideline` arguments.
4. Export as a high-resolution TIFF/PDF with labeled axes and a title.
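
To make clear what a Manhattan plot encodes, here is a rough Python analogue of the steps above (not the `qqman` implementation). It assumes records of `(chrom, bp, p)` with `p > 0`; the sample records are made up. The layout helper uses only the standard library, and `plot_manhattan` imports matplotlib lazily. The two reference lines use 1e-5 and 5e-8, the conventional suggestive and genome-wide thresholds.

```python
import math


def manhattan_layout(records, chrom_order):
    """Compute x (cumulative bp), y (-log10 p) and per-chromosome tick positions.

    records: list of (chrom, bp, p) tuples with p > 0.
    """
    # The largest observed position defines each chromosome's width on the x axis.
    max_bp = {}
    for chrom, bp, _ in records:
        max_bp[chrom] = max(max_bp.get(chrom, 0), bp)

    offsets, ticks, running = {}, {}, 0
    for chrom in chrom_order:
        if chrom not in max_bp:
            continue
        offsets[chrom] = running
        ticks[chrom] = running + max_bp[chrom] / 2  # label at chromosome midpoint
        running += max_bp[chrom]

    xs = [offsets[chrom] + bp for chrom, bp, _ in records]
    ys = [-math.log10(p) for _, _, p in records]
    return xs, ys, ticks


def plot_manhattan(records, chrom_order, out_path):
    """Draw the plot with matplotlib (imported here so the layout works without it)."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    xs, ys, ticks = manhattan_layout(records, chrom_order)
    parity = {chrom: i % 2 for i, chrom in enumerate(chrom_order)}
    colors = ["#1f77b4" if parity[c] == 0 else "#7f7f7f" for c, _, _ in records]

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.scatter(xs, ys, c=colors, s=4)
    ax.axhline(-math.log10(1e-5), color="blue", linestyle=":")   # suggestive
    ax.axhline(-math.log10(5e-8), color="red", linestyle="--")   # genome-wide
    ax.set_xticks(list(ticks.values()))
    ax.set_xticklabels(list(ticks.keys()))
    ax.set_xlabel("Chromosome")
    ax.set_ylabel("-log10(p)")
    fig.savefig(out_path, dpi=300)
    plt.close(fig)


# Made-up records, for illustration only.
records = [("1", 100, 1e-3), ("1", 900, 0.5), ("2", 200, 1e-9)]
xs, ys, ticks = manhattan_layout(records, ["1", "2"])
# plot_manhattan(records, ["1", "2"], "manhattan.pdf")  # requires matplotlib
```
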
Intermediate
Project

Build an IGV Session to Corroborate a Candidate Structural Variant

Scenario

Your CNV pipeline has flagged a 500 kb deletion on chromosome 7 in a tumor sample. You need to validate it visually and check for potential artifacts.

How to Execute
1. Load the aligned BAM file for the sample and a matched normal in IGV.
2. Navigate to the region (chr7:start-end).
3. Add the CNV track (BED format) from your pipeline.
4. Examine read depth (coverage) across the region and look for a consistent drop relative to the flanking sequence and the matched normal.
5. Inspect split reads and discordant read pairs at the breakpoints.
6. Save the session as an XML file with notes for your records.
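
The "consistent drop" in step 4 can be backed by a number. The sketch below is one simple way to do that, assuming you have already computed mean depth per fixed-size bin for tumor and normal (for example from `samtools depth` output); the depth values and bin indices are invented. It compares the tumor/normal ratio inside the candidate region with the ratio in the flanks.

```python
import math
from statistics import median


def log2_depth_ratio(tumor_depth, normal_depth, region_bins, flank_bins):
    """Median log2(tumor/normal) in the region, normalized to flanking bins.

    Bins with zero normal depth are skipped. A region whose tumor depth is
    zero everywhere (e.g. a homozygous deletion) would need a pseudocount,
    which this sketch does not add.
    """
    def ratios(bins):
        return [tumor_depth[i] / normal_depth[i] for i in bins if normal_depth[i] > 0]

    region = median(ratios(region_bins))
    flank = median(ratios(flank_bins))
    return math.log2(region / flank)


# Invented per-bin mean depths: bins 2-4 are the candidate deletion.
tumor = [60, 62, 30, 31, 29, 61]
normal = [30, 31, 30, 31, 29, 30]
ratio = log2_depth_ratio(tumor, normal, region_bins=[2, 3, 4], flank_bins=[0, 1, 5])
print(round(ratio, 2))
```

A clonal heterozygous deletion in a pure diploid sample gives a value near -1; lower tumor purity or subclonality pulls it toward 0, so the number complements the breakpoint evidence in IGV rather than replacing it.
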
Advanced
Project

Design a Multi-Layer Circos Plot for a Cancer Genomics Manuscript

Scenario

You are preparing a figure for a paper describing a novel fusion gene in sarcoma. The plot must integrate: 1) copy number alterations (outer ring), 2) somatic SNV density (middle ring), 3) gene fusion links (inner chords), and 4) differential expression of key genes (innermost histogram).

How to Execute
1. Prepare input files: BED for CN, BED for SNV counts, link file for fusions, BED for expression log2FC.
2. Use the `RCircos` or `circlize` package in R.
3. Define the plot layout (cytoband, tracks).
4. Add each track layer by layer, ensuring consistent coloring and scaling.
5. Highlight the fusion link with a distinct color.
6. Export as vector (SVG/PDF) for journal submission, ensuring all labels are legible at print size.
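
Every ring and chord in a Circos plot rests on one mapping: a genomic position becomes an angle on the circle. The sketch below illustrates that idea in plain Python. It is not the `RCircos` or `circlize` API, and the chromosome names, lengths, and fusion endpoints are hypothetical.

```python
def sector_angles(chrom_lengths, gap_deg=2.0):
    """Give each chromosome a (start, end) angle in degrees, proportional to length.

    A gap of gap_deg degrees follows every sector so chromosomes stay separable.
    """
    total = sum(chrom_lengths.values())
    usable = 360.0 - gap_deg * len(chrom_lengths)
    sectors, cursor = {}, 0.0
    for chrom, length in chrom_lengths.items():
        span = usable * length / total
        sectors[chrom] = (cursor, cursor + span)
        cursor += span + gap_deg
    return sectors


def position_to_angle(sectors, chrom_lengths, chrom, pos):
    """Map a base-pair position to its angle within the chromosome's sector."""
    start, end = sectors[chrom]
    return start + (end - start) * pos / chrom_lengths[chrom]


# Hypothetical two-chromosome genome, for illustration only.
lengths = {"chrA": 100, "chrB": 300}
sectors = sector_angles(lengths, gap_deg=10.0)

# A fusion link is a chord between the angles of its two breakpoints.
fusion = (
    position_to_angle(sectors, lengths, "chrA", 50),
    position_to_angle(sectors, lengths, "chrB", 150),
)
print(sectors, fusion)
```

Keeping this mapping in mind helps when choosing track scales: a histogram bin that spans less than a fraction of a degree will not be legible at print size.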

Tools & Frameworks

Software & Platforms

Integrative Genomics Viewer (IGV), UCSC Genome Browser, Ensembl Genome Browser

Primary tools for interactive, exploratory visualization of aligned reads, variants, and annotations. IGV is the industry standard for local BAM/VCF inspection; the web-based genome browsers supply public annotation and reference data for context.

Programming Libraries & Packages

R (ggplot2, karyoploteR, RCircos, circlize, qqman); Python (pyCircos, matplotlib, plotly)

Used for scripted, reproducible generation of static and interactive plots (Manhattan, Circos). Essential for pipeline integration and publication-ready figures.

File Format & Data Preprocessing Tools

BCFtools (VCF manipulation), SAMtools (BAM indexing), BEDTools (BED file operations), PLINK (GWAS summary stats)

Critical upstream tools to clean, filter, and transform data into the correct input format for visualization tools. You cannot visualize data you cannot parse.
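
A frequent parsing mistake is mixing coordinate systems: BED intervals are 0-based and half-open, while VCF positions and IGV-style region strings are 1-based and inclusive. The helpers below are a small sketch of the conversions; the function names are illustrative and not taken from any of the tools above.

```python
def parse_bed_line(line):
    """Split a tab-delimited BED line into (chrom, start, end, extra_fields)."""
    chrom, start, end, *extra = line.rstrip("\n").split("\t")
    return chrom, int(start), int(end), extra


def bed_to_region(chrom, start, end):
    """BED interval (0-based, half-open) -> 'chrom:start-end' (1-based, inclusive)."""
    if not 0 <= start < end:
        raise ValueError(f"invalid BED interval: {start}-{end}")
    return f"{chrom}:{start + 1}-{end}"


def vcf_snv_to_bed(chrom, pos):
    """1-based VCF POS of a single-base variant -> BED (chrom, start, end)."""
    return chrom, pos - 1, pos


chrom, start, end, extra = parse_bed_line("chr7\t999\t2000\tcandidate_del\n")
region = bed_to_region(chrom, start, end)
print(region)
```
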

Interview Questions

Answer Strategy

The question tests analytical rigor and understanding of genomic confounders. Strategy: address the inflation first (population stratification, technical artifact), then the locus interpretation. Sample Answer: 'First, I'd check the genomic inflation factor (λ) alongside the QQ plot. If λ > 1.05, I'd suspect population stratification and verify that principal components were included in the model. I'd then zoom into the chr6 locus in IGV, overlaying LD structure and enhancer/gene annotation tracks (e.g., from ENCODE), to see whether the SNP lies in a regulatory element affecting a distant gene.'
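
The inflation factor in the sample answer is usually computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.4549). The sketch below does that from p-values using only the standard library. It assumes two-sided p-values from 1-df tests, and the example inputs are synthetic.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Estimate lambda as median(chi-square from p) / expected null median.

    Each two-sided p-value is converted back to a 1-df chi-square statistic
    via the squared normal quantile. Values outside (0, 1] are ignored.
    """
    norm = NormalDist()
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values if 0 < p <= 1]
    return median(chi2) / CHI2_1DF_MEDIAN


# Synthetic example: evenly spaced p-values mimic a well-calibrated null.
null_like = [i / 1000 for i in range(1, 1000)]
lam_null = genomic_inflation(null_like)

# Squaring shifts every p-value toward zero, mimicking inflation.
lam_inflated = genomic_inflation([p ** 2 for p in null_like])
print(round(lam_null, 3), lam_inflated > 1.05)
```
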

Answer Strategy

Tests scientific communication and visualization best practices. Focus on clarity, annotation, and audience. Sample Answer: 'I'd recommend a three-pronged approach: 1) Simplify: reduce the number of links to only those above a stringent FDR threshold. 2) Annotate: label key links with gene names or variant IDs directly on the plot. 3) Contextualize: add a brief legend explaining what the links represent (e.g., somatic fusions, eQTLs) and use color to distinguish link types, possibly separating into two focused plots if the story is complex.'
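
The "Simplify" step can be made concrete. The sketch below filters links with the Benjamini-Hochberg procedure, which is one common way to control FDR; the answer above does not say which method is meant. The link labels, p-values, and 0.05 cutoff are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is capped by those above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical links (label, raw p-value), for illustration only.
links = [("GENE_A-GENE_B", 0.01), ("GENE_C-GENE_D", 0.04),
         ("GENE_E-GENE_F", 0.03), ("GENE_G-GENE_H", 0.5)]
q_values = benjamini_hochberg([p for _, p in links])
kept = [label for (label, _), q in zip(links, q_values) if q <= 0.05]
print(kept)
```

Only the links in `kept` would be drawn, which is often enough to turn a tangled plot into one with a readable message.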
