AI Genomics Data Analyst
An AI Genomics Data Analyst leverages machine learning, large language models, and bioinformatics pipelines to extract clinically …
Skill Guide
The technical capability to transform high-dimensional genomic variant and expression data into interpretable visuals (specifically circular, linear, and genome-wide scatter plots) in order to identify structural variants, inspect sequence alignments, and locate disease-associated loci.
Scenario
You have downloaded summary statistics from a published GWAS on Type 2 Diabetes (e.g., from the GWAS Catalog). Your task is to visualize the genome-wide association results.
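The standard view for this task is a Manhattan plot. Each variant is placed at a cumulative genome coordinate on the x-axis, with -log10(p) on the y-axis. The minimal sketch below prepares those coordinates and flags hits at the conventional 5e-8 genome-wide significance threshold. The function name, the (chrom, pos, p) record format, and the chromosome-size list are illustrative assumptions, and the actual drawing is left to any scatter-plot library.

```python
import math

def manhattan_coords(records, chrom_sizes, threshold=5e-8):
    """Convert (chrom, pos, p) records into Manhattan-plot points.

    chrom_sizes: ordered list of (chrom, length_bp), e.g. read from a
    reference .fai index (hypothetical input here).
    Returns (chrom, x, y, is_significant) tuples where
    x = cumulative genome position and y = -log10(p).
    """
    # Start offset of each chromosome when chromosomes are laid end to end.
    offsets, running = {}, 0
    for chrom, size in chrom_sizes:
        offsets[chrom] = running
        running += size

    points = []
    for chrom, pos, p in records:
        if chrom not in offsets or not 0 < p <= 1:
            continue  # skip unplaced contigs and invalid p-values
        points.append((chrom, offsets[chrom] + pos, -math.log10(p), p < threshold))
    return points
```

The x and y values can then be passed to a scatter function (for example `matplotlib.pyplot.scatter`), colored by alternating chromosome, with a horizontal line drawn at -log10(5e-8) ≈ 7.3.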
Scenario
Your CNV pipeline has flagged a 500 kb deletion on chromosome 7 in a tumor sample. You need to validate it visually and check for potential artifacts.
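Before opening a browser, a quick numeric check can tell you whether read depth actually drops inside the call relative to its flanks. The minimal sketch below compares median tumor/normal log2 depth ratios inside and outside the called interval. It assumes depths are already binned and normalized for library size and GC content upstream, and the bin tuple format and function name are hypothetical.

```python
import math
import statistics

def deletion_support(bins, start, end, pseudocount=0.5):
    """Compare tumor/normal log2 depth ratios inside a CNV call vs. its flanks.

    bins: (bin_start, bin_end, tumor_depth, normal_depth) tuples, assumed
    normalized upstream (hypothetical format). Bins that straddle a
    breakpoint are skipped so they don't blur either group.
    Returns (median_inside, median_flank).
    """
    inside, flank = [], []
    for b_start, b_end, tumor, normal in bins:
        # Pseudocount avoids log(0) in zero-coverage bins.
        ratio = math.log2((tumor + pseudocount) / (normal + pseudocount))
        if start <= b_start and b_end <= end:
            inside.append(ratio)
        elif b_end <= start or b_start >= end:
            flank.append(ratio)
    if not inside or not flank:
        raise ValueError("need bins both inside the call and in its flanks")
    return statistics.median(inside), statistics.median(flank)
```

In a pure diploid sample, a one-copy loss should sit near log2(1/2) = -1, with tumor purity and ploidy pulling it toward 0. A sharp drop confined to the call with flat flanks supports a real deletion, whereas a drop that tracks GC extremes or low-mappability regions points toward an artifact.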
Scenario
You are preparing a figure for a paper describing a novel fusion gene in sarcoma. The plot must integrate: 1) copy number alterations (outer ring), 2) somatic SNV density (middle ring), 3) gene fusion links (inner chords), and 4) differential expression of key genes (innermost histogram).
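Every ring and chord in a circular (Circos-style) figure depends on one transform: mapping a genomic coordinate to an angle, then converting that angle to x/y at a ring's radius. The minimal sketch below implements that transform. Chromosomes are laid out clockwise in the given order, separated by a fixed gap. The function names and the default gap are illustrative assumptions and are not taken from Circos or any particular plotting library.

```python
import math

def genome_to_angle(chrom_sizes, gap_deg=1.0):
    """Return a function mapping (chrom, pos) to degrees on a circle.

    chrom_sizes: ordered list of (chrom, length_bp). Each chromosome gets
    an arc proportional to its length, followed by gap_deg of empty space.
    """
    total = sum(size for _, size in chrom_sizes)
    usable = 360.0 - gap_deg * len(chrom_sizes)
    if usable <= 0:
        raise ValueError("gaps leave no room for chromosomes")
    sizes, starts, angle = dict(chrom_sizes), {}, 0.0
    for chrom, size in chrom_sizes:
        starts[chrom] = angle
        angle += size / total * usable + gap_deg

    def to_angle(chrom, pos):
        if not 0 <= pos <= sizes[chrom]:
            raise ValueError(f"{chrom}:{pos} is outside the chromosome")
        return starts[chrom] + pos / total * usable

    return to_angle

def polar_to_xy(angle_deg, radius):
    """Convert an angle (0 = 12 o'clock, clockwise) and radius to x/y."""
    theta = math.radians(angle_deg)
    return radius * math.sin(theta), radius * math.cos(theta)
```

Each ring then uses its own radius band: copy number on the outer band, SNV density in the middle, and expression bars innermost. A fusion chord connects `polar_to_xy(to_angle(chrom_a, bp_a), r_inner)` to `polar_to_xy(to_angle(chrom_b, bp_b), r_inner)`, usually drawn as a curve through the center.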
Primary tools for interactive, exploratory visualization of aligned reads, variants, and annotations. IGV is the industry standard for local BAM/VCF inspection, while web-based genome browsers provide context from public data.
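Inspection in IGV can also be made reproducible by scripting it. The minimal sketch below generates an IGV batch script that loads tracks and saves a snapshot of each region, with optional flanking padding so breakpoints and surrounding coverage stay in view. It uses IGV's `new`, `genome`, `snapshotDirectory`, `load`, `goto`, `snapshot`, and `exit` batch commands. The function name, file paths, and genome ID are placeholders, and the script is assumed to be run through IGV's batch mode (for example, Tools > Run Batch Script in the desktop app).

```python
def igv_batch_script(genome, files, regions, snapshot_dir, pad=0):
    """Build an IGV batch script that loads tracks and snapshots each region.

    genome: IGV genome ID (e.g. "hg38"); files: BAM/VCF/BED paths to load;
    regions: (chrom, start, end) tuples; pad: bp of flanking context per side.
    All names and paths are caller-supplied placeholders.
    """
    lines = ["new", f"genome {genome}", f"snapshotDirectory {snapshot_dir}"]
    lines += [f"load {path}" for path in files]
    for chrom, start, end in regions:
        lo, hi = max(1, start - pad), end + pad  # clamp at chromosome start
        lines.append(f"goto {chrom}:{lo}-{hi}")
        lines.append(f"snapshot {chrom}_{start}_{end}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"
```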
Used for scripted, reproducible generation of static and interactive plots (Manhattan, Circos). Essential for pipeline integration and publication-ready figures.
Critical upstream tools to clean, filter, and transform data into the correct input format for visualization tools. You cannot visualize data you cannot parse.
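As an example of this upstream step, the minimal sketch below parses tab-separated GWAS summary statistics into clean (chrom, pos, p) records and counts the rows it drops. Column names differ between studies, so the `CHR`/`BP`/`P` defaults are assumptions to override for your file. Note that p-values reported as exactly 0 (numeric underflow) are dropped here too, although some pipelines clamp them instead.

```python
import csv
import io

def load_sumstats(handle, chrom_col="CHR", pos_col="BP", p_col="P"):
    """Parse tab-separated GWAS summary statistics into (chrom, pos, p) tuples.

    Rows with missing, non-numeric, or out-of-range values are counted
    and skipped rather than passed on to plotting.
    Returns (records, skipped_count).
    """
    records, skipped = [], 0
    for row in csv.DictReader(handle, delimiter="\t"):
        chrom, pos, p = row.get(chrom_col), row.get(pos_col), row.get(p_col)
        if chrom is None or pos is None or p is None:
            skipped += 1
            continue
        try:
            pos, p = int(pos), float(p)
        except ValueError:  # e.g. "NA" or "."
            skipped += 1
            continue
        if not 0 < p <= 1:  # also rejects NaN and underflowed zeros
            skipped += 1
            continue
        records.append((chrom.removeprefix("chr"), pos, p))
    return records, skipped

if __name__ == "__main__":
    demo = "CHR\tBP\tP\nchr1\t100\t1e-9\n2\t200\tNA\n"
    print(load_sumstats(io.StringIO(demo)))
```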
Answer Strategy
This question tests analytical rigor and an understanding of genomic confounders. Strategy: address the inflation first (population stratification or technical artifacts), then interpret the locus. Sample Answer: 'First, I'd inspect the QQ plot and compute the genomic inflation factor (λ). If λ > 1.05, I'd suspect population stratification and verify that principal components were included as covariates in the model. I'd then zoom into the chr6 locus in IGV, overlaying the LD structure and enhancer/gene annotation tracks (e.g., from ENCODE) to see whether the SNP lies in a regulatory element that affects a distant gene.'
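The λ check in that answer comes down to a short calculation. Each two-sided p-value is converted back to a 1-df chi-square statistic, and the median of those statistics is divided by the null median (about 0.4549). A minimal standard-library sketch follows, where the function name is illustrative. One caveat: λ also rises under genuine polygenic signal, which is why methods such as LD score regression are used to separate it from stratification.

```python
import statistics
from statistics import NormalDist

def genomic_inflation(pvalues):
    """Genomic inflation factor: median observed chi-square(1) / null median.

    Each two-sided p-value maps back to z = Phi^-1(p / 2) and chi2 = z^2.
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(p / 2) ** 2 for p in pvalues if 0 < p <= 1]
    if not chi2:
        raise ValueError("no valid p-values")
    null_median = std_normal.inv_cdf(0.75) ** 2  # median of chi-square(1), ~0.4549
    return statistics.median(chi2) / null_median
```

Uniformly distributed p-values give λ ≈ 1, while p-values that are systematically too small push λ above 1. That same deviation shows up in the QQ plot as early lift-off from the diagonal.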
Answer Strategy
This question tests scientific communication and visualization best practices; focus on clarity, annotation, and audience. Sample Answer: 'I'd recommend a three-pronged approach: 1) Simplify: keep only the links that pass a stringent FDR threshold. 2) Annotate: label key links directly on the plot with gene names or variant IDs. 3) Contextualize: add a brief legend explaining what the links represent (e.g., somatic fusions, eQTLs), use color to distinguish link types, and consider splitting the figure into two focused plots if the story is complex.'
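The "Simplify" step requires choosing an FDR procedure, which the answer leaves open. The minimal sketch below assumes Benjamini-Hochberg, a common choice. It keeps only the links whose adjusted q-value falls at or below the threshold. The (link_id, p_value) input format and the 0.01 default are illustrative assumptions.

```python
def bh_filter(links, fdr=0.01):
    """Keep (link_id, p_value) pairs whose Benjamini-Hochberg q-value <= fdr.

    Returns (link_id, p_value, q_value) tuples in the original input order.
    """
    m = len(links)
    order = sorted(range(m), key=lambda i: links[i][1])  # ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, links[i][1] * m / rank)
        q[i] = running_min
    return [(lid, p, q[i]) for i, (lid, p) in enumerate(links) if q[i] <= fdr]
```

Because it keeps the q-value alongside each surviving link, the filtered output can also drive the "Annotate" step. For example, it can be used to set line width or opacity by significance.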