AI Precision Medicine Specialist
An AI Precision Medicine Specialist designs and deploys machine learning systems that analyze genomic, proteomic, clinical, and li…
Skill Guide
The computational process of identifying differences (variants) between a sample's DNA/RNA sequence and a reference genome, then assigning biological and clinical significance to those variants using curated databases and predictive algorithms.
Scenario
You have access to a whole-genome sequencing (WGS) BAM file for a single human sample (e.g., from the GIAB consortium). Your goal is to produce a high-confidence set of germline SNVs and indels.
Scenario
You are analyzing a matched tumor-normal pair from a cancer patient (WES data). The objective is to identify somatic mutations (SNVs, indels) specific to the tumor, while filtering out germline variants and sequencing artifacts.
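To make the tumor-versus-normal logic concrete, here is a minimal sketch of a variant allele fraction (VAF) filter. It is not how Mutect2 or any other caller works internally. The `Call` record, the thresholds, and the example coordinates and read counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical per-site read counts for a matched tumor-normal pair."""
    chrom: str
    pos: int
    ref: str
    alt: str
    tumor_alt: int
    tumor_depth: int
    normal_alt: int
    normal_depth: int


def vaf(alt_reads: int, depth: int) -> float:
    """Variant allele fraction; 0.0 when there is no coverage."""
    return alt_reads / depth if depth else 0.0


def is_candidate_somatic(call: Call, min_tumor_vaf: float = 0.05,
                         max_normal_vaf: float = 0.02,
                         min_depth: int = 10) -> bool:
    """Keep sites supported in the tumor and essentially absent in the normal.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    if call.tumor_depth < min_depth or call.normal_depth < min_depth:
        return False
    return (vaf(call.tumor_alt, call.tumor_depth) >= min_tumor_vaf
            and vaf(call.normal_alt, call.normal_depth) <= max_normal_vaf)


# Made-up example sites.
calls = [
    Call("chr17", 1000, "C", "T", 24, 80, 0, 60),   # tumor only: kept
    Call("chr13", 2000, "G", "A", 41, 82, 29, 60),  # also in normal: germline
    Call("chr1", 3000, "T", "C", 2, 90, 0, 55),     # very low tumor VAF: dropped
]
somatic = [c for c in calls if is_candidate_somatic(c)]
for c in somatic:
    print(c.chrom, c.pos, c.ref, c.alt)
```

In practice a dedicated somatic caller models these signals statistically; a hard threshold like this is mainly useful as a sanity check on a caller's output.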
Scenario
You are leading a project to analyze a cohort of 50 cancer patients with matched WGS (for somatic/structural variants), RNA-seq (for expression and fusion detection), and clinical data. The goal is to identify subtype-specific molecular profiles and potential therapeutic targets.
GATK is the industry standard for germline and somatic variant calling (HaplotypeCaller, Mutect2). Sentieon is a high-performance, licensed alternative. DeepVariant uses deep learning for highly accurate SNP/indel calling.
Workflow languages such as Nextflow (with DSL2), Snakemake, and WDL are used to build portable, reproducible, and scalable pipelines. Nextflow and Snakemake are popular in research; WDL is the workflow language used on Terra (the Broad Institute's platform).
VEP and ANNOVAR add functional context to variants (gene impact, protein change). ClinVar provides clinical significance (pathogenic, benign). gnomAD is essential for population frequency filtering.
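As a rough illustration of population frequency filtering, the sketch below keeps rare variants and ranks them by clinical significance. It assumes the annotations have already been parsed into dictionaries; the field names (`gnomad_af`, `clinvar`), the 1% cutoff, and the example records are hypothetical and do not reflect the actual output schema of VEP, ANNOVAR, ClinVar, or gnomAD.

```python
def keep_rare(variants, max_af=0.01):
    """Keep variants at or below max_af in the population.

    A missing frequency is treated as 'not observed' and therefore rare.
    """
    kept = []
    for v in variants:
        af = v.get("gnomad_af")
        if af is None or af <= max_af:
            kept.append(v)
    return kept


def prioritize(variants):
    """Sort so pathogenic assertions come first; unannotated variants go last."""
    rank = {"pathogenic": 0, "likely_pathogenic": 1, "uncertain_significance": 2}
    return sorted(variants, key=lambda v: rank.get(v.get("clinvar"), 3))


# Made-up annotated variants.
variants = [
    {"id": "var3", "consequence": "stop_gained", "gnomad_af": None, "clinvar": None},
    {"id": "var2", "consequence": "synonymous_variant", "gnomad_af": 0.31, "clinvar": "benign"},
    {"id": "var1", "consequence": "missense_variant", "gnomad_af": 0.0002, "clinvar": "pathogenic"},
]

shortlist = prioritize(keep_rare(variants))
for v in shortlist:
    print(v["id"], v["consequence"], v["gnomad_af"], v["clinvar"])
```

The appropriate frequency cutoff depends on the disease model (for example, dominant versus recessive inheritance), so the 1% default here is only a placeholder.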
IGV is for manual inspection of alignments at variant sites. Qualimap and MultiQC aggregate and report quality metrics across samples/pipelines to flag systematic issues.
Answer Strategy
Structure the answer sequentially: (1) alignment with BWA-MEM to produce a BAM; (2) duplicate marking (Picard MarkDuplicates); (3) Base Quality Score Recalibration (BQSR), which uses known variant sites to adjust base quality scores for systematic technical errors; (4) HaplotypeCaller in GVCF mode; (5) joint genotyping (GenotypeGVCFs); (6) Variant Quality Score Recalibration (VQSR) for filtering. Emphasize BQSR's role in correcting for covariates such as machine cycle and sequence context.
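The ordering of these steps can be sketched as a small Python helper that assembles the shell commands without running them. The file names, the read-group string, and the `germline_commands` helper are illustrative assumptions. BAM indexing is omitted for brevity, and VQSR (VariantRecalibrator followed by ApplyVQSR) is left out because its arguments depend on the training resources available.

```python
import shlex


def germline_commands(sample, ref, fq1, fq2, known_sites):
    """Return (step name, shell command) pairs; nothing is executed here."""
    # Literal "\t" separators are passed through for bwa to interpret.
    read_group = shlex.quote(f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA")
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    gvcf = f"{sample}.g.vcf.gz"
    vcf = f"{sample}.vcf.gz"
    return [
        ("align",
         f"bwa mem -R {read_group} {ref} {fq1} {fq2} "
         f"| samtools sort -o {sorted_bam} -"),
        ("mark_duplicates",
         f"gatk MarkDuplicates -I {sorted_bam} -O {dedup_bam} "
         f"-M {sample}.dup_metrics.txt"),
        ("bqsr_model",
         f"gatk BaseRecalibrator -R {ref} -I {dedup_bam} "
         f"--known-sites {known_sites} -O {recal_table}"),
        ("bqsr_apply",
         f"gatk ApplyBQSR -R {ref} -I {dedup_bam} "
         f"--bqsr-recal-file {recal_table} -O {recal_bam}"),
        ("call_gvcf",
         f"gatk HaplotypeCaller -R {ref} -I {recal_bam} -O {gvcf} -ERC GVCF"),
        ("joint_genotype",
         f"gatk GenotypeGVCFs -R {ref} -V {gvcf} -O {vcf}"),
    ]


steps = germline_commands("sample1", "ref.fa", "r1.fq.gz", "r2.fq.gz",
                          "known_sites.vcf.gz")
for name, command in steps:
    print(f"[{name}] {command}")
```

Returning the commands as data rather than executing them keeps the sketch easy to inspect; a real pipeline would hand each step to a workflow manager instead.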
Answer Strategy
The interviewer is testing troubleshooting methodology and understanding of artifact sources. A strong answer outlines: 1) Verify sample identity (check fingerprinting). 2) Inspect a subset in IGV for strand bias, low mapping quality, or alignment artifacts. 3) Check the Panel of Normals (PoN) for recurrence. 4) Review base quality scores and sequencing depth in the tumor BAM. 5) Consider whether low tumor purity or subclonality is affecting the calls. 6) Run additional callers (e.g., Strelka, VarDict) for concordance.
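The concordance check in the last step can be sketched as a simple tally of how many callers report each variant. This assumes every call set has already been reduced to normalized (chrom, pos, ref, alt) tuples, for example with indels left-aligned so that equivalent calls compare equal. The caller names are used only as labels, and the variants are made up.

```python
from collections import Counter


def caller_support(callsets):
    """Count how many callers report each (chrom, pos, ref, alt) key."""
    counts = Counter()
    for calls in callsets.values():
        counts.update(set(calls))  # each caller contributes at most once per site
    return counts


# Made-up call sets keyed by caller label.
callsets = {
    "mutect2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "T")},
    "strelka": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "vardict": {("chr1", 100, "A", "T"), ("chr4", 400, "T", "G")},
}

support = caller_support(callsets)
consensus = sorted(key for key, n in support.items() if n >= 2)
singletons = sorted(key for key, n in support.items() if n == 1)

print("supported by >= 2 callers:", consensus)
print("single-caller calls to inspect first:", singletons)
```

Calls reported by only one caller are not necessarily false, but they are a reasonable place to start manual review in IGV.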