AI Genomics Data Analyst
An AI Genomics Data Analyst leverages machine learning, large language models, and bioinformatics pipelines to extract clinically …
Skill Guide
The application of statistical methods to identify genetic variants associated with traits or diseases (GWAS), quantify aggregate genetic risk (polygenic risk scores), and control for false discoveries in high-dimensional genomic data (multiple testing correction).
Scenario
You have been provided with genotype and phenotype data for a simulated complex trait (e.g., height) from a public resource like the 1000 Genomes Project or a synthetic dataset.
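A minimal sketch of the single-variant test at the core of such an analysis, assuming an additive genetic model, a quantitative trait, and no covariates. The function name `single_variant_test` and the toy inputs are illustrative only, and the p-value uses a large-sample normal approximation rather than the exact t distribution. A real analysis would typically use PLINK or a mixed-model tool with principal components as covariates.

```python
import math
from statistics import fmean


def single_variant_test(dosages, phenotypes):
    """Regress a quantitative phenotype on allele dosage (0, 1, 2) for one variant.

    Returns (beta, standard_error, p_value). The two-sided p-value is a
    normal approximation, so it is only reasonable for large samples.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 samples")

    mean_x = fmean(dosages)
    mean_y = fmean(phenotypes)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")

    # Ordinary least squares slope and intercept.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x

    # Residual variance with n - 2 degrees of freedom gives the slope's SE.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, se, 0.0

    z = beta / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p_value
```

Running this function once per variant yields the p-values that feed the Q-Q plot, the Manhattan plot, and the multiple-testing step.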
Scenario
A biobank has GWAS summary statistics for coronary artery disease (CAD) and you have individual-level genotype and phenotype data for a separate cohort. The goal is to build a CAD PRS and test its association with disease status.
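As a rough illustration of the scoring step, a PRS for one individual is the sum of effect-allele dosages weighted by the GWAS effect sizes. The sketch below assumes alleles have already been harmonized between the summary statistics and the target cohort, and that variant selection (for example clumping and thresholding) has been done upstream. The variant IDs and weights in the usage note are hypothetical.

```python
from statistics import fmean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages over variants present in both inputs.

    dosages: dict mapping variant ID -> dosage (0 to 2) for one individual.
    weights: dict mapping variant ID -> GWAS effect size (e.g., log odds ratio).
    """
    shared = sorted(dosages.keys() & weights.keys())
    return sum(weights[v] * dosages[v] for v in shared)


def standardize(scores):
    """Rescale cohort scores to mean 0 and standard deviation 1."""
    mu = fmean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]
```

For example, with hypothetical weights `{"rsA": 0.10, "rsB": -0.20}`, an individual carrying one copy of the `rsA` effect allele and two of `rsB` scores 0.10 - 0.40 = -0.30. The standardized scores would then enter a logistic regression of disease status on PRS plus covariates such as age, sex, and principal components.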
Scenario
Standard PRS trained on European-ancestry GWAS perform poorly in non-European populations. You are tasked with developing a novel statistical method to improve PRS portability.
PLINK is the workhorse for GWAS QC and association testing. GCTA is used for GREML heritability estimation and mixed models. R and Python are essential for data manipulation, custom statistical modeling, and visualization. PRSice-2 is a widely used tool for PRS analysis.
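Linear mixed models (LMMs) are widely regarded as the gold standard for controlling confounding from population structure and relatedness. False discovery rate (FDR) control is a common multiple-testing correction, although single-variant GWAS conventionally apply the Bonferroni-style genome-wide significance threshold of p < 5 × 10⁻⁸. LD score regression (LDSC) estimates genetic correlation and heritability from summary statistics. Bayesian methods underpin many modern, high-performing PRS approaches. Meta-analysis is the standard way to combine results across studies.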
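To make the multiple-testing idea concrete, here is a small sketch comparing Bonferroni correction with the Benjamini-Hochberg step-up procedure for FDR control. The function names are illustrative, and the sketch assumes a plain list of p-values. In practice one would usually call an established implementation such as R's `p.adjust`.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject tests whose p-value is at most alpha divided by the number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level alpha.

    Returns a list of booleans in the same order as the input.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k with p_(k) <= k * alpha / m.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            largest_k = rank

    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected
```

Benjamini-Hochberg never rejects fewer hypotheses than Bonferroni at the same alpha, which is why it is attractive when many true signals are expected.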
These provide the large-scale genotype-phenotype data and reference panels necessary for conducting and benchmarking real-world analyses. Access often requires institutional approval.
Answer Strategy
Tests understanding of confounding diagnostics and solutions. The candidate should explain that a genomic inflation factor (lambda) noticeably above 1.0 indicates potential confounding from population stratification or relatedness, although polygenicity in a large sample can also raise it. The first step is to inspect the Q-Q plot to see whether the inflation is genome-wide or driven by a few loci. The corrective action is to use a mixed-model approach (e.g., BOLT-LMM) or, if using a standard regression model, to ensure that enough principal components are included as covariates. The candidate should also mention checking for batch effects and cryptic relatedness.
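For reference, lambda is usually computed as the median observed chi-square statistic divided by the expected median of a chi-square distribution with 1 degree of freedom (about 0.455). The sketch below assumes two-sided p-values from 1-df tests and uses only the Python standard library. The function name `genomic_inflation` is illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
_CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(pvalues):
    """Genomic inflation factor (lambda) from two-sided 1-df test p-values."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    chi2 = []
    for p in pvalues:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # A two-sided p-value maps to a z-score at the p/2 quantile;
        # squaring it gives the 1-df chi-square statistic.
        chi2.append(_STD_NORMAL.inv_cdf(p / 2.0) ** 2)
    return median(chi2) / _CHI2_1DF_MEDIAN
```

Under the null with well-calibrated p-values the result is close to 1.0. Values well above that prompt the Q-Q plot inspection and model changes described above.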
Answer Strategy
Tests critical thinking and translational acumen. The core competency is assessing model validity and real-world utility. The candidate should ask: 1) 'What was the validation cohort's ancestry and how does it compare to the training data?' (portability). 2) 'What is the AUC or R² in absolute terms, and how much incremental value does it add over classical clinical risk factors?' (clinical relevance). 3) 'Was the validation performed on a truly independent cohort to avoid overfitting?' (methodological rigor).
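The "incremental value" question can be made concrete by comparing the AUC of a clinical-only risk model with that of a model that also includes the PRS, both evaluated on the same independent cohort. The sketch below computes AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count as half). The function names and the pairwise loop are illustrative, and a real evaluation would add confidence intervals and use an established library.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise case-control comparisons.

    labels: 1 for cases, 0 for controls. Ties contribute 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def incremental_auc(clinical_scores, combined_scores, labels):
    """AUC gain from adding the PRS to a clinical-only risk model."""
    return auc(combined_scores, labels) - auc(clinical_scores, labels)
```

A small gain over an already strong clinical model is a common finding, so the absolute numbers matter more than a statement that the PRS is "significantly associated" with disease.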