AI Computer Vision Engineer Interview Questions
50 expert questions covering beginner fundamentals through advanced AI workflow scenarios. Each question includes a hint for structuring a strong answer.
Beginner
5 questions
A strong answer defines each task clearly, describes the output format (label vs. bounding boxes vs. pixel-wise masks), and gives a practical example for each.
Cover local receptive fields, parameter sharing, translation equivariance, and hierarchical feature learning from edges to textures to objects.
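Parameter sharing becomes concrete when you count weights. The sketch below compares a 3x3 convolution with 64 output channels to a fully connected layer with 64 units over a flattened 224x224 RGB input. Both layer sizes are illustrative assumptions, not taken from any particular model.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Weights in a 2D convolution: one kernel per output channel, reused at every position."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def dense_params(in_features, out_features, bias=True):
    """Weights in a fully connected layer: every input connects to every output."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)

# Illustrative sizes: a 224x224 RGB input and 64 output channels/units.
conv = conv2d_params(3, 3, 3, 64)        # independent of image size
dense = dense_params(224 * 224 * 3, 64)  # grows with the number of input pixels
print(f"conv: {conv:,} params, dense: {dense:,} params")
```

The convolution stays at 1,792 parameters at any input resolution because the same kernels slide over every location. The dense layer needs over 9.6 million parameters, and that count grows with the pixel count.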
Explain pretraining on large datasets like ImageNet, fine-tuning on domain-specific data, reduced data requirements, and faster convergence.
Discuss overfitting prevention and domain robustness, then list augmentations like random flip, rotation, color jitter, CutOut, and mosaic.
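For detection, geometric augmentations must transform the labels along with the pixels. Below is a minimal sketch of a random horizontal flip. It assumes boxes in (x_min, y_min, x_max, y_max) pixel-edge coordinates and an image stored as a list of rows. The function names are hypothetical, and a real pipeline would typically rely on an augmentation library that handles boxes.

```python
import random

def hflip_with_boxes(image, boxes):
    """Horizontally flip an image (list of pixel rows) and its bounding boxes."""
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    # A vertical edge at x moves to width - x, so x_min and x_max swap roles.
    flipped_boxes = [(width - x2, y1, width - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, flipped_boxes

def random_hflip(image, boxes, p=0.5, rng=random):
    """Apply the flip with probability p (0.5 is an assumed default)."""
    if rng.random() < p:
        return hflip_with_boxes(image, boxes)
    return image, boxes
```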
Explain the precision-recall curve, IoU thresholding, AP per class, and averaging across classes; contrast with simple accuracy which fails for detection.
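Below is a minimal sketch of AP for a single class. It assumes detections were already matched to ground truth at a fixed IoU threshold, so each one is flagged as a true or false positive, and that the class has at least one ground-truth box. The sketch uses all-point interpolation over the precision envelope. COCO-style mAP instead samples 101 recall points and also averages over IoU thresholds from 0.50 to 0.95.

```python
def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class.

    scores: confidence per detection; is_tp: True if that detection matched a
    not-yet-matched ground-truth box at the chosen IoU threshold; num_gt: number
    of ground-truth boxes for the class (must be positive).
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:  # sweep the confidence threshold from high to low
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the stepwise precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

def mean_average_precision(per_class_ap):
    """mAP: unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

Every false positive lowers precision and every missed object caps recall. Plain accuracy has no notion of either.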
Intermediate
10 questions
Discuss single-stage vs. transformer-based detection, inference speed, small-object performance, training data requirements, and deployment constraints.
Cover techniques like oversampling, focal loss, class-weighted loss, synthetic data generation, and augmentation targeted at minority classes.
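Focal loss multiplies cross-entropy by (1 - p_t)^gamma, where p_t is the predicted probability of the true class. Well-classified examples, often the abundant majority or background class, therefore contribute little. The sketch below covers a single binary example. The values alpha=0.25 and gamma=2 were popularized by RetinaNet and are assumptions to tune here, not universal defaults.

```python
import math

def binary_cross_entropy(p, y, eps=1e-7):
    p = min(max(p, eps), 1 - eps)
    return -math.log(p if y == 1 else 1 - p)

def binary_focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for predicted positive-class probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    p_t = p if y == 1 else 1 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# An easy negative (p = 0.01 for a background example) is almost ignored...
print(binary_focal_loss(0.01, 0), binary_cross_entropy(0.01, 0))
# ...while a hard positive keeps a large share of its loss.
print(binary_focal_loss(0.1, 1), binary_cross_entropy(0.1, 1))
```

With gamma=0 and alpha=0.5, the focal loss reduces to half the ordinary cross-entropy, which is a handy sanity check.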
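Describe IoU calculation and its limitation for non-overlapping boxes (IoU is 0 regardless of how far apart they are, so it provides no gradient). Then explain how variants add penalty terms: GIoU for the uncovered area of the smallest enclosing box, DIoU for center distance, and CIoU for center distance plus aspect ratio.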
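Below is a pure-Python sketch of IoU and two of its variants for boxes in (x_min, y_min, x_max, y_max) format. It assumes non-degenerate boxes with positive area. GIoU subtracts the fraction of the smallest enclosing box that the union does not cover. DIoU subtracts the squared center distance, normalized by the squared diagonal of the enclosing box. CIoU adds a further aspect-ratio term, which is omitted here for brevity.

```python
def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def _intersection_union(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    return inter, box_area(a) + box_area(b) - inter

def _enclosing_box(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def iou(a, b):
    inter, union = _intersection_union(a, b)
    return inter / union if union > 0 else 0.0

def giou(a, b):
    """GIoU = IoU - (area(C) - union) / area(C), with C the smallest enclosing box."""
    inter, union = _intersection_union(a, b)
    c_area = box_area(_enclosing_box(a, b))
    return inter / union - (c_area - union) / c_area

def diou(a, b):
    """DIoU = IoU - d^2 / c^2: squared center distance over squared enclosing diagonal."""
    cx1, cy1, cx2, cy2 = _enclosing_box(a, b)
    c2 = (cx2 - cx1) ** 2 + (cy2 - cy1) ** 2
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    return iou(a, b) - d2 / c2
```

For two disjoint boxes, IoU stays at 0 however far apart they are. GIoU and DIoU keep decreasing as the boxes move apart, which is why 1 - GIoU or 1 - DIoU can serve as a box-regression loss.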
Explain NMS filtering of overlapping boxes by confidence score, its issues with dense or occluded objects, and alternatives like Soft-NMS or learned NMS.
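Below are greedy NMS and its Gaussian Soft-NMS variant, sketched in pure Python. The 0.5 IoU threshold, sigma=0.5, and the score floor are illustrative values. Frameworks ship vectorized versions, such as torchvision's `torchvision.ops.nms`, so this sketch only shows the logic.

```python
import math

def iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the top box, drop every remaining box overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of discarding the boxes."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_threshold:
            break  # every remaining score is lower still
        remaining.remove(best)
        keep.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return keep
```

Hard NMS discards the heavily overlapping second box outright. Soft-NMS keeps that box with a reduced score, which helps when two genuinely distinct objects overlap.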
Cover tool selection, annotation guidelines, quality control (inter-annotator agreement, review cycles), active learning for prioritization, and versioning.
Discuss INT8 vs FP32, post-training quantization vs. quantization-aware training, accuracy trade-offs, and 2-4x latency improvements on supported hardware.
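The core arithmetic behind INT8 quantization is a scale that maps floating-point values onto the integer range. This sketch shows symmetric per-tensor quantization with the scale taken from the largest observed magnitude, roughly what a post-training calibration pass estimates. Real toolchains such as TensorRT or PyTorch add per-channel scales, calibration strategies, and fused INT8 kernels. The latency gains come from those kernels, not from this arithmetic.

```python
def symmetric_int8_scale(values):
    """Per-tensor scale that maps the largest magnitude to 127 (symmetric quantization)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize(values, scale):
    # Round to the nearest integer step, then clamp to the symmetric INT8 range.
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(q, scale):
    return [qi * scale for qi in q]

weights = [-1.0, -0.5, 0.0, 0.3, 1.0]
s = symmetric_int8_scale(weights)
q = quantize(weights, s)
print(q)  # each value now fits in 1 byte instead of 4 bytes for FP32
print(max(abs(a - b) for a, b in zip(weights, dequantize(q, s))))  # about s / 2 at most
```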
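Cover anchor boxes (predefined box priors) and their role in detection. Then explain anchor-free approaches such as CenterNet, which predicts object centers as keypoints, and FCOS, which regresses per-location distances to the box sides and adds a center-ness score.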
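In FCOS, each location inside a ground-truth box regresses its distances (l, t, r, b) to the four box sides. A center-ness target down-weights locations far from the box center. Below is a sketch of those targets using the center-ness definition from the FCOS paper. The function names are illustrative.

```python
import math

def ltrb_targets(x, y, box):
    """Distances from location (x, y) to the left, top, right, and bottom sides of box."""
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        raise ValueError("location must lie strictly inside the box")
    return l, t, r, b

def centerness(l, t, r, b):
    """sqrt(min(l, r) / max(l, r) * min(t, b) / max(t, b)): 1 at the center, near 0 at edges."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

print(centerness(*ltrb_targets(5, 5, (0, 0, 10, 10))))  # 1.0 at the exact center
print(centerness(*ltrb_targets(1, 5, (0, 0, 10, 10))))  # approximately 0.333 near the left edge
```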
Discuss GradCAM visualizations, occlusion sensitivity, SHAP for images, testing on out-of-distribution data, and ablation studies on background regions.
Semantic segmentation assigns a class label to every pixel without distinguishing instances; instance segmentation separates individual objects; panoptic segmentation unifies both in a single framework.
Cover model export to ONNX, TensorRT for edge, SageMaker or container-based cloud serving, shared preprocessing, and a CI/CD pipeline for both targets.
Advanced
10 questions
Discuss patch embedding, positional encoding, self-attention across patches, lack of spatial inductive bias, data hunger, and hybrid architectures.
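Patch embedding starts by cutting the image into non-overlapping square patches and flattening each one into a token. Below is a single-channel, pure-Python sketch. In a real ViT, each flattened patch is then multiplied by a learned projection and summed with a positional embedding. Neither step is shown here.

```python
def patchify(image, patch):
    """Split an H x W single-channel image (list of rows) into flattened patch x patch tiles."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):       # raster order: one row of patches at a time
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens
```

A 224x224 image with 16x16 patches yields (224 / 16)^2 = 196 tokens. Self-attention cost grows quadratically with that count.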
Cover the image encoder (ViT), prompt encoder for points/boxes/text, mask decoder, the SA-1B dataset, and composability with other models.
Cover detection-per-frame, feature extraction (ReID), association algorithms (Hungarian, ByteTrack), Kalman filter for prediction, handling occlusions and ID switches.
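The association step can be sketched as an optimal one-to-one matching between predicted track boxes (for example, Kalman-filter predictions) and new detections. The matching maximizes total IoU and then rejects low-overlap pairs, similar in spirit to SORT. For clarity, the sketch brute-forces every assignment, which is only feasible for a handful of boxes. The Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) finds the same optimum in polynomial time. The 0.3 IoU gate is an assumed value.

```python
from itertools import permutations

def iou(a, b):
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Match predicted track boxes to detections by maximum total IoU.

    Returns (matches, unmatched_track_ids, unmatched_detection_ids).
    """
    n_t, n_d = len(tracks), len(detections)
    # Enumerate every one-to-one pairing of the smaller set into the larger one.
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p)) for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d))) for p in permutations(range(n_t), n_d))
    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        score = sum(iou(tracks[t], detections[d]) for t, d in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    # Gate: an optimal pairing can still contain boxes that barely overlap.
    matches = sorted((t, d) for t, d in best_pairs if iou(tracks[t], detections[d]) >= min_iou)
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_tracks = [t for t in range(n_t) if t not in matched_t]
    unmatched_dets = [d for d in range(n_d) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections typically start new tracks. Unmatched tracks are kept alive for a few frames to ride out occlusions before being deleted.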
Discuss self-supervised pretraining (contrastive methods like SimCLR, self-distillation methods like DINO), pseudo-labeling, teacher-student frameworks, consistency regularization, and curriculum strategies.
Cover soft targets, temperature scaling, feature-level distillation, task-specific loss weighting, and empirical accuracy-efficiency trade-off analysis.
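The classic soft-target distillation loss blends two terms. The first is a temperature-softened KL divergence between teacher and student, scaled by T^2 so its gradient magnitude stays comparable as T changes. The second is ordinary cross-entropy on the hard label. Below is a pure-Python sketch for one example. T=4.0 and alpha=0.7 are assumed hyperparameters.

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; larger T gives a softer distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.7):
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student)."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard_ce = -math.log(softmax(student_logits)[label])
    return alpha * T ** 2 * kl + (1 - alpha) * hard_ce
```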
Explain volume rendering of a scene represented by an MLP, novel view synthesis from a sparse set of posed images, and applications in AR/VR, digital twins, robotics simulation, and training data synthesis.
Explain contrastive pretraining on image-text pairs, shared embedding space, zero-shot classification via text prompts, and retrieval using cosine similarity.
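Once image and text embeddings share a space, zero-shot classification reduces to cosine similarity followed by a softmax. The image embedding is compared against one embedding per prompt (e.g. "a photo of a {label}"). The toy vectors in the test stand in for encoder outputs. `logit_scale=100.0` is an assumed value approximating the learned temperature CLIP uses.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_classify(image_emb, prompt_embs, labels, logit_scale=100.0):
    """Pick the label whose prompt embedding has the highest cosine similarity to the image."""
    img = l2_normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in prompt_embs]
    logits = [logit_scale * s for s in sims]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs
```

Retrieval uses the same similarity in the other direction: gallery image embeddings are ranked by their cosine similarity to a text query embedding.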
Cover domain randomization, domain adaptation techniques, style transfer for realism, progressive fine-tuning on small real datasets, and covariate shift detection.
Discuss distributed inference, frame sampling strategies, temporal models, hierarchical processing (keyframe detection then detail analysis), and cost-optimization.
Cover FGSM and PGD attacks, adversarial patch attacks, certified defenses (randomized smoothing), input preprocessing defenses, and practical robustness testing.
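FGSM takes one step of size eps along the sign of the loss gradient with respect to the input. PGD repeats smaller steps and projects back into the eps-ball after each one; this version omits the random start some variants use. To keep the sketch dependency-free, the model is a hypothetical logistic-regression classifier. Its input gradient has the closed form (p - y) * w, whereas a deep network would get the gradient from autograd. Inputs are assumed to lie in [0, 1].

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model and its gradient w.r.t. input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    tiny = 1e-12
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    grad = [(p - y) * wi for wi in w]  # dL/dx_i for this particular model
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(w, b, x, y, eps):
    """One signed-gradient step of size eps, clipped to the valid [0, 1] input range."""
    _, g = loss_and_input_grad(w, b, x, y)
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, step, iters):
    """Repeated signed-gradient steps, projected onto the L-infinity eps-ball around x."""
    x_adv = list(x)
    for _ in range(iters):
        _, g = loss_and_input_grad(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [clip(clip(xa, xo - eps, xo + eps), 0.0, 1.0) for xa, xo in zip(x_adv, x)]
    return x_adv
```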
Scenario-Based
10 questions
Discuss anomaly detection with normal-only training, synthetic defect generation, few-shot learning, unsupervised methods like autoencoders, and iterative data collection.
Cover data distribution analysis, covariate shift detection, annotation quality audit, environmental factors (lighting, occlusion), threshold tuning, and error categorization.
Discuss FDA/regulatory pathways, bias auditing across demographics, explainability (GradCAM), physician-in-the-loop design, clinical validation trials, and data privacy (HIPAA).
Cover architecture choice (smaller backbone), TensorRT optimization, INT8 quantization, input resolution reduction, layer fusion, and profiling with Nsight or trtexec.
Discuss person detection and tracking, product recognition, hand-object interaction detection, multi-camera fusion, re-identification, and privacy-preserving design.
Cover privacy-by-design, on-device processing, anonymization, consent frameworks, demographic bias in models, regulatory compliance (GDPR), and alternative non-biometric KPIs.
Discuss lightweight model architecture (MobileNet), TFLite/Core ML deployment, offline-first design, uncertainty estimation to flag unfamiliar cases, and progressive model updates.
Cover camera calibration, color normalization, resolution-agnostic architectures, test-time augmentation, domain-specific fine-tuning, and robust preprocessing pipelines.
Discuss fidelity vs. diversity trade-off, mode collapse risks, domain gap between synthetic and real, validation on real holdout data, and using diffusion models or GANs responsibly.
Cover reverse engineering existing pipelines, incremental modernization, wrapping legacy code in Python via bindings, hybrid classical-DL pipelines, and staged rollout with fallbacks.
AI Workflow & Tools
10 questions
Discuss shared backbone with task-specific heads, multi-loss aggregation, data loaders with joint annotation formats, gradient balancing, and evaluation metrics per task.
Cover experiment logging (hyperparams, metrics, artifacts), sweep configuration, run comparison dashboards, model versioning, and reproducibility via config files.
Discuss image upload, annotation, preprocessing and augmentation pipelines, versioning, export to multiple formats, and integration with training scripts via API.
Cover loading a pretrained ViT from the Hub, configuring the Trainer API, dataset preprocessing with image transforms, evaluation strategy, and pushing the fine-tuned model back to Hub.
Describe unit tests for data loaders and model outputs, training on push to main, model evaluation gates, containerization, and automated deployment to cloud or edge.
Explain text-prompted detection with Grounding DINO, passing detected boxes as prompts to SAM, post-processing masks, and combining results into a unified segmentation output.
Cover the NCCL backend, DistributedSampler, wrapping the model in DistributedDataParallel, gradient synchronization, mixed-precision training, and common pitfalls like data loading bottlenecks.
Discuss SageMaker Training Jobs, Automatic Model Tuning for hyperparameters, model registry, real-time vs. batch transform endpoints, and cost management with spot instances.
Cover the GStreamer-based pipeline architecture, primary and secondary inference engines, tracker integration, custom post-processing plugins, and multi-stream scaling.
Discuss uncertainty sampling, entropy-based selection, diversity sampling, model disagreement (ensembles), and integration with annotation platforms via API.
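Entropy-based selection ranks unlabeled images by the entropy of the model's predicted class distribution and sends the most uncertain ones to annotators. Below is a minimal sketch that assumes per-image softmax outputs are already available. Least-confidence scoring is included as a common alternative, and the function names are illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a predicted class distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def least_confidence(probs):
    """1 - max probability: an alternative uncertainty score."""
    return 1.0 - max(probs)

def select_for_labeling(pool_probs, k, score=entropy):
    """Indices of the k most uncertain pool items under the given score."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]
```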
Behavioral
5 questions
Look for clear communication strategies, use of visual aids or analogies, patience, and evidence that the stakeholder made a better decision as a result.
Strong answers show intellectual humility, systematic debugging, root cause analysis, and concrete process changes implemented to prevent recurrence.
Expect discussion of arXiv, conferences (CVPR, ICCV, ECCV), reading groups, and a concrete example showing they apply research, not just consume it.
Look for strategies around early alignment workshops, documenting metric definitions, building flexible evaluation frameworks, and proactive communication.
Assess ability to advocate for their position with data, listen to alternative viewpoints, run experiments to resolve disagreements, and commit to team decisions.