AI Enterprise Product Manager
The AI Enterprise Product Manager owns the strategy, roadmap, and execution of AI-powered products that solve complex business pro…
Skill Guide
The systematic application of engineering and financial analysis to minimize the cost-per-inference of production AI models while maintaining performance SLOs.
Scenario
You have a deployed PyTorch model on AWS EC2 (p3.2xlarge instances) serving 10k requests/min. The monthly bill is $15k. Management wants a 30% reduction.
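As a rough baseline for this scenario, the monthly bill can be converted into a unit cost. The sketch below assumes a 30-day month and constant traffic at the stated rate, both simplifications; the function name is illustrative.

```python
def cost_per_million_requests(monthly_cost_usd: float,
                              requests_per_min: float,
                              days_per_month: int = 30) -> float:
    """Unit cost in USD per one million requests, assuming constant traffic."""
    monthly_requests = requests_per_min * 60 * 24 * days_per_month
    return monthly_cost_usd / monthly_requests * 1_000_000


# Figures from the scenario: $15k per month at 10k requests/min.
baseline = cost_per_million_requests(15_000, 10_000)
# The same traffic served on a bill that is 30% lower.
target = cost_per_million_requests(15_000 * 0.70, 10_000)

print(f"baseline: ${baseline:.2f} per 1M requests")
print(f"target:   ${target:.2f} per 1M requests")
```

Under these assumptions the baseline is about $34.72 per million requests and the target about $24.31. Tracking the unit cost rather than the raw bill keeps the comparison fair if traffic changes during the optimization work.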
Scenario
Reduce cost for an NLP model (BERT-large) that has high GPU memory usage but moderate compute utilization. Traffic is diurnal, peaking at 1,000 QPS and dropping to a trough of 50 QPS.
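One way to size the opportunity in a diurnal scenario like this is to compare replica-hours under static peak provisioning with replica-hours when capacity follows traffic. The sketch below is illustrative only: the per-replica throughput of 100 QPS and the hourly traffic profile are invented assumptions, chosen to match the stated peak and trough.

```python
import math

PER_REPLICA_QPS = 100  # assumed sustainable throughput per GPU replica (hypothetical)

# Hypothetical 24-hour QPS profile: trough of 50 overnight, peak of 1000 midday.
hourly_qps = [50] * 6 + [200, 400, 700, 900, 1000, 1000,
                         1000, 900, 800, 700, 600, 500,
                         400, 300, 200, 100, 50, 50]


def replicas_needed(qps: float, per_replica_qps: float, min_replicas: int = 1) -> int:
    """Smallest replica count that covers the given load."""
    return max(min_replicas, math.ceil(qps / per_replica_qps))


# Static provisioning: size for the peak and run that fleet all day.
static_hours = replicas_needed(max(hourly_qps), PER_REPLICA_QPS) * len(hourly_qps)
# Ideal autoscaling: size each hour for that hour's load.
scaled_hours = sum(replicas_needed(q, PER_REPLICA_QPS) for q in hourly_qps)
savings = 1 - scaled_hours / static_hours

print(f"static: {static_hours} replica-hours, autoscaled: {scaled_hours}")
print(f"upper-bound savings: {savings:.1%}")
```

With this made-up profile the fleet drops from 240 to 105 replica-hours per day. That is an upper bound: it ignores scale-up lag, headroom for bursts, and the memory-bound nature of the model, all of which reduce the achievable savings.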
Scenario
Serve a model suite (a fast, cheap small model plus a slow, expensive large model) for a search/recommendation system where request criticality varies.
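A common pattern for this kind of suite is a cascade: critical requests go straight to the large model, and everything else tries the small model first and escalates only when its confidence is low. The sketch below is a minimal illustration of that idea; the per-request costs, the confidence threshold, and the `Request` type are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical per-request costs in USD; not taken from any real price list.
SMALL_COST = 0.0001
LARGE_COST = 0.0020
CONFIDENCE_THRESHOLD = 0.80  # assumed cut-off for trusting the small model


@dataclass
class Request:
    critical: bool
    small_model_confidence: float  # confidence the small model would report


def route(req: Request) -> tuple[str, float]:
    """Return (model that produces the final answer, cost incurred)."""
    if req.critical:
        return "large", LARGE_COST
    if req.small_model_confidence >= CONFIDENCE_THRESHOLD:
        return "small", SMALL_COST
    # Low-confidence answer: escalate, paying for both calls.
    return "large", SMALL_COST + LARGE_COST


sample_requests = [
    Request(critical=True, small_model_confidence=0.95),
    Request(critical=False, small_model_confidence=0.91),
    Request(critical=False, small_model_confidence=0.55),
    Request(critical=False, small_model_confidence=0.88),
]
cascade_cost = sum(route(r)[1] for r in sample_requests)
all_large_cost = LARGE_COST * len(sample_requests)

print(f"cascade: ${cascade_cost:.4f}, all-large: ${all_large_cost:.4f}")
```

The trade-off to watch is the escalation rate: every escalated request costs more than sending it to the large model directly, so the cascade only pays off when most non-critical traffic is answered confidently by the small model.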
Apply post-training quantization, graph optimization, and kernel fusion to reduce compute and memory footprint, directly lowering instance requirements.
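To make the memory claim concrete, the sketch below shows the arithmetic behind one simple form of post-training quantization: symmetric, per-tensor int8 quantization of weights. It is a toy illustration in plain Python with invented weight values, not the workflow of any particular toolkit, and it does not cover graph optimization or kernel fusion.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 representation."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.003, 0.88, -0.65]  # toy values, not from a real model
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

fp32_bytes = 4 * len(weights)  # 4 bytes per float32 weight
int8_bytes = 1 * len(weights)  # 1 byte per int8 weight, plus one float for the scale

print(f"quantized: {q}, scale: {scale:.4f}")
print(f"max round-trip error: {max_error:.4f}")
print(f"storage: {fp32_bytes} bytes -> {int8_bytes} bytes")
```

Weight storage shrinks roughly fourfold, at the price of a rounding error of at most half the scale per weight. Whether that error is acceptable has to be checked against the model's accuracy and latency SLOs before the smaller footprint is used to justify cheaper instances.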
Use for autoscaling inference pods based on custom metrics (QPS, queue length) and for managed deployment with built-in cost optimization features like automatic instance selection.
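Metric-based autoscaling usually comes down to a proportional rule. The sketch below mirrors the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric), with a tolerance band to avoid flapping. Whether the tooling described here applies exactly this rule is an assumption, and the replica bounds are placeholder values.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 20,
                     tolerance: float = 0.1) -> int:
    """Proportional scaling on a per-replica metric such as QPS or queue length."""
    ratio = current_metric / target_metric
    # Inside the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods each seeing 250 QPS against a target of 100 QPS per pod.
print(desired_replicas(4, current_metric=250, target_metric=100))
# 10 pods each seeing 5 QPS overnight.
print(desired_replicas(10, current_metric=5, target_metric=100))
```

The target metric value is the main cost lever: a higher per-pod target packs more load onto each replica and lowers spend, but leaves less headroom for latency SLOs during bursts.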
Instrument GPU utilization and model latency, and correlate them with cloud cost data in real time to identify optimization targets and validate savings.
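A minimal sketch of the correlation step, assuming hourly samples of GPU utilization and request counts have already been exported from the monitoring stack. The hourly price and the sample values are hypothetical, and the code runs over a batch of samples rather than a real-time stream.

```python
HOURLY_PRICE_USD = 3.00  # hypothetical instance price, not a quoted rate

# Hypothetical hourly samples: (average GPU utilization 0-1, requests served)
samples = [
    (0.15, 40_000),
    (0.35, 120_000),
    (0.80, 400_000),
    (0.90, 520_000),
]


def analyze(samples, hourly_price, low_util_threshold=0.40):
    """Join utilization with cost to surface hours where spend is mostly idle."""
    report = []
    for hour, (util, served) in enumerate(samples):
        report.append({
            "hour": hour,
            "cost_per_1k": hourly_price / served * 1000,  # unit cost for the hour
            "idle_spend": hourly_price * (1 - util),      # money paid for unused GPU
            "flagged": util < low_util_threshold,
        })
    return report


report = analyze(samples, HOURLY_PRICE_USD)
for row in report:
    marker = "  <- optimization target" if row["flagged"] else ""
    print(f"hour {row['hour']}: ${row['cost_per_1k']:.3f}/1k requests, "
          f"${row['idle_spend']:.2f} idle{marker}")
```

Running the same report before and after a change gives a direct way to validate savings: the unit cost in the flagged hours should fall while latency stays within its SLO.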