Learning Roadmap
How to Become an AI Spatial Computing Engineer
A step-by-step, phase-based learning path from beginner to job-ready AI Spatial Computing Engineer. Estimated completion: 11 months across 6 phases.
3D Mathematics & Spatial Foundations
6 weeks
Goals
- Master linear algebra, quaternions, transformation matrices, and projective geometry
- Understand coordinate systems, spatial anchoring, and camera models
- Build comfort with 3D data structures: point clouds, meshes, voxel grids
Resources
- 3Blue1Brown 'Essence of Linear Algebra' series
- Steven LaValle 'Virtual Reality' (free online chapters on 3D math)
- Scratchapixel.com - ray tracing and geometry tutorials
- Hands-on: build a basic 3D scene in Unity with scripted transforms
Milestone: You can manipulate 3D objects programmatically, understand camera projection, and reason about spatial coordinate frames confidently.
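As a small illustration of the goals in this phase, here is a minimal sketch of rotating a point with a unit quaternion and projecting a camera-frame point with a pinhole model. The function names, the (w, x, y, z) quaternion layout, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions chosen for this example; engines such as Unity use their own conventions.

```python
import math

def quat_from_axis_angle(axis, angle_rad):
    """Unit quaternion (w, x, y, z) for a rotation of angle_rad about axis."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle_rad / 2.0) / norm
    return (math.cos(angle_rad / 2.0), ax * s, ay * s, az * s)

def quat_mul(a, b):
    """Hamilton product a * b of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def rotate(q, v):
    """Rotate 3D vector v by unit quaternion q via q * v * conjugate(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (rx, ry, rz)

def project(point_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point (z pointing forward) to pixels."""
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * X / Z + cx, fy * Y / Z + cy)
```

For example, rotating (1, 0, 0) by 90 degrees about the z axis gives approximately (0, 1, 0), and a point at (0.5, -0.25, 2.0) with `fx = fy = 500`, `cx = 320`, `cy = 240` lands at pixel (445.0, 177.5).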
Computer Vision & Scene Understanding
8 weeks
Goals
- Implement depth estimation, semantic segmentation, and object detection pipelines
- Understand SLAM fundamentals and visual-inertial odometry
- Learn to work with Hugging Face vision models and fine-tune on custom spatial data
Resources
- CS231n (Stanford) - Convolutional Neural Networks for Visual Recognition
- Hugging Face 'Vision' documentation and model hub exploration
- ORB-SLAM3 / RTAB-Map open-source SLAM implementations
- Build: a real-time depth estimation pipeline using MiDaS or Depth Anything on webcam input
Milestone: You can take raw camera input and extract meaningful spatial understanding (depth maps, detected objects, and semantic labels) in real time.
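One step that connects a depth map to spatial understanding is back-projecting pixels into 3D. The sketch below assumes the depth map is metric (in meters) and stored as a list of rows, and that the pinhole intrinsics are known; models that output relative depth would need a scale alignment step first. The function name is hypothetical.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a row-major depth map (meters) into camera-frame 3D points.

    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            # Invert the pinhole model: u = fx * X / Z + cx, v = fy * Y / Z + cy
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

In a real pipeline the same arithmetic is usually vectorized over the whole image, but the per-pixel form makes the geometry easier to check by hand.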
Neural 3D Representations & Generative Spatial AI
8 weeks
Goals
- Understand NeRF, 3D Gaussian Splatting, and neural implicit surface representations
- Build pipelines for 3D reconstruction from images/video
- Explore generative 3D models: text-to-3D, image-to-3D, scene completion
Resources
- Nerfstudio documentation and tutorials
- 3D Gaussian Splatting paper + gsplat / nerfstudio implementations
- OpenAI Point-E / Shap-E, Meta 3D Gen research
- Build: reconstruct a real room from phone-captured video using Gaussian Splatting
Milestone: You can capture, reconstruct, and intelligently manipulate 3D scenes using neural representations, and evaluate generative 3D model quality.
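A useful concept to internalize in this phase is front-to-back alpha compositing along a ray, which NeRF-style volume rendering relies on. The sketch below is a plain-Python illustration under simple assumptions (per-sample density, RGB color, and segment length are given); it is not how Nerfstudio or gsplat implement rendering, and the function name is hypothetical.

```python
import math

def composite_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: per-sample volume density (sigma)
    colors:    per-sample (r, g, b) tuples
    deltas:    per-sample segment lengths along the ray
    Returns the accumulated color and the accumulated opacity.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution of this sample
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(rgb), 1.0 - transmittance
```

An empty sample (density 0) contributes nothing, while a very dense sample absorbs almost all remaining transmittance, so samples behind it barely affect the pixel.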
Spatial AI Agents & Multi-Modal Integration
8 weeks
Goals
- Architect spatial RAG systems that ground LLMs in physical environment data
- Integrate vision-language models (GPT-4o, LLaVA) for scene-aware conversations
- Build AI agents that can reason about and interact with spatial environments
Resources
- LangChain / LangGraph documentation for multi-tool agent design
- OpenAI Vision API and function-calling best practices
- Research papers on embodied AI and visual grounding
- Build: an AR agent that can answer questions about objects in a room using VLM + spatial anchors
Milestone: You can build intelligent spatial agents that perceive, reason about, and respond to queries about 3D environments using modern AI toolchains.
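To make "spatial RAG" concrete, here is one possible sketch of the retrieval half: pick the scene objects closest to the user and turn them into text that can be placed in an LLM prompt. The `SceneObject` type, the radius filter, and the prompt wording are all assumptions for illustration; the actual LLM call is left out on purpose.

```python
import math
from dataclasses import dataclass

@dataclass
class SceneObject:
    # Hypothetical record for an anchored object in the room frame
    label: str
    position: tuple  # (x, y, z) in meters

def retrieve_nearby(objects, query_pos, radius_m, k=3):
    """Return up to k (distance, object) pairs within radius_m, nearest first."""
    scored = [(math.dist(o.position, query_pos), o) for o in objects]
    in_range = [pair for pair in scored if pair[0] <= radius_m]
    in_range.sort(key=lambda pair: pair[0])
    return in_range[:k]

def build_context(hits):
    """Format retrieved objects as grounding text for a prompt."""
    return "\n".join(
        f"- {obj.label} at {obj.position}, {dist:.2f} m from the user"
        for dist, obj in hits
    )

def build_prompt(question, hits):
    """Combine the grounding text with the user's question."""
    return (
        "Answer using only these observed objects:\n"
        f"{build_context(hits)}\n\nQuestion: {question}"
    )
```

The design choice worth noticing is that retrieval is purely geometric here; a fuller system might combine distance with text or image embedding similarity.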
Production Spatial Applications & Edge Deployment
8 weeks
Goals
- Deploy AI models to AR/VR headsets with optimized inference (Core ML, TensorRT, ONNX)
- Design cloud-edge architectures for spatial AI with latency-aware pipelines
- Ship a polished spatial AI demo on a real headset (Quest, Vision Pro, or HoloLens)
Resources
- Apple visionOS developer documentation and WWDC sessions
- Meta Quest developer hub and Presence Platform guides
- NVIDIA TensorRT and ONNX Runtime optimization tutorials
- Build: ship a full spatial AI application to a headset with under 20 ms inference latency
Milestone: You can deliver production-quality spatial AI experiences on commercial hardware, with optimized models, robust spatial anchoring, and polished AI-driven interactions.
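A latency target is only meaningful if it is measured consistently. The sketch below shows one way to time an inference callable and compare tail latency against a budget. The helper names are hypothetical, `run_inference` stands in for whatever runtime call you use (Core ML, TensorRT, ONNX Runtime), and on-device profilers will give more reliable numbers than host-side timing.

```python
import math
import statistics
import time

def measure_latency_ms(run_inference, warmup=5, iterations=50):
    """Time a zero-argument callable and report p50, p95, and max in milliseconds."""
    for _ in range(warmup):
        run_inference()  # warm caches and lazy initialization before timing
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile
    p95_index = min(len(samples) - 1, math.ceil(0.95 * len(samples)) - 1)
    return {
        "p50": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }

def within_budget(stats, budget_ms=20.0):
    """Check the tail (p95), not the median, against the latency budget."""
    return stats["p95"] <= budget_ms
```

Checking p95 rather than the average is a deliberate choice: occasional slow frames are what users notice in a headset.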
Advanced Specialization & Portfolio Polish
6 weeks
Goals
- Deep-dive into one specialization: generative 3D, embodied AI, surgical AR, or industrial spatial computing
- Contribute to open-source spatial AI projects or publish technical writing
- Build a portfolio of 3-5 polished spatial AI projects with documentation
Resources
- Conference talks from AWE, CVPR 3D workshops, SIGGRAPH Emerging Technologies
- Open-source repos: Nerfstudio, gsplat, LangChain spatial RAG templates
- Technical blog writing on Medium / personal site for visibility
- Build: a capstone project combining generative 3D, spatial RAG, and headset deployment
Milestone: You have a compelling portfolio, specialization depth, and the credibility to interview for AI Spatial Computing Engineer roles at top-tier companies.
Practice Projects
Apply your skills with hands-on projects. Ordered by difficulty.
AI-Powered Room Scanner & Semantic Mapper
Beginner: Build a mobile app that scans a room using the phone camera, generates a 3D point cloud, and overlays semantic labels (furniture, walls, objects) using a pre-trained segmentation model. The output is a spatially indexed scene graph stored in a local database.
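One possible starting point for the scene graph storage is SQLite with one table for labeled objects and one for relations between them. The schema, table names, and helper functions below are assumptions for illustration. The bounding-box query uses plain columns rather than a true spatial index, which is usually acceptable for a single room.

```python
import sqlite3

def create_scene_db(path=":memory:"):
    """Create (or open) a scene graph database with objects and relations."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS objects (
            id INTEGER PRIMARY KEY,
            label TEXT NOT NULL,
            x REAL, y REAL, z REAL
        );
        CREATE TABLE IF NOT EXISTS relations (
            subject_id INTEGER REFERENCES objects(id),
            predicate TEXT NOT NULL,
            object_id INTEGER REFERENCES objects(id)
        );
        """
    )
    return conn

def add_object(conn, label, position):
    """Insert a labeled object at (x, y, z) and return its row id."""
    cur = conn.execute(
        "INSERT INTO objects (label, x, y, z) VALUES (?, ?, ?, ?)",
        (label, *position),
    )
    conn.commit()
    return cur.lastrowid

def add_relation(conn, subject_id, predicate, object_id):
    """Record an edge such as (sofa, 'near', wall)."""
    conn.execute(
        "INSERT INTO relations VALUES (?, ?, ?)",
        (subject_id, predicate, object_id),
    )
    conn.commit()

def objects_in_box(conn, lo, hi):
    """Return labels of objects whose position lies inside the box [lo, hi]."""
    rows = conn.execute(
        "SELECT label FROM objects "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? AND z BETWEEN ? AND ? "
        "ORDER BY id",
        (lo[0], hi[0], lo[1], hi[1], lo[2], hi[2]),
    )
    return [row[0] for row in rows]
```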
NeRF Room Reconstruction from Phone Video
Intermediate: Capture a short video walkthrough of a room with your phone, process it through COLMAP for camera poses, and train a NeRF or Gaussian Splatting model to create a photorealistic 3D reconstruction viewable in a web-based 3D viewer.
Spatial RAG Agent for Smart Home Control
Intermediate: Build a LangChain-based agent that can answer natural language questions about a smart home environment by combining spatial scene data (from a 3D map), IoT sensor data, and an LLM. Users can ask 'What's the temperature near the window?' and get grounded answers.
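The grounding step behind a question like "What's the temperature near the window?" can be sketched without any LLM: look up the named landmark in the 3D map, then find the closest sensor of the right kind. The data layout (a landmark dictionary and a list of sensor dictionaries) and the function name are assumptions for this example. In a LangChain agent, a function like this could be exposed as a tool.

```python
import math

def nearest_reading(landmarks, sensors, landmark_name, kind):
    """Return the reading of the sensor of the given kind closest to a landmark.

    landmarks: dict mapping name -> (x, y, z) in meters
    sensors:   list of dicts with keys id, kind, position, value, unit
    Returns None when the landmark or a matching sensor is missing.
    """
    anchor = landmarks.get(landmark_name)
    if anchor is None:
        return None
    candidates = [s for s in sensors if s["kind"] == kind]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: math.dist(s["position"], anchor))
    return {
        "sensor_id": best["id"],
        "value": best["value"],
        "unit": best["unit"],
        "distance_m": math.dist(best["position"], anchor),
    }
```

Returning the sensor id and distance alongside the value lets the agent cite where the answer came from instead of stating a bare number.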
Real-Time Hand Gesture AI for AR Menu Navigation
Intermediate: Implement a hand-tracking pipeline using MediaPipe or custom models that recognizes custom gestures and maps them to spatial UI interactions (selecting, scrolling, and dismissing floating AR menus) with sub-100 ms response time.
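As a first gesture, a pinch can be detected from landmark positions alone. The sketch below assumes the 21-point hand landmark layout used by MediaPipe Hands (wrist 0, thumb tip 4, index fingertip 8, middle finger MCP 9) and takes landmarks as a list of (x, y, z) tuples. The thresholds and the class name are hypothetical and would need tuning on real data.

```python
import math

# Landmark indices in the 21-point MediaPipe Hands layout
WRIST, THUMB_TIP, INDEX_TIP, MIDDLE_MCP = 0, 4, 8, 9

class PinchDetector:
    """Detect pinch press and release with hysteresis to avoid flicker."""

    def __init__(self, press=0.25, release=0.40):
        # Thresholds are ratios of fingertip distance to hand size (assumed values)
        self.press = press
        self.release = release
        self.pinching = False

    def update(self, landmarks):
        """Return 'select', 'release', or None for one frame of landmarks."""
        hand_size = math.dist(landmarks[WRIST], landmarks[MIDDLE_MCP])
        if hand_size == 0:
            return None
        # Normalizing by hand size keeps the gesture independent of camera distance
        ratio = math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) / hand_size
        if not self.pinching and ratio < self.press:
            self.pinching = True
            return "select"
        if self.pinching and ratio > self.release:
            self.pinching = False
            return "release"
        return None
```

Using two thresholds instead of one means a hand hovering near the boundary does not trigger repeated select events.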
Generative 3D Asset Pipeline for Spatial Design
Advanced: Build a pipeline where users describe a 3D object in natural language (e.g., 'a mid-century modern chair'), generate it using a text-to-3D model, evaluate quality with automated metrics, and place it in a scanned room environment with physics-aware placement.
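A very reduced form of physics-aware placement is to rest the generated object on the floor at the first candidate spot where its bounding box does not intersect existing furniture. The sketch below assumes a y-up room frame and axis-aligned bounding boxes. The function names and the candidate list are hypothetical, and a real system would likely use the physics engine of the target platform instead.

```python
def aabb_overlap(a_min, a_max, b_min, b_max):
    """True if two axis-aligned boxes intersect with positive volume."""
    return all(a_min[i] < b_max[i] and b_min[i] < a_max[i] for i in range(3))

def place_on_floor(size, candidates_xz, obstacles, floor_y=0.0):
    """Find the first collision-free floor placement for a box.

    size:          (width, height, depth) of the object in meters
    candidates_xz: iterable of (x, z) floor positions for the box center
    obstacles:     list of (min_corner, max_corner) boxes already in the room
    Returns (min_corner, max_corner) of the placed box, or None if none fits.
    """
    w, h, d = size
    for x, z in candidates_xz:
        lo = (x - w / 2, floor_y, z - d / 2)
        hi = (x + w / 2, floor_y + h, z + d / 2)
        if not any(aabb_overlap(lo, hi, o_lo, o_hi) for o_lo, o_hi in obstacles):
            return lo, hi
    return None
```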
Multi-User Shared AR Experience with AI Moderator
Advanced: Create a shared AR experience for 2-4 users where an AI agent acts as a spatial moderator: managing shared virtual objects, detecting conflicts in user actions, providing contextual guidance, and maintaining a consistent shared spatial state across devices.
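Conflict detection for shared objects can start with optimistic versioning: each update names the version it was based on, and stale updates are rejected so the moderator can step in. This is one possible approach among several (others include ownership locks or CRDTs), and the class and method names below are hypothetical.

```python
class SharedObjectState:
    """Authoritative per-object state with version-based conflict detection."""

    def __init__(self):
        self._objects = {}  # object_id -> (version, pose, last_editor)

    def apply(self, object_id, base_version, pose, user):
        """Try to apply an update that was computed against base_version.

        Returns (accepted, current_version). A rejected update means another
        user changed the object first, which is the conflict to moderate.
        """
        current_version, _, _ = self._objects.get(object_id, (0, None, None))
        if base_version != current_version:
            return False, current_version
        self._objects[object_id] = (current_version + 1, pose, user)
        return True, current_version + 1

    def pose(self, object_id):
        """Return the latest accepted pose, or None for unknown objects."""
        entry = self._objects.get(object_id)
        return entry[1] if entry else None
```

When an update is rejected, the client can refetch the current state and retry, or the AI moderator can explain to the user who moved the object.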
Industrial AR Maintenance Assistant with Vision AI
Advanced: Build an AR application for industrial maintenance that uses object recognition to identify machine components, overlays AI-generated step-by-step repair instructions, tracks task completion using hand tracking, and logs spatial data for quality assurance.