AI Robotics AI Engineer
An AI Robotics AI Engineer designs and implements the intelligence layer for robotic systems, specializing in integrating cutting-…
Skill Guide
The application of generative AI models, specifically Vision-Language Models (VLMs) and Large Language Models (LLMs), to parse complex, multi-modal instructions and autonomously generate hierarchical task plans, motion primitives, or executable code for robotic systems.
Scenario
You have a simulated tabletop with various colored blocks and cups. A user gives a natural language command: 'Put all the red blocks into the blue cup.' The robot must use its camera feed and the command to plan and execute the sorting.
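The planning step for this scenario can be sketched as follows, assuming perception has already produced labeled detections. The `Detection` type, the `plan_sort` function, and the `pick`/`place_in` step names are hypothetical placeholders and do not come from any specific library. In a fuller system a VLM would supply the detections and an LLM would parse the command into the color and category arguments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One object reported by a (hypothetical) perception module."""
    name: str        # unique id, e.g. "block_1"
    category: str    # "block" or "cup"
    color: str
    position: tuple  # (x, y) in table coordinates, metres (assumed convention)


def plan_sort(detections, src_color, src_category, dst_color, dst_category):
    """Return (skill, object, target) steps, or raise if the goal cannot be grounded."""
    targets = [d for d in detections
               if d.category == dst_category and d.color == dst_color]
    if len(targets) != 1:
        # Zero or several candidate containers: the command is ambiguous.
        raise ValueError(
            f"expected exactly one {dst_color} {dst_category}, found {len(targets)}")
    dst = targets[0]
    plan = []
    for d in detections:
        if d.category == src_category and d.color == src_color:
            plan.append(("pick", d.name, None))
            plan.append(("place_in", d.name, dst.name))
    return plan


# Example scene for 'Put all the red blocks into the blue cup.'
scene = [
    Detection("block_1", "block", "red", (0.10, 0.20)),
    Detection("block_2", "block", "green", (0.15, 0.05)),
    Detection("block_3", "block", "red", (0.30, 0.25)),
    Detection("cup_1", "cup", "blue", (0.50, 0.10)),
]
plan = plan_sort(scene, "red", "block", "blue", "cup")
for step in plan:
    print(step)
```

Raising an error when the target cup cannot be uniquely identified is one possible design choice; it keeps an ungrounded command from reaching the motion layer.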
Scenario
A mobile manipulator in a simulated kitchen must 'make a cup of coffee.' The task requires sequencing subtasks: find mug, pick mug, navigate to coffee machine, place mug, press brew button, wait, pick full mug, deliver to table. The environment is dynamic; if an object is missing or a step fails, the system must replan.
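The sequencing-and-replanning loop in this scenario can be sketched as below. This is a minimal illustration under stated assumptions: the subtask names mirror the list above, while `run_with_replanning`, `make_flaky_executor`, and `retry_from_search` are hypothetical helpers. A real system would call robot skills in `execute` and might query an LLM inside `replan`.

```python
SUBTASKS = ["find_mug", "pick_mug", "navigate_to_machine", "place_mug",
            "press_brew", "wait", "pick_full_mug", "deliver_to_table"]


def run_with_replanning(subtasks, execute, replan, max_replans=3):
    """Run subtasks in order; on failure, ask `replan` for a new remaining sequence."""
    queue = list(subtasks)
    log = []
    replans = 0
    while queue:
        step = queue.pop(0)
        ok = execute(step)
        log.append((step, ok))
        if ok:
            continue
        if replans >= max_replans:
            return False, log          # give up after too many replans
        replans += 1
        queue = replan(step, queue)    # replace the rest of the plan
    return True, log


def make_flaky_executor(fail_once):
    """Simulated executor: each step named in `fail_once` fails on its first attempt."""
    remaining = set(fail_once)

    def execute(step):
        if step in remaining:
            remaining.discard(step)
            return False
        return True

    return execute


def retry_from_search(failed_step, remaining):
    """Assumed recovery policy: a failed grasp triggers a fresh search for the mug."""
    if failed_step == "pick_mug":
        return ["find_mug", "pick_mug"] + remaining
    return [failed_step] + remaining


success, log = run_with_replanning(
    SUBTASKS, make_flaky_executor({"pick_mug"}), retry_from_search)
for step, ok in log:
    print(step, "ok" if ok else "FAILED")
```

Capping the number of replans is an assumption of this sketch; it prevents the robot from looping forever when an object is truly missing.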
Scenario
Develop and deploy a monolithic vision-language-action (VLA) model (such as RT-2 or a custom transformer) that takes raw RGB images and a language instruction and directly outputs low-level robot actions (joint velocities or end-effector poses) for a complex real-world task such as 'fold the laundry'.
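One common way for a transformer to output low-level actions is to discretize each continuous action dimension into a fixed number of bins, so that actions become tokens the model can predict. RT-2 is reported to use 256 bins per dimension. The sketch below illustrates only that encode/decode step; the function names and the action range of -1 to 1 are assumptions made for illustration.

```python
NUM_BINS = 256  # bins per action dimension (assumed, following the RT-2 description)


def encode(value, low, high, num_bins=NUM_BINS):
    """Map a continuous value in [low, high] to an integer token in [0, num_bins - 1]."""
    clipped = min(max(value, low), high)
    frac = (clipped - low) / (high - low)
    return min(int(frac * num_bins), num_bins - 1)


def decode(token, low, high, num_bins=NUM_BINS):
    """Map a token back to the centre of its bin."""
    return low + (token + 0.5) * (high - low) / num_bins


# Round trip for one end-effector velocity component in an assumed [-1, 1] range.
token = encode(0.3, -1.0, 1.0)
recovered = decode(token, -1.0, 1.0)
print(token, recovered)
```

The round-trip error is bounded by half a bin width, which is one reason the choice of action range and bin count matters for fine manipulation.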
ROS 2 for robot middleware and communication. PyTorch/JAX for model development. Hugging Face for accessing pre-trained VLMs/LLMs. Isaac Sim/Gazebo for high-fidelity simulation and synthetic data generation. LangChain/LlamaIndex for structuring complex LLM reasoning chains and integrating external tools or memory.
GPT-4/Llama 3 as general-purpose planners. LLaVA/CLIP for zero-shot or few-shot visual grounding. SayCan as a seminal architecture for grounding LLMs in robotic affordances, and RT-2 as a seminal vision-language-action model. PaLM-E as a multimodal embodied model. CLIPort for language-guided robotic manipulation.
NVIDIA Jetson for edge inference. RealSense/ZED cameras for RGB-D perception. Franka/UR arms for prototyping. ONNX/TensorRT for model optimization and deployment on target hardware.
Answer Strategy
The answer should demonstrate a clear understanding of hierarchical decomposition.
Strategy: Start by defining the LLM's role as a task planner that breaks 'tidy up' into object-specific subtasks (e.g., 'put books on shelf', 'take cups to kitchen'). The VLM's role is to ground these concepts in the current visual scene. Address ambiguity by having the LLM generate clarification questions or default assumptions based on common sense.
Sample answer: 'I would implement a two-stage system. An LLM planner first decomposes the high-level command into a sequence of object-centric subtasks, using chain-of-thought reasoning to handle ambiguities by defining defaults (e.g., books to shelves, dishes to sink). A VLM then performs open-vocabulary object detection and pose estimation to ground each subtask's target in the real scene. The output is a task graph passed to a motion planner. Ambiguity is resolved via a feedback loop where the system asks for clarification if confidence scores for grounding or planning are low.'
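The confidence-gated feedback loop in this answer can be sketched as follows. This is a minimal illustration, not a definitive design: `DEFAULT_DESTINATIONS`, `CONFIDENCE_THRESHOLD`, and `build_plan` are hypothetical names, the threshold value is an assumption, and the `(label, confidence)` pairs stand in for the output of an open-vocabulary detector.

```python
# Common-sense defaults the planner falls back on (assumed values).
DEFAULT_DESTINATIONS = {"book": "shelf", "cup": "kitchen", "dish": "sink"}
CONFIDENCE_THRESHOLD = 0.6  # assumed cut-off for trusting a grounding


def build_plan(groundings, defaults=DEFAULT_DESTINATIONS,
               threshold=CONFIDENCE_THRESHOLD):
    """Split grounded objects into executable subtasks and clarification questions.

    `groundings` is a list of (label, confidence) pairs from a hypothetical detector.
    """
    subtasks, questions = [], []
    for label, confidence in groundings:
        if confidence < threshold:
            # Low grounding confidence: ask instead of acting.
            questions.append(f"I am not sure this object is a {label}. Should I move it?")
        elif label not in defaults:
            # Confident detection but no default destination.
            questions.append(f"Where should the {label} go?")
        else:
            subtasks.append({"object": label, "destination": defaults[label]})
    return subtasks, questions


subtasks, questions = build_plan([("book", 0.92), ("cup", 0.41), ("toy", 0.85)])
print(subtasks)
print(questions)
```

Keeping the defaults in a plain table makes the planner's assumptions easy to inspect and override, which is one way to make the clarification behavior predictable.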
Answer Strategy
Tests knowledge of safety and system robustness. The core competency is failure analysis and defensive design.
Sample answer: 'A common failure is an LLM planning a path through an obstacle because it lacks a true physics model. I would debug this by first checking the model's input: was the environment state accurately represented in its context? Mitigation involves a multi-layer safety approach: 1) Constrain the LLM's output space by having it select from a pre-verified skill library rather than generating raw code. 2) Implement a physics-based simulator as a safety filter that validates any generated plan before execution. 3) Use a traditional motion planner with collision checking as the final executor, treating the LLM's output as a set of waypoints or subgoals. This separates creative reasoning from verified execution.'
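The first mitigation layer, restricting the LLM to a pre-verified skill library, can be sketched as a plan validator. The skill names, argument sets, and workspace bounds below are assumptions for illustration. The bounds check is only a stand-in for the simulator and collision-checking layers, which this sketch does not implement.

```python
# Allowed skills and the exact argument names each one accepts (assumed library).
SKILL_LIBRARY = {
    "move_to": {"x", "y", "z"},
    "grasp": {"object"},
    "release": set(),
}
# Assumed reachable workspace in metres.
WORKSPACE = {"x": (0.0, 0.8), "y": (-0.4, 0.4), "z": (0.02, 0.6)}


def validate_plan(plan):
    """Return a list of error strings; an empty list means the plan passed these checks."""
    errors = []
    for i, step in enumerate(plan):
        skill = step.get("skill")
        args = step.get("args", {})
        if skill not in SKILL_LIBRARY:
            errors.append(f"step {i}: unknown skill {skill!r}")
            continue
        if set(args) != SKILL_LIBRARY[skill]:
            errors.append(f"step {i}: {skill} expects args {sorted(SKILL_LIBRARY[skill])}")
            continue
        if skill == "move_to":
            for axis, (lo, hi) in WORKSPACE.items():
                if not lo <= args[axis] <= hi:
                    errors.append(f"step {i}: {axis}={args[axis]} outside workspace")
    return errors


good_plan = [
    {"skill": "move_to", "args": {"x": 0.4, "y": 0.1, "z": 0.2}},
    {"skill": "grasp", "args": {"object": "mug"}},
    {"skill": "release", "args": {}},
]
bad_plan = [
    {"skill": "run_python", "args": {"code": "..."}},            # not in the library
    {"skill": "move_to", "args": {"x": 0.4, "y": 0.1, "z": -0.1}},  # below the table
]
print(validate_plan(good_plan))
print(validate_plan(bad_plan))
```

Only plans that return an empty error list would be handed to the downstream motion planner; anything else would be rejected or sent back to the LLM for revision.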