Learning & Adaptation
Traditional robots execute pre-programmed behaviors. Physical AI systems learn from experience. This chapter explores how robots acquire new skills, adapt to changing conditions, and improve their performance over time.
Why Learning Matters
Programming a robot for every possible situation is impractical. The real world presents endless variations:
- Every object has unique properties
- Every environment has different characteristics
- Every task may require subtle adjustments
- Conditions change over time
Learning enables robots to:
- Acquire complex skills that are hard to program explicitly
- Adapt to novel situations not anticipated by designers
- Improve continuously through experience
- Transfer knowledge between related tasks
Reinforcement Learning
Reinforcement Learning (RL) is a powerful paradigm where robots learn through trial and error.
The RL Framework
The core elements of reinforcement learning:
Agent: The robot (or its control policy)
Environment: The world the robot operates in
State: Current situation (joint positions, object locations, etc.)
Action: What the robot does (motor commands)
Reward: A signal indicating success or failure
The agent takes actions, observes resulting states and rewards, and updates its behavior to maximize cumulative reward.
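The loop above can be sketched in a few lines. Below is a toy tabular Q-learning example on a hypothetical five-state corridor (the environment, states, and rewards are illustrative, not from any particular robot): the agent acts, observes the next state and reward, and updates its value estimates to maximize cumulative reward.

```python
import random

# Toy agent-environment loop: tabular Q-learning on a five-state
# corridor. The agent starts in state 0 and receives reward +1 only
# when it reaches state 4 (the goal).
N_STATES, ACTIONS = 5, (-1, +1)          # actions: step left or right

def step(state, action):
    """Environment: return (next_state, reward, done)."""
    nxt = max(0, min(N_STATES - 1, state + action))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma = 0.5, 0.9
random.seed(0)

for episode in range(200):
    epsilon = max(0.1, 1.0 - episode / 100)   # explore early, exploit later
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        nxt, r, done = step(s, a)
        # Update the estimate toward reward plus discounted future value
        best_next = max(Q[(nxt, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = nxt

# After training, the greedy policy moves right, toward the goal
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES)}
```

Real robot learning replaces the lookup table with a function approximator and the toy corridor with a physics simulator or hardware, but the interaction loop is the same.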
Reward Design
Designing good reward functions is crucial and challenging:
Sparse Rewards: Signal success or failure only at task completion. Unambiguous, but they provide little guidance during learning.

Dense Rewards: Provide continuous feedback throughout the task. They speed learning but can lead to unintended behaviors if not carefully designed.

Shaped Rewards: Add intermediate rewards that guide learning toward the goal. Designing them effectively requires domain knowledge.
Example: For a grasping task
- Sparse: +1 when object is lifted, 0 otherwise
- Dense: Reward decreases with distance from object, increases with grasp quality
- Shaped: Bonus for approaching, touching, and closing fingers appropriately
Policy Representations
Policies map states to actions. Common representations:
Neural Networks: Deep learning models that can represent complex, nonlinear policies. Most flexible but require significant data.
Movement Primitives: Parameterized motion templates. Simpler to learn but less flexible.
Hybrid Approaches: Combine learned components with structured controllers.
Key RL Algorithms
Several algorithms have proven effective for robot learning:
PPO (Proximal Policy Optimization): Stable learning through constrained policy updates. Widely used for continuous control.
SAC (Soft Actor-Critic): Encourages exploration while learning. Efficient use of collected experience.
TD3 (Twin Delayed DDPG): Addresses overestimation issues in continuous action spaces. Robust performance.
These algorithms balance exploration (trying new things) with exploitation (using what works).
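As one concrete piece of the machinery: PPO's "constrained policy updates" come from a clipped surrogate objective, which limits how far a single update can move the new policy from the old one. A minimal pure-Python sketch of that objective (a per-sample illustration, not a full training implementation):

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample.

    ratio     -- pi_new(a|s) / pi_old(a|s), the policy change for this action
    advantage -- estimate of how much better the action was than average
    """
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the pessimistic minimum of the unclipped and clipped terms,
    # so large policy changes cannot claim large objective improvements
    return min(ratio * advantage, clipped * advantage)

# A large policy change (ratio 2.0) on a positive advantage is capped
# at (1 + clip_eps) * advantage rather than rewarded in full
capped = ppo_clip_objective(2.0, advantage=1.0)
```

In training, this objective is averaged over a batch and maximized by gradient ascent; the clipping is what keeps learning stable.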
Simulation to Reality Transfer
Training robots in the real world is slow, expensive, and potentially dangerous. Simulation offers an alternative.
The Simulation Approach
- Build a physics simulator that models the robot and environment
- Train the learning algorithm in simulation (millions of episodes)
- Transfer the learned policy to the real robot
Simulation enables:
- Parallel training across many instances
- Safe exploration of dangerous behaviors
- Rapid iteration on learning algorithms
- Access to perfect state information for training
The Reality Gap
Simulations never perfectly match reality. Differences arise from:
Physics Modeling: Contact dynamics, friction, and deformation are approximated
Sensor Modeling: Real sensors have noise, latency, and artifacts
Environment Modeling: Real-world materials and geometries vary
Policies that work in simulation may fail on real robots due to these gaps.
Domain Randomization
One solution randomizes simulation parameters during training:
- Vary friction coefficients
- Add sensor noise
- Change object properties
- Randomize visual appearance
The trained policy becomes robust to variations, including the differences between simulation and reality.
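In practice this means resampling simulator parameters for every training episode. A sketch of what that sampling might look like; the parameter names and ranges here are illustrative assumptions, not values from a specific simulator:

```python
import random

def sample_sim_params(rng):
    """Draw a fresh simulator configuration for one training episode,
    so the policy cannot overfit to a single physics setup."""
    return {
        "friction":     rng.uniform(0.5, 1.5),    # vary contact friction
        "object_mass":  rng.uniform(0.1, 2.0),    # vary object properties
        "sensor_noise": rng.uniform(0.0, 0.05),   # add observation noise
        "light_level":  rng.uniform(0.3, 1.0),    # randomize appearance
    }

rng = random.Random(42)
# Each episode sees a different world drawn from these distributions
episodes = [sample_sim_params(rng) for _ in range(1000)]
```

If the real robot's parameters fall inside the randomized ranges, a policy that succeeds across all sampled worlds has a good chance of succeeding on hardware too.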
Domain Adaptation
Another approach explicitly adapts policies to the real world:
- Fine-tune policies with limited real-world data
- Learn mappings between simulated and real observations
- Use real-world data to improve simulator accuracy
Imitation Learning
Instead of learning from scratch through trial and error, robots can learn from demonstrations.
Learning from Demonstration
Humans show the robot what to do:
Teleoperation: A human directly controls the robot to perform the task. Records actions paired with observations.
Kinesthetic Teaching: A human physically moves the robot's limbs through the desired motion. Intuitive but requires backdrivable hardware.
Observation: The robot watches a human perform the task. Must infer actions from visual observation.
Behavioral Cloning
The simplest imitation approach trains a policy to directly mimic demonstrated actions:
- Collect demonstrations (observation-action pairs)
- Train a policy to predict actions given observations
- Deploy the learned policy
Limitations:
- The policy only sees states from demonstrations; new states may cause failures
- Small errors compound over time, leading to divergence
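The three-step pipeline above is just supervised learning. A toy sketch with a linear policy fit by ordinary least squares (real systems use neural networks, and the demonstration data here is fabricated for illustration):

```python
# Step 1: collected demonstrations as (observation, action) pairs
demos = [(0.0, 0.1), (1.0, 0.6), (2.0, 1.1), (3.0, 1.6)]

# Step 2: fit a linear policy a = w * x + b by least squares
n = len(demos)
mean_x = sum(x for x, _ in demos) / n
mean_a = sum(a for _, a in demos) / n
w = sum((x - mean_x) * (a - mean_a) for x, a in demos) / \
    sum((x - mean_x) ** 2 for x, _ in demos)
b = mean_a - w * mean_x

# Step 3: deploy -- predict actions, including for unseen observations
def policy(obs):
    return w * obs + b
```

The limitations listed above show up directly here: the fit is only trustworthy near the demonstrated observations, and any prediction error feeds back into the next state the policy visits.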
Inverse Reinforcement Learning
Instead of copying actions, learn the underlying reward function:
- Observe expert demonstrations
- Infer what reward function the expert is optimizing
- Use standard RL to optimize that reward
This approach generalizes better to new situations but is more complex to implement.
Hybrid Approaches
Modern methods often combine imitation and reinforcement learning:
- Use demonstrations to initialize or guide RL
- Learn from both expert examples and trial-and-error
- Use RL to improve beyond demonstrated performance
Skill Learning and Composition
Complex tasks require combining multiple skills.
Skill Libraries
Robots can learn libraries of reusable skills:
- Pick: Grasp an object at a specified location
- Place: Put a held object at a target position
- Navigate: Move to a destination
- Open: Manipulate doors, drawers, containers
Each skill is parameterized (where to pick, where to place) and can be invoked as needed.
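One common way to structure such a library is a registry of parameterized skill functions that a higher-level plan invokes by name. A minimal sketch (the skill bodies just log their invocation; on a real robot they would run motion controllers):

```python
log = []

# Each skill takes its own parameters (where to pick, where to place)
def pick(location):        log.append(("pick", location))
def place(target):         log.append(("place", target))
def navigate(destination): log.append(("navigate", destination))

SKILLS = {"pick": pick, "place": place, "navigate": navigate}

def execute(plan):
    """Invoke each (skill_name, argument) step of a plan in order."""
    for name, arg in plan:
        SKILLS[name](arg)

execute([("navigate", "table"), ("pick", "mug"), ("place", "shelf")])
```

Because skills are invoked through a uniform interface, a task planner can sequence them without knowing how each one is implemented, and improved versions of a skill can be swapped in without changing the plans that use it.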
Hierarchical Learning
Complex behaviors emerge from hierarchical organization:
High Level: Task planning—what skills to use and in what order
Mid Level: Skill execution—parameterized motor programs
Low Level: Motor control—joint torques and positions
Learning can occur at each level, with higher levels providing goals to lower levels.
Task and Motion Planning
Combining discrete task planning with continuous motion planning:
- Task planner sequences skills to achieve a goal
- Motion planner generates feasible trajectories for each skill
- Execution monitors progress and triggers replanning if needed
Learning improves both planning and execution over time.
Continual Learning
Robots must learn throughout their operational lifetime.
Avoiding Forgetting
A key challenge is catastrophic forgetting—learning new tasks can disrupt previously acquired skills. Solutions include:
Regularization: Constrain learning to preserve important parameters
Replay: Periodically practice old tasks while learning new ones
Modular Architectures: Separate network modules for different skills
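The replay idea is particularly simple to sketch: while training on a new task, mix stored examples from earlier tasks into each batch so updates keep fitting the old data too. The data and mixing ratio below are illustrative; the model update itself is omitted:

```python
import random

# Stored examples from an earlier task, plus fresh data for the new one
replay_buffer = [("old_task", i) for i in range(100)]
new_data      = [("new_task", i) for i in range(100)]

def mixed_batch(rng, batch_size=8, replay_fraction=0.5):
    """Assemble a training batch that rehearses old-task examples
    alongside new-task examples, to counter catastrophic forgetting."""
    k = int(batch_size * replay_fraction)
    return rng.sample(replay_buffer, k) + rng.sample(new_data, batch_size - k)

rng = random.Random(0)
batch = mixed_batch(rng)
```

Tuning the replay fraction trades off plasticity (learning the new task quickly) against stability (retaining the old one).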
Adaptation to Change
Environments and tasks evolve over time:
- Objects are rearranged
- New objects are introduced
- User preferences change
- Robot hardware ages
Continual learning systems detect changes and update their behaviors appropriately.
Multi-Task Learning
Learning multiple tasks simultaneously can improve efficiency:
- Shared representations reduce total learning time
- Skills transfer between related tasks
- Negative transfer must be managed when tasks conflict
Practical Considerations
Data Efficiency
Real-world robot data is expensive. Techniques to maximize learning from limited data:
- Leverage simulation for pre-training
- Use data augmentation
- Apply transfer learning from related tasks
- Design sample-efficient algorithms
Safety During Learning
Exploration can be dangerous. Safety constraints include:
- Physical limits on velocities and forces
- Conservative initial policies
- Human oversight during learning
- Gradual expansion of explored regions
Evaluation and Validation
Assessing learned behaviors requires:
- Testing on held-out scenarios
- Measuring robustness to perturbations
- Validating safety properties
- Long-term reliability monitoring
Next: In Chapter 6, we'll explore real-world applications and the future of Physical AI—where these technologies are being deployed today and where they're heading.