Learning & Adaptation

Traditional robots execute pre-programmed behaviors. Physical AI systems learn from experience. This chapter explores how robots acquire new skills, adapt to changing conditions, and improve their performance over time.

Why Learning Matters

Programming a robot for every possible situation is impractical. The real world presents endless variations:

Every object has unique properties
Every environment has different characteristics
Every task may require subtle adjustments
Conditions change over time

Learning enables robots to:

Acquire complex skills that are hard to program explicitly
Adapt to novel situations not anticipated by designers
Improve continuously through experience
Transfer knowledge between related tasks

Reinforcement Learning

Reinforcement Learning (RL) is a paradigm in which robots learn through trial and error, guided by a reward signal rather than by explicit instructions.

The RL Framework

The core elements of reinforcement learning:

Agent: The robot (or its control policy)
Environment: The world the robot operates in
State: Current situation (joint positions, object locations, etc.)
Action: What the robot does (motor commands)
Reward: A signal indicating success or failure

The agent takes actions, observes resulting states and rewards, and updates its behavior to maximize cumulative reward.
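
The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not a real robot interface: `ReachEnv`, `run_episode`, and `always_right` are hypothetical names, the environment is a toy one-dimensional track, and the step that updates the policy from collected rewards is deliberately left out.

```python
class ReachEnv:
    """Toy 1-D environment (hypothetical): move a gripper along a track to a target cell."""

    def __init__(self, size=5, target=4):
        self.size = size
        self.target = target
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position = max(0, min(self.size - 1, self.position + action))
        done = self.position == self.target
        reward = 1.0 if done else 0.0  # success signal only at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=20):
    """Run one episode: act, observe state and reward, accumulate reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    """A fixed policy used only for illustration; a learner would adjust this."""
    return +1


episode_return = run_episode(ReachEnv(), always_right)
```

A learning algorithm would wrap this loop, using the observed states, actions, and rewards to change `policy` between episodes.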

Reward Design

Designing good reward functions is crucial and challenging:

Sparse Rewards: Signal success or failure only at task completion. They are unambiguous to specify but provide little guidance during learning.

Dense Rewards: Provide feedback at every step of the task. They speed up learning but, if not carefully designed, can lead to unintended behaviors in which the robot maximizes the reward without accomplishing the task.

Shaped Rewards: Add intermediate rewards to guide learning toward the goal. They require domain knowledge to design effectively.

Example: For a grasping task

Sparse: +1 when object is lifted, 0 otherwise
Dense: Reward decreases with distance from object, increases with grasp quality
Shaped: Bonus for approaching, touching, and closing fingers appropriately
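
The three grasping rewards above might look like the following in code. This is a sketch under stated assumptions: the function names, the argument layout, and the bonus weights (0.1, 0.2, 0.3) are illustrative choices, not values from any particular system.

```python
import math


def sparse_reward(lifted):
    """+1 when the object is lifted, 0 otherwise."""
    return 1.0 if lifted else 0.0


def dense_reward(gripper_pos, object_pos, grasp_quality):
    """Decreases with distance to the object, increases with grasp quality.

    grasp_quality is assumed to be a score in [0, 1] from some grasp metric.
    """
    distance = math.dist(gripper_pos, object_pos)
    return -distance + grasp_quality


def shaped_reward(lifted, distance, prev_distance, touching, fingers_closed):
    """Sparse goal reward plus bonuses for intermediate progress (weights assumed)."""
    reward = sparse_reward(lifted)
    if distance < prev_distance:
        reward += 0.1  # bonus for approaching the object
    if touching:
        reward += 0.2  # bonus for making contact
    if touching and fingers_closed:
        reward += 0.3  # bonus for closing the fingers while in contact
    return reward
```

For instance, `dense_reward((0, 0, 0), (3, 4, 0), 0.5)` combines a distance of 5 with a grasp quality of 0.5. The relative size of such terms is exactly the kind of design decision that can produce unintended behavior if chosen carelessly.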

Policy Representations

Policies map states to actions. Common representations:

Neural Networks: Deep learning models that can represent complex, nonlinear policies. Most flexible but require significant data.

Movement Primitives: Parameterized motion templates. Simpler to learn but less flexible.

Hybrid Approaches: Combine learned components with structured controllers.

Key RL Algorithms

Several algorithms have proven effective for robot learning:

PPO (Proximal Policy Optimization): Stabilizes learning by limiting how far each update can move the policy. Widely used for continuous control.

SAC (Soft Actor-Critic): Maximizes reward together with policy entropy, which encourages exploration. As an off-policy method, it reuses collected experience efficiently.

TD3 (Twin Delayed DDPG): Reduces the value overestimation that affects DDPG by using two critics and delaying policy updates. Robust performance on continuous control tasks.

These algorithms balance exploration (trying new things) with exploitation (using what works).
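
To make the idea of a constrained policy update concrete, here is a sketch of the clipped surrogate objective commonly used in PPO, written for a single sample. The function name and the default `clip_eps=0.2` are assumptions for illustration; real implementations average this quantity over batches and add value-function and entropy terms.

```python
def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one (state, action) sample.

    ratio:     new_policy_prob / old_policy_prob for the sampled action
    advantage: estimate of how much better the action was than average
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum removes any incentive to push the ratio
    # outside the interval [1 - clip_eps, 1 + clip_eps].
    return min(ratio * advantage, clipped_ratio * advantage)
```

With a positive advantage, raising the ratio above `1 + clip_eps` no longer increases the objective, so a single update cannot move the policy arbitrarily far from the one that collected the data.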

Simulation to Reality Transfer

Training robots in the real world is slow, expensive, and potentially dangerous. Simulation offers an alternative.

The Simulation Approach

Build a physics simulator that models the robot and environment
Train the learning algorithm in simulation (millions of episodes)
Transfer the learned policy to the real robot

Simulation enables:

Parallel training across many instances
Safe exploration of dangerous behaviors
Rapid iteration on learning algorithms
Access to perfect state information for training

The Reality Gap

Simulations never perfectly match reality. Differences arise from:

Physics Modeling: Contact dynamics, friction, and deformation are approximated
Sensor Modeling: Real sensors have noise, latency, and artifacts
Environment Modeling: Real-world materials and geometries vary

Policies that work in simulation may fail on real robots due to these gaps.

Domain Randomization

One solution randomizes simulation parameters during training:

Vary friction coefficients
Add sensor noise
Change object properties
Randomize visual appearance

A policy trained across these variations tends to become robust to them. Ideally, the real world then looks like just one more variation within the range the policy has already seen.
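
A minimal sketch of the randomization step listed above: before each training episode, draw a fresh set of simulator parameters. The parameter names and ranges are assumptions chosen for illustration; in practice they depend on the simulator and on how uncertain each quantity is.

```python
import random


def sample_sim_params(rng):
    """Draw one randomized simulator configuration (ranges are assumed)."""
    return {
        "friction": rng.uniform(0.5, 1.2),           # vary friction coefficients
        "sensor_noise_std": rng.uniform(0.0, 0.02),  # add sensor noise
        "object_mass_kg": rng.uniform(0.1, 1.0),     # change object properties
        "light_intensity": rng.uniform(0.3, 1.0),    # randomize visual appearance
    }


def noisy_reading(true_value, noise_std, rng):
    """Corrupt a simulated sensor reading with zero-mean Gaussian noise."""
    return true_value + rng.gauss(0.0, noise_std)


rng = random.Random(0)  # seeded so the sketch is reproducible
episode_params = [sample_sim_params(rng) for _ in range(3)]
```

Each entry of `episode_params` would be applied to the simulator before an episode starts, so the policy never trains on exactly the same physics twice.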

Domain Adaptation

Another approach explicitly adapts policies to the real world:

Fine-tune policies with limited real-world data
Learn mappings between simulated and real observations
Use real-world data to improve simulator accuracy
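
As one illustration of using real-world data to improve simulator accuracy, the sketch below fits a simulator's friction coefficient to measured sliding distances. Everything here is an assumption made for the example: the simple sliding model d = v² / (2μg), the candidate grid, and the "measurements", which are generated from a hidden friction value as a stand-in for real data.

```python
G = 9.81  # gravitational acceleration in m/s^2


def simulated_slide_distance(initial_speed, friction):
    """Distance a block slides before stopping under Coulomb friction."""
    return initial_speed ** 2 / (2.0 * friction * G)


def fit_friction(measurements, candidates):
    """Pick the candidate friction that best reproduces the measurements.

    measurements: list of (initial_speed, observed_distance) pairs
    """
    def squared_error(friction):
        return sum(
            (simulated_slide_distance(speed, friction) - distance) ** 2
            for speed, distance in measurements
        )
    return min(candidates, key=squared_error)


# Stand-in for real-robot measurements, generated from a hidden true value.
hidden_true_friction = 0.4
measurements = [
    (speed, simulated_slide_distance(speed, hidden_true_friction))
    for speed in (1.0, 2.0, 3.0)
]
candidates = [round(0.05 * k, 2) for k in range(2, 21)]  # 0.10 to 1.00
best_friction = fit_friction(measurements, candidates)
```

The fitted value would then replace the simulator's default, narrowing one source of the reality gap before further training.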

Imitation Learning

Instead of learning from scratch through trial and error, robots can learn from demonstrations.

Learning from Demonstration

Humans show the robot what to do:

Teleoperation: A human directly controls the robot to perform the task, while the system records actions paired with observations.

Kinesthetic Teaching: A human physically moves the robot's limbs through the desired motion. Intuitive but requires backdrivable hardware.

Observation: The robot watches a human perform the task and must infer the corresponding actions from what it sees.

Behavioral Cloning

The simplest imitation approach trains a policy to directly mimic demonstrated actions:

Collect demonstrations (observation-action pairs)
Train a policy to predict actions given observations
Deploy the learned policy

Limitations:

The policy is trained only on states visited in the demonstrations; states outside that distribution may cause failures
Small errors compound over time, pushing the robot further from the demonstrated trajectories
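
The collect, train, and deploy steps of behavioral cloning can be sketched with the simplest possible policy class: a one-dimensional linear policy fitted by least squares. The function name, the linear form, and the three demonstration pairs are assumptions for illustration; real systems typically fit neural networks to high-dimensional observations.

```python
def fit_linear_policy(demos):
    """Fit action = w * observation + b to demonstrations by least squares.

    demos: list of (observation, action) pairs with at least two
    distinct observations.
    """
    n = len(demos)
    mean_obs = sum(obs for obs, _ in demos) / n
    mean_act = sum(act for _, act in demos) / n
    covariance = sum((obs - mean_obs) * (act - mean_act) for obs, act in demos)
    variance = sum((obs - mean_obs) ** 2 for obs, _ in demos)
    w = covariance / variance
    b = mean_act - w * mean_obs
    return lambda observation: w * observation + b


# Hypothetical demonstrations from an expert following action = 1 - 0.5 * obs.
demos = [(0.0, 1.0), (1.0, 0.5), (2.0, 0.0)]
policy = fit_linear_policy(demos)
predicted_action = policy(1.5)
```

The fitted policy reproduces the expert between the demonstrated observations, but nothing constrains its output far outside them, which is the first limitation listed above.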

Inverse Reinforcement Learning

Instead of copying actions, learn the underlying reward function:

Observe expert demonstrations
Infer what reward function the expert is optimizing
Use standard RL to optimize that reward

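This approach can generalize better to new situations than copying actions, because the inferred reward still applies in states the demonstrations never visited, but it is more complex to implement.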

Hybrid Approaches

Modern methods often combine imitation and reinforcement learning:

Use demonstrations to initialize or guide RL
Learn from both expert examples and trial-and-error
Use RL to improve beyond demonstrated performance

Skill Learning and Composition

Complex tasks require combining multiple skills.

Skill Libraries

Robots can learn libraries of reusable skills:

Pick: Grasp an object at a specified location
Place: Put a held object at a target position
Navigate: Move to a destination
Open: Manipulate doors, drawers, containers

Each skill is parameterized (where to pick, where to place) and can be invoked as needed.
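
A skill library of this kind can be sketched as a registry of parameterized functions. The `World` class, the skill signatures, and the `invoke` helper are hypothetical and operate on a symbolic world state; on a real robot each skill would wrap a learned or engineered controller.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class World:
    """Symbolic world state used only for this illustration."""
    robot_at: Point = (0.0, 0.0)
    holding: Optional[str] = None
    objects: Dict[str, Point] = field(default_factory=dict)


def navigate(world, destination):
    """Navigate: move to a destination."""
    world.robot_at = destination


def pick(world, obj):
    """Pick: grasp an object located where the robot currently is."""
    if world.holding is not None:
        raise RuntimeError("gripper is already holding an object")
    if world.objects.get(obj) != world.robot_at:
        raise RuntimeError(f"{obj} is not at the robot's location")
    del world.objects[obj]
    world.holding = obj


def place(world, target):
    """Place: put the held object at a target position."""
    if world.holding is None:
        raise RuntimeError("nothing to place")
    if world.robot_at != target:
        raise RuntimeError("robot is not at the target position")
    world.objects[world.holding] = target
    world.holding = None


SKILLS = {"navigate": navigate, "pick": pick, "place": place}


def invoke(world, name, **params):
    """Look up a skill by name and run it with the given parameters."""
    return SKILLS[name](world, **params)


world = World(objects={"cup": (1.0, 0.0)})
for name, params in [
    ("navigate", {"destination": (1.0, 0.0)}),
    ("pick", {"obj": "cup"}),
    ("navigate", {"destination": (2.0, 1.0)}),
    ("place", {"target": (2.0, 1.0)}),
]:
    invoke(world, name, **params)
```

Because every skill shares the same calling convention, a higher-level planner only needs to choose skill names and parameters.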

Hierarchical Learning

Complex behaviors emerge from hierarchical organization:

High Level: Task planning (what skills to use and in what order)
Mid Level: Skill execution (parameterized motor programs)
Low Level: Motor control (joint torques and positions)

Learning can occur at each level, with higher levels providing goals to lower levels.

Task and Motion Planning

Combining discrete task planning with continuous motion planning:

Task planner sequences skills to achieve a goal
Motion planner generates feasible trajectories for each skill
Execution monitors progress and triggers replanning if needed
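
The sequencing and monitoring steps above can be sketched as a plan, execute, and replan loop. This is a simplified illustration: `plan_task` is a hypothetical planner for a single pick-and-place goal, `execute_skill` stands in for motion planning plus execution of one skill, and the replanning budget is an assumed parameter.

```python
def plan_task(holding):
    """Hypothetical task planner: return the skills still needed for the goal."""
    return ["place"] if holding else ["pick", "place"]


def execute_with_replanning(execute_skill, max_replans=3):
    """Execute a plan, replanning from the current state when a skill fails.

    execute_skill(skill) returns True on success and False on failure.
    Returns (goal_reached, history of (skill, succeeded) pairs).
    """
    holding = False
    history = []
    for _ in range(max_replans + 1):
        plan = plan_task(holding)
        for skill in plan:
            succeeded = execute_skill(skill)
            history.append((skill, succeeded))
            if not succeeded:
                break  # monitoring detected a failure: trigger replanning
            holding = skill == "pick"  # holding after pick, empty after place
        else:
            return True, history  # every skill in the plan succeeded
    return False, history


# Stand-in executor whose first attempt fails and later attempts succeed.
outcomes = iter([False, True, True])
reached, history = execute_with_replanning(lambda skill: next(outcomes))
```

After the failed first pick, the planner is called again from the unchanged state and the second attempt goes through.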

Learning improves both planning and execution over time.

Continual Learning

Robots must learn throughout their operational lifetime.

Avoiding Forgetting

A key challenge is catastrophic forgetting—learning new tasks can disrupt previously acquired skills. Solutions include:

Regularization: Constrain learning to preserve important parameters
Replay: Periodically practice old tasks while learning new ones
Modular Architectures: Separate network modules for different skills
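
The regularization idea can be sketched as a quadratic penalty in the style of elastic weight consolidation (EWC): parameters that mattered for earlier tasks are pulled back toward their old values more strongly. The function name and arguments are assumptions for illustration, and the per-parameter importance weights are taken as given rather than estimated.

```python
def forgetting_penalty(params, old_params, importance, strength=1.0):
    """Quadratic penalty on moving important parameters away from old values.

    params:     current parameter values while learning the new task
    old_params: parameter values after learning the previous task
    importance: per-parameter weights (larger = more important to old tasks)
    """
    return 0.5 * strength * sum(
        weight * (new - old) ** 2
        for new, old, weight in zip(params, old_params, importance)
    )


def total_loss(new_task_loss, params, old_params, importance, strength=1.0):
    """Loss on the new task plus the penalty that protects old skills."""
    return new_task_loss + forgetting_penalty(
        params, old_params, importance, strength
    )
```

Minimizing `total_loss` instead of the new-task loss alone lets unimportant parameters move freely while important ones stay close to where the old tasks need them.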

Adaptation to Change

Environments and tasks evolve over time:

Objects are rearranged
New objects are introduced
User preferences change
Robot hardware ages

Continual learning systems detect changes and update their behaviors appropriately.

Multi-Task Learning

Learning multiple tasks simultaneously can improve efficiency:

Shared representations reduce total learning time
Skills transfer between related tasks
Negative transfer must be managed when tasks conflict

Practical Considerations

Data Efficiency

Real-world robot data is expensive. Techniques to maximize learning from limited data:

Leverage simulation for pre-training
Use data augmentation
Apply transfer learning from related tasks
Design sample-efficient algorithms

Safety During Learning

Exploration can be dangerous. Safety constraints include:

Physical limits on velocities and forces
Conservative initial policies
Human oversight during learning
Gradual expansion of explored regions
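
The first constraint, physical limits on velocities and forces, is often enforced by a filter between the learning policy and the motors. The sketch below assumes a single joint, a velocity command in m/s, a force measurement in newtons, and illustrative limit values; none of these come from a specific robot.

```python
def clamp(value, low, high):
    """Restrict value to the interval [low, high]."""
    return max(low, min(high, value))


def safe_command(velocity_cmd, measured_force, max_velocity=0.25, max_force=20.0):
    """Filter a policy's velocity command before it reaches the motors.

    Stops the joint if the measured force exceeds the limit; otherwise
    clamps the commanded velocity to the allowed range.
    """
    if abs(measured_force) > max_force:
        return 0.0  # force limit exceeded: stop
    return clamp(velocity_cmd, -max_velocity, max_velocity)
```

Because the filter sits outside the learned policy, it holds regardless of what the policy explores.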

Evaluation and Validation

Assessing learned behaviors requires:

Testing on held-out scenarios
Measuring robustness to perturbations
Validating safety properties
Long-term reliability monitoring
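
Testing on held-out scenarios and measuring robustness to perturbations can share one evaluation harness. In this sketch, `toy_trial` is a hypothetical stand-in for running the learned policy (it "succeeds" when the object lies within an assumed 0.5 m reach), and the scenarios and noise level are illustrative.

```python
import random


def success_rate(run_trial, scenarios):
    """Fraction of scenarios in which the trial succeeds."""
    outcomes = [bool(run_trial(scenario)) for scenario in scenarios]
    return sum(outcomes) / len(outcomes)


def perturbed(scenario, rng, position_noise=0.02):
    """Return a copy of the scenario with the object position jittered."""
    x, y = scenario["object_xy"]
    return {
        **scenario,
        "object_xy": (
            x + rng.uniform(-position_noise, position_noise),
            y + rng.uniform(-position_noise, position_noise),
        ),
    }


def toy_trial(scenario):
    """Stand-in for executing the learned policy on one scenario."""
    x, y = scenario["object_xy"]
    return (x * x + y * y) ** 0.5 <= 0.5


# Scenarios kept out of training (hypothetical values).
held_out = [
    {"object_xy": (0.1, 0.2)},
    {"object_xy": (0.3, 0.3)},
    {"object_xy": (0.6, 0.1)},
    {"object_xy": (0.0, 0.45)},
]
rng = random.Random(0)
nominal_rate = success_rate(toy_trial, held_out)
perturbed_rate = success_rate(toy_trial, [perturbed(s, rng) for s in held_out])
```

Comparing `nominal_rate` with `perturbed_rate` gives a first indication of how sensitive the behavior is to small changes in the scene.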

Next: In Chapter 6, we'll explore real-world applications and the future of Physical AI—where these technologies are being deployed today and where they're heading.
