Skip to main content

Learning & Adaptation

Traditional robots execute pre-programmed behaviors. Physical AI systems learn from experience. This chapter explores how robots acquire new skills, adapt to changing conditions, and improve their performance over time.

Why Learning Matters

Programming a robot for every possible situation is impractical. The real world presents endless variations:

  • Every object has unique properties
  • Every environment has different characteristics
  • Every task may require subtle adjustments
  • Conditions change over time

Learning enables robots to:

  • Acquire complex skills that are hard to program explicitly
  • Adapt to novel situations not anticipated by designers
  • Improve continuously through experience
  • Transfer knowledge between related tasks

Reinforcement Learning

Reinforcement Learning (RL) is a powerful paradigm where robots learn through trial and error.

The RL Framework

The core elements of reinforcement learning:

Agent: The robot (or its control policy) Environment: The world the robot operates in State: Current situation (joint positions, object locations, etc.) Action: What the robot does (motor commands) Reward: A signal indicating success or failure

The agent takes actions, observes resulting states and rewards, and updates its behavior to maximize cumulative reward.

Reward Design

Designing good reward functions is crucial and challenging:

Sparse Rewards: Only signal success/failure at task completion. Clear but provides little guidance during learning.

Dense Rewards: Provide continuous feedback throughout the task. Speeds learning but can lead to unintended behaviors if not carefully designed.

Shaped Rewards: Add intermediate rewards to guide learning toward the goal. Requires domain knowledge to design effectively.

Example: For a grasping task

  • Sparse: +1 when object is lifted, 0 otherwise
  • Dense: Reward decreases with distance from object, increases with grasp quality
  • Shaped: Bonus for approaching, touching, and closing fingers appropriately

Policy Representations

Policies map states to actions. Common representations:

Neural Networks: Deep learning models that can represent complex, nonlinear policies. Most flexible but require significant data.

Movement Primitives: Parameterized motion templates. Simpler to learn but less flexible.

Hybrid Approaches: Combine learned components with structured controllers.

Key RL Algorithms

Several algorithms have proven effective for robot learning:

PPO (Proximal Policy Optimization): Stable learning through constrained policy updates. Widely used for continuous control.

SAC (Soft Actor-Critic): Encourages exploration while learning. Efficient use of collected experience.

TD3 (Twin Delayed DDPG): Addresses overestimation issues in continuous action spaces. Robust performance.

These algorithms balance exploration (trying new things) with exploitation (using what works).

Simulation to Reality Transfer

Training robots in the real world is slow, expensive, and potentially dangerous. Simulation offers an alternative.

The Simulation Approach

  1. Build a physics simulator that models the robot and environment
  2. Train the learning algorithm in simulation (millions of episodes)
  3. Transfer the learned policy to the real robot

Simulation enables:

  • Parallel training across many instances
  • Safe exploration of dangerous behaviors
  • Rapid iteration on learning algorithms
  • Access to perfect state information for training

The Reality Gap

Simulations never perfectly match reality. Differences arise from:

Physics Modeling: Contact dynamics, friction, and deformation are approximated Sensor Modeling: Real sensors have noise, latency, and artifacts Environment Modeling: Real-world materials and geometries vary

Policies that work in simulation may fail on real robots due to these gaps.

Domain Randomization

One solution randomizes simulation parameters during training:

  • Vary friction coefficients
  • Add sensor noise
  • Change object properties
  • Randomize visual appearance

The trained policy becomes robust to variations, including the differences between simulation and reality.

Domain Adaptation

Another approach explicitly adapts policies to the real world:

  • Fine-tune policies with limited real-world data
  • Learn mappings between simulated and real observations
  • Use real-world data to improve simulator accuracy

Imitation Learning

Instead of learning from scratch through trial and error, robots can learn from demonstrations.

Learning from Demonstration

Humans show the robot what to do:

Teleoperation: A human directly controls the robot to perform the task. Records actions paired with observations.

Kinesthetic Teaching: A human physically moves the robot's limbs through the desired motion. Intuitive but requires backdrivable hardware.

Observation: The robot watches a human perform the task. Must infer actions from visual observation.

Behavioral Cloning

The simplest imitation approach trains a policy to directly mimic demonstrated actions:

  1. Collect demonstrations (observation-action pairs)
  2. Train a policy to predict actions given observations
  3. Deploy the learned policy

Limitations:

  • The policy only sees states from demonstrations; new states may cause failures
  • Small errors compound over time, leading to divergence

Inverse Reinforcement Learning

Instead of copying actions, learn the underlying reward function:

  1. Observe expert demonstrations
  2. Infer what reward function the expert is optimizing
  3. Use standard RL to optimize that reward

This approach generalizes better to new situations but is more complex to implement.

Hybrid Approaches

Modern methods often combine imitation and reinforcement learning:

  • Use demonstrations to initialize or guide RL
  • Learn from both expert examples and trial-and-error
  • Use RL to improve beyond demonstrated performance

Skill Learning and Composition

Complex tasks require combining multiple skills.

Skill Libraries

Robots can learn libraries of reusable skills:

  • Pick: Grasp an object at a specified location
  • Place: Put a held object at a target position
  • Navigate: Move to a destination
  • Open: Manipulate doors, drawers, containers

Each skill is parameterized (where to pick, where to place) and can be invoked as needed.

Hierarchical Learning

Complex behaviors emerge from hierarchical organization:

High Level: Task planning—what skills to use and in what order Mid Level: Skill execution—parameterized motor programs Low Level: Motor control—joint torques and positions

Learning can occur at each level, with higher levels providing goals to lower levels.

Task and Motion Planning

Combining discrete task planning with continuous motion planning:

  1. Task planner sequences skills to achieve a goal
  2. Motion planner generates feasible trajectories for each skill
  3. Execution monitors progress and triggers replanning if needed

Learning improves both planning and execution over time.

Continual Learning

Robots must learn throughout their operational lifetime.

Avoiding Forgetting

A key challenge is catastrophic forgetting—learning new tasks can disrupt previously acquired skills. Solutions include:

Regularization: Constrain learning to preserve important parameters Replay: Periodically practice old tasks while learning new ones Modular Architectures: Separate network modules for different skills

Adaptation to Change

Environments and tasks evolve over time:

  • Objects are rearranged
  • New objects are introduced
  • User preferences change
  • Robot hardware ages

Continual learning systems detect changes and update their behaviors appropriately.

Multi-Task Learning

Learning multiple tasks simultaneously can improve efficiency:

  • Shared representations reduce total learning time
  • Skills transfer between related tasks
  • Negative transfer must be managed when tasks conflict

Practical Considerations

Data Efficiency

Real-world robot data is expensive. Techniques to maximize learning from limited data:

  • Leverage simulation for pre-training
  • Use data augmentation
  • Apply transfer learning from related tasks
  • Design sample-efficient algorithms

Safety During Learning

Exploration can be dangerous. Safety constraints include:

  • Physical limits on velocities and forces
  • Conservative initial policies
  • Human oversight during learning
  • Gradual expansion of explored regions

Evaluation and Validation

Assessing learned behaviors requires:

  • Testing on held-out scenarios
  • Measuring robustness to perturbations
  • Validating safety properties
  • Long-term reliability monitoring

Next: In Chapter 6, we'll explore real-world applications and the future of Physical AI—where these technologies are being deployed today and where they're heading.