Watch and Match: Supercharging Imitation with Regularized Optimal
Transport
- URL: http://arxiv.org/abs/2206.15469v1
- Date: Thu, 30 Jun 2022 17:58:18 GMT
- Title: Watch and Match: Supercharging Imitation with Regularized Optimal
Transport
- Authors: Siddhant Haldar and Vaibhav Mathur and Denis Yarats and Lerrel Pinto
- Abstract summary: Regularized Optimal Transport (ROT) is a new imitation learning algorithm that builds on recent advances in optimal transport based trajectory-matching.
Our experiments on 20 visual control tasks across the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World Benchmark demonstrate an average of 7.8X faster imitation to reach 90% of expert performance.
- Score: 28.3572924961148
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Imitation learning holds tremendous promise in learning policies efficiently
for complex decision-making problems. Current state-of-the-art algorithms often
use inverse reinforcement learning (IRL), where given a set of expert
demonstrations, an agent alternately infers a reward function and the
associated optimal policy. However, such IRL approaches often require
substantial online interactions for complex control problems. In this work, we
present Regularized Optimal Transport (ROT), a new imitation learning algorithm
that builds on recent advances in optimal transport based trajectory-matching.
Our key technical insight is that adaptively combining trajectory-matching
rewards with behavior cloning can significantly accelerate imitation even with
only a few demonstrations. Our experiments on 20 visual control tasks across
the DeepMind Control Suite, the OpenAI Robotics Suite, and the Meta-World
Benchmark demonstrate an average of 7.8X faster imitation to reach 90% of
expert performance compared to prior state-of-the-art methods. On real-world
robotic manipulation, with just one demonstration and an hour of online
training, ROT achieves an average success rate of 90.1% across 14 tasks.
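The two ingredients named in the abstract, an optimal-transport trajectory-matching reward and an adaptively weighted behavior-cloning term, can be illustrated with a short sketch. The following is a minimal numpy illustration rather than the authors' implementation: it assumes pre-embedded observations, cosine costs, uniform marginals for the Sinkhorn solver, and a simplified stand-in for the paper's soft Q-filtering rule (the name `adaptive_bc_weight` is illustrative).

```python
import numpy as np

def sinkhorn_plan(cost, eps=0.05, n_iters=100):
    """Entropy-regularized optimal transport plan with uniform marginals."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / eps)              # Gibbs kernel
    v = np.ones(m)
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan T = diag(u) K diag(v)

def ot_rewards(agent_emb, expert_emb):
    """Per-step pseudo-reward: the negative transport cost each agent
    timestep incurs when matched against the expert trajectory."""
    a = agent_emb / np.linalg.norm(agent_emb, axis=1, keepdims=True)
    e = expert_emb / np.linalg.norm(expert_emb, axis=1, keepdims=True)
    cost = 1.0 - a @ e.T                 # cosine distance matrix
    T = sinkhorn_plan(cost)
    return -(T * cost).sum(axis=1)       # shape: (agent_traj_len,)

def adaptive_bc_weight(q_bc_action, q_policy_action):
    """Simplified stand-in for soft Q-filtering (an assumption, not the
    paper's exact rule): keep weighting behavior cloning while the expert's
    action still scores at least as well as the policy's under the critic."""
    return float(np.mean(q_bc_action >= q_policy_action))

# Toy usage: a 30-step agent rollout vs. a 25-step expert demo, 16-dim embeddings.
rng = np.random.default_rng(0)
rewards = ot_rewards(rng.normal(size=(30, 16)), rng.normal(size=(25, 16)))
print(rewards.shape)                     # (30,)
```

In the full algorithm, the OT reward drives an actor-critic learner, while the behavior-cloning loss, scaled by the adaptive weight, anchors the policy to the demonstrations early in training and fades as the policy surpasses them.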
Related papers
- Next-Future: Sample-Efficient Policy Learning for Robotic-Arm Tasks [6.991281327290525]
We introduce a novel replay strategy, "Next-Future", which focuses on rewarding single-step transitions.
This approach significantly enhances sample efficiency and accuracy in learning multi-goal Markov decision processes.
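One plausible reading of "rewarding single-step transitions" is hindsight relabeling with the very next achieved goal, so that every stored transition is replayed as a successful one-step goal reach. The sketch below is a hypothetical illustration of that reading, not the paper's published procedure; `achieved_goal` and `reward_fn` are assumed helpers.

```python
def relabel_next_future(transition, achieved_goal, reward_fn):
    """Hypothetical 'next-future' relabeling: replay a stored transition with
    the goal its own next state actually achieved, so the single-step
    transition is rewarded as a success."""
    s, a, s_next, _old_goal = transition
    g_new = achieved_goal(s_next)      # goal reached one step later
    r_new = reward_fn(s_next, g_new)   # typically 0 (success) under sparse rewards
    return (s, a, s_next, g_new, r_new)
```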
arXiv Detail & Related papers (2025-04-15T14:45:51Z)
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning [47.785786984974855]
We present a human-in-the-loop vision-based RL system that demonstrates impressive performance on a diverse set of dexterous manipulation tasks.
Our approach integrates demonstrations and human corrections, efficient RL algorithms, and other system-level design choices to learn policies.
We show that our method significantly outperforms imitation learning baselines and prior RL approaches, with an average 2x improvement in success rate and 1.8x faster execution.
arXiv Detail & Related papers (2024-10-29T08:12:20Z)
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning paradigm through which robots can acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
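A minimal sketch of the reward signal this summary describes, assuming a binary per-step intervention flag (names are illustrative, not the paper's code):

```python
def rlif_step_reward(intervened: bool) -> float:
    """Intervention-as-reward: penalize the steps where the human takes over
    and give no other task reward, so the RL objective directly minimizes
    how often the expert feels compelled to intervene."""
    return -1.0 if intervened else 0.0
```

Because the signal marks only when the expert intervened, not what the optimal action was, the expert's own control need not be near-optimal.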
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to use offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- ReProHRL: Towards Multi-Goal Navigation in the Real World using Hierarchical Agents [1.3194749469702445]
We present Ready for Production Hierarchical RL (ReProHRL), which divides tasks into hierarchical multi-goal navigation subtasks guided by reinforcement learning.
We also use object detectors as a pre-processing step to learn multi-goal navigation and transfer it to the real world.
For the real-world implementation and proof of concept demonstration, we deploy the proposed method on a nano-drone named Crazyflie with a front camera.
arXiv Detail & Related papers (2023-08-17T02:23:59Z)
- Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning [13.699336307578488]
Our deep imitative reinforcement learning approach (DIRL) achieves agile autonomous racing using visual inputs.
We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation.
arXiv Detail & Related papers (2021-07-18T00:00:48Z)
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
- Reinforcement Learning Experiments and Benchmark for Solving Robotic Reaching Tasks [0.0]
Reinforcement learning has been successfully applied to solving the reaching task with robotic arms.
It is shown that augmenting the reward signal with the Hindsight Experience Replay exploration technique increases the average return of off-policy agents.
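Hindsight Experience Replay is a standard relabeling technique; the sketch below shows the common "future" strategy under the usual sparse-reward convention (0 on success, -1 otherwise). It is a generic illustration, not this benchmark's code.

```python
import random

def her_future_relabel(episode, reward_fn, k=4):
    """episode: list of (s, a, s_next, goal, achieved_next) tuples.
    For each step, also store k copies relabeled with goals achieved later in
    the same episode, turning failed rollouts into useful success signals."""
    relabeled = []
    for t, (s, a, s_next, goal, achieved_next) in enumerate(episode):
        relabeled.append((s, a, s_next, goal, reward_fn(achieved_next, goal)))
        future = episode[t:]
        for _ in range(min(k, len(future))):
            future_goal = random.choice(future)[4]   # an achieved goal from later
            relabeled.append((s, a, s_next, future_goal,
                              reward_fn(achieved_next, future_goal)))
    return relabeled
```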
arXiv Detail & Related papers (2020-11-11T14:00:49Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
- Assembly robots with optimized control stiffness through reinforcement learning [3.4410212782758047]
We propose a methodology that uses reinforcement learning to achieve high performance in robotic assembly tasks.
The proposed method generates stiffness matrices online, which improves the performance of local trajectory optimization.
The effectiveness of the method was verified via experiments involving two contact-rich tasks.
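The online generation of stiffness matrices suggests a Cartesian impedance controller whose gains are set by the learner. The sketch below is a generic impedance law under a unit-mass, critical-damping assumption, with illustrative names rather than the paper's controller:

```python
import numpy as np

def impedance_command(x, x_dot, x_des, stiffness_diag, zeta=1.0):
    """Cartesian impedance law F = K (x_des - x) - D x_dot, where the diagonal
    stiffness K would be produced online (e.g., by the RL policy) and the
    damping D is chosen for critical damping assuming unit mass."""
    K = np.diag(stiffness_diag)
    D = 2.0 * zeta * np.sqrt(K)   # element-wise sqrt of the diagonal matrix
    return K @ (x_des - x) - D @ x_dot
```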
arXiv Detail & Related papers (2020-02-27T15:54:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.