Optimal Transport for Offline Imitation Learning
- URL: http://arxiv.org/abs/2303.13971v1
- Date: Fri, 24 Mar 2023 12:45:42 GMT
- Title: Optimal Transport for Offline Imitation Learning
- Authors: Yicheng Luo, Zhengyao Jiang, Samuel Cohen, Edward Grefenstette, Marc Peter Deisenroth
- Abstract summary: Offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment.
We introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories.
We show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards.
- Score: 31.218468923400373
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the advent of large datasets, offline reinforcement learning (RL) is a
promising framework for learning good decision-making policies without the need
to interact with the real environment. However, offline RL requires the dataset
to be reward-annotated, which presents practical challenges when reward
engineering is difficult or when obtaining reward annotations is
labor-intensive. In this paper, we introduce Optimal Transport Reward labeling
(OTR), an algorithm that assigns rewards to offline trajectories using a few
high-quality demonstrations. OTR's key idea is to use optimal transport to
compute an optimal alignment between an unlabeled trajectory in the dataset and
an expert demonstration to obtain a similarity measure that can be interpreted
as a reward, which can then be used by an offline RL algorithm to learn the
policy. OTR is easy to implement and computationally efficient. On D4RL
benchmarks, we show that OTR with a single demonstration can consistently match
the performance of offline RL with ground-truth rewards.
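To make the mechanism concrete, below is a minimal, hedged sketch of OT-based reward labeling in the spirit of the abstract: the states of an unlabeled trajectory are aligned to an expert demonstration with entropic optimal transport, and the per-step transport cost is squashed into a reward. The Euclidean cost, the basic Sinkhorn solver, the scaling constants, and the helper names (`sinkhorn_plan`, `ot_rewards`) are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of OT-based reward labeling in the spirit of OTR.
# Assumptions (not from the paper verbatim): Euclidean state cost, uniform
# marginals, a basic Sinkhorn solver, and exponential squashing of the
# per-step transport cost into a reward.
import numpy as np

def sinkhorn_plan(C, reg=0.05, n_iters=200):
    """Entropic OT plan between uniform marginals for a cost matrix C (T x N)."""
    T, N = C.shape
    a, b = np.full(T, 1.0 / T), np.full(N, 1.0 / N)
    K = np.exp(-C / reg)                     # Gibbs kernel
    u = np.ones(T)
    for _ in range(n_iters):                 # alternate marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]       # transport plan P (T x N)

def ot_rewards(traj_states, demo_states, alpha=5.0, beta=5.0):
    """Assign a reward to every step of an unlabeled trajectory."""
    # Pairwise cost between trajectory states (T x D) and demo states (N x D).
    C = np.linalg.norm(traj_states[:, None, :] - demo_states[None, :, :], axis=-1)
    C = C / max(C.max(), 1e-8)               # normalize costs for numerical stability
    P = sinkhorn_plan(C)
    step_cost = (P * C).sum(axis=1)          # per-step alignment cost
    # Squash the alignment cost into a bounded reward; constants are assumed.
    return alpha * np.exp(-beta * len(traj_states) * step_cost)
```

The relabeled transitions can then be handed unchanged to any reward-based offline RL algorithm, which is how the abstract describes OTR being used.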
Related papers
- Real-World Offline Reinforcement Learning from Vision Language Model Feedback [19.494335952082466]
Offline reinforcement learning can enable policy learning from pre-collected, sub-optimal datasets without online interactions.
Most existing offline RL works assume the dataset is already labeled with the task rewards.
We propose a novel system that automatically generates reward labels for offline datasets.
arXiv Detail & Related papers (2024-11-08T02:12:34Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- Align Your Intents: Offline Imitation Learning via Optimal Transport [3.1728695158666396]
We show that an imitating agent can still learn the desired behavior merely from observing the expert.
In our method, AILOT, we use a special representation of states in the form of intents that incorporates pairwise spatial distances within the data.
We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on D4RL benchmarks.
arXiv Detail & Related papers (2024-02-20T14:24:00Z)
- Leveraging Optimal Transport for Enhanced Offline Reinforcement Learning in Surgical Robotic Environments [4.2569494803130565]
We introduce an innovative algorithm designed to assign rewards to offline trajectories, using a small number of high-quality expert demonstrations.
This approach circumvents the need for handcrafted rewards, unlocking the potential to harness vast datasets for policy learning.
arXiv Detail & Related papers (2023-10-13T03:39:15Z)
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies using existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a 2.5x improvement over existing approaches.
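As one illustration of the kind of simple design choice involved, the sketch below forms each training batch from equal halves of offline and online data before a standard off-policy update; the buffer interface, batch size, function name, and the 50/50 split are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch: mix offline data into an online off-policy learner by sampling
# each batch half from the offline dataset and half from the online replay
# buffer. All names and proportions here are assumptions for illustration.
import numpy as np

def sample_mixed_batch(offline_buffer, online_buffer, batch_size=256, rng=None):
    """Draw a batch with equal proportions of offline and online transitions."""
    rng = rng or np.random.default_rng()
    half = batch_size // 2
    off_idx = rng.integers(len(offline_buffer), size=half)
    on_idx = rng.integers(len(online_buffer), size=batch_size - half)
    batch = [offline_buffer[i] for i in off_idx] + [online_buffer[i] for i in on_idx]
    return batch  # fed unchanged to a standard off-policy update (e.g. SAC-style)
```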
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
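For intuition, here is a hedged sketch of pool-based preference querying under one common acquisition heuristic: query the pair of offline trajectory segments on which an ensemble of reward models disagrees most. The disagreement criterion and the function name `select_query` are assumptions for illustration; the paper's actual selection criterion may differ.

```python
# Hedged sketch of pool-based active preference query selection via ensemble
# disagreement. This is a generic heuristic, not the paper's exact method.
import numpy as np

def select_query(segment_returns):
    """segment_returns: (n_models, n_segments) array of predicted segment returns."""
    n_models, n_segments = segment_returns.shape
    best_pair, best_score = None, -1.0
    for i in range(n_segments):              # O(n^2) scan over candidate pairs
        for j in range(i + 1, n_segments):
            prefs = (segment_returns[:, i] > segment_returns[:, j]).mean()
            disagreement = prefs * (1.0 - prefs)   # maximal when models split 50/50
            if disagreement > best_score:
                best_pair, best_score = (i, j), disagreement
    return best_pair  # indices of the segment pair to show to the annotator
```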
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration [9.017416068706579]
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback.
We develop an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy.
We demonstrate the superior performance of our algorithm over state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-09T18:45:40Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences arising from its use.