SEABO: A Simple Search-Based Method for Offline Imitation Learning
- URL: http://arxiv.org/abs/2402.03807v2
- Date: Wed, 21 Feb 2024 05:24:37 GMT
- Title: SEABO: A Simple Search-Based Method for Offline Imitation Learning
- Authors: Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu
- Abstract summary: Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets.
We propose a simple yet effective search-based offline IL method, tagged SEABO.
- We show that SEABO achieves performance competitive with offline RL algorithms trained on ground-truth rewards, given only a single expert trajectory.
- Score: 57.2723889718596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline reinforcement learning (RL) has attracted much attention due to its
ability to learn from static offline datasets, eliminating the need to
interact with the environment. Nevertheless, the success of offline RL
relies heavily on offline transitions annotated with reward labels. In
practice, we often need to hand-craft the reward function, which is sometimes
difficult, labor-intensive, or inefficient. To tackle this challenge, we focus
on the offline imitation learning (IL) setting and aim to obtain a
reward function from the expert data and the unlabeled data. To that end, we
propose a simple yet effective search-based offline IL method, tagged SEABO.
SEABO assigns a larger reward to a transition that lies close to its nearest
neighbor in the expert demonstration, and a smaller reward otherwise, all in an
unsupervised manner. Experimental results on a variety of D4RL
datasets indicate that SEABO achieves performance competitive with offline RL
algorithms trained on ground-truth rewards, given only a single expert trajectory,
and outperforms prior reward-learning and offline IL methods across many
tasks. Moreover, we demonstrate that SEABO also works well if the expert
demonstrations contain only observations. Our code is publicly available at
https://github.com/dmksjfl/SEABO.
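The search-based reward labeling described above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical rendering of the idea (nearest-neighbor queries against the expert demonstration, with larger rewards for transitions that lie closer to it); the query features, distance metric, and reward shaping used by SEABO itself may differ, and the exponential squashing with parameter `beta` is an assumption for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def label_rewards(expert_features, unlabeled_features, beta=1.0):
    """Assign rewards to unlabeled transitions by nearest-neighbor search
    against expert transitions (a sketch of the search-based idea only;
    not necessarily SEABO's exact formulation).

    Both arguments are arrays of shape (N, d): one row per transition,
    e.g. the concatenation [s, a, s'] (or [s, s'] when the expert
    demonstration contains only observations).
    """
    tree = cKDTree(expert_features)                  # KD-tree over expert data
    dists, _ = tree.query(unlabeled_features, k=1)   # distance to nearest expert transition
    # Transitions close to the expert demonstration get larger rewards,
    # distant ones get smaller rewards; exponential squashing is one
    # plausible (assumed) choice of shaping.
    return np.exp(-beta * dists)

# Usage sketch: relabel the unlabeled D4RL dataset with these rewards,
# then train any standard offline RL algorithm on the relabeled data.
```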
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Align Your Intents: Offline Imitation Learning via Optimal Transport [3.1728695158666396]
We show that an imitating agent can still learn the desired behavior merely from observing the expert.
In our method, AILOT, we use a special representation of states in the form of intents that incorporate pairwise spatial distances within the data.
We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on D4RL benchmarks.
arXiv Detail & Related papers (2024-02-20T14:24:00Z)
- CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning [31.49713012907863]
We introduce Calibrated Latent gUidancE (CLUE), which utilizes a conditional variational auto-encoder to learn a latent space.
We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks.
arXiv Detail & Related papers (2023-06-23T09:57:50Z)
- Optimal Transport for Offline Imitation Learning [31.218468923400373]
Offline reinforcement learning (RL) is a promising framework for learning good decision-making policies without the need to interact with the real environment.
We introduce Optimal Transport Reward labeling (OTR), an algorithm that assigns rewards to offline trajectories.
We show that OTR with a single demonstration can consistently match the performance of offline RL with ground-truth rewards (see the sketch after this related-papers list).
arXiv Detail & Related papers (2023-03-24T12:45:42Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $2.5\times$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- RvS: What is Essential for Offline RL via Supervised Learning? [77.91045677562802]
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward network is competitive.
Our experiments also probe the limits of existing RvS methods, which are comparatively weak on random data.
arXiv Detail & Related papers (2021-12-20T18:55:16Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
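As referenced in the OTR entry above, the snippet below is a minimal, hypothetical sketch of optimal-transport-based reward labeling: compute an entropic OT coupling between an unlabeled trajectory and an expert demonstration (here with the POT library), then score each unlabeled step by its negative transported cost. The cost function, solver settings, and reward squashing used by OTR itself may differ; `reg` and the feature choice are assumptions for illustration.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

def ot_reward_labels(unlabeled_traj, expert_traj, reg=0.05):
    """Label each step of an unlabeled trajectory with a reward derived
    from an entropic OT coupling to an expert trajectory (a sketch of
    the general idea only, not OTR's exact formulation).

    unlabeled_traj: (T, d) array of per-step features (e.g. observations)
    expert_traj:    (T_e, d) array of expert per-step features
    """
    M = ot.dist(unlabeled_traj, expert_traj, metric="euclidean")  # pairwise costs
    a = ot.unif(len(unlabeled_traj))   # uniform mass over unlabeled steps
    b = ot.unif(len(expert_traj))      # uniform mass over expert steps
    plan = ot.sinkhorn(a, b, M, reg)   # entropic optimal-transport coupling
    # Steps that align cheaply with the expert receive higher reward.
    return -(plan * M).sum(axis=1)
```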