CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2306.13412v2
- Date: Sun, 15 Oct 2023 14:55:47 GMT
- Title: CLUE: Calibrated Latent Guidance for Offline Reinforcement Learning
- Authors: Jinxin Liu, Lipeng Zu, Li He, Donglin Wang
- Abstract summary: We introduce Calibrated Latent Guidance (CLUE), which utilizes a conditional variational auto-encoder to learn a latent space.
We instantiate the expert-driven intrinsic rewards in sparse-reward offline RL tasks, offline imitation learning (IL) tasks, and unsupervised offline RL tasks.
- Score: 31.49713012907863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) aims to learn an optimal policy from
pre-collected and labeled datasets, which eliminates the time-consuming data
collection in online RL. However, offline RL still bears a large burden of
specifying/handcrafting extrinsic rewards for each transition in the offline
data. As a remedy for the labor-intensive labeling, we propose to endow offline
RL tasks with a small amount of expert data and utilize this limited expert data to drive
intrinsic rewards, thus eliminating the need for extrinsic rewards. To achieve
that, we introduce \textbf{C}alibrated \textbf{L}atent
g\textbf{U}idanc\textbf{E} (CLUE), which utilizes a conditional variational
auto-encoder to learn a latent space such that intrinsic rewards can be
directly quantified over the latent space. CLUE's key idea is to align the
intrinsic rewards with the expert intention by enforcing the embeddings of
expert data to a calibrated contextual representation. We
instantiate the expert-driven intrinsic rewards in sparse-reward offline RL
tasks, offline imitation learning (IL) tasks, and unsupervised offline RL
tasks. Empirically, we find that CLUE can effectively improve the sparse-reward
offline RL performance, outperform the state-of-the-art offline IL baselines,
and discover diverse skills from static reward-free offline data.
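To make the abstract's reward construction concrete, here is a minimal sketch of how expert-driven intrinsic rewards can be read off a learned latent space. It is an illustration under stated assumptions rather than the paper's implementation: the conditional VAE encoder is assumed to be pretrained, the helper names are hypothetical, and the calibration here is simply an average of the expert embeddings.

```python
# A minimal sketch, not the authors' implementation: it assumes a pretrained
# conditional VAE encoder has already mapped transitions (s, a) to latent
# codes z; the helper names below are hypothetical.
import numpy as np

def calibrated_expert_context(expert_z: np.ndarray) -> np.ndarray:
    """Aggregate the few expert latent codes into one calibrated context vector.
    Averaging is just one plausible choice of calibration."""
    return expert_z.mean(axis=0)

def intrinsic_reward(z: np.ndarray, expert_context: np.ndarray) -> np.ndarray:
    """Score offline transitions by how close their latent codes lie to the
    expert context; the reward is defined entirely in latent space."""
    return -np.sum((z - expert_context) ** 2, axis=-1)

# Hypothetical usage with placeholder 8-dimensional latent codes.
expert_z = np.random.randn(16, 8)    # embeddings of the limited expert data
offline_z = np.random.randn(256, 8)  # embeddings of unlabeled offline data
context = calibrated_expert_context(expert_z)
rewards = intrinsic_reward(offline_z, context)  # shape (256,)
```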
Related papers
- SEABO: A Simple Search-Based Method for Offline Imitation Learning [57.2723889718596]
Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets.
We propose a simple yet effective search-based offline IL method, tagged SEABO.
We show that SEABO can achieve competitive performance to offline RL algorithms with ground-truth rewards, given only a single expert trajectory.
arXiv Detail & Related papers (2024-02-06T08:48:01Z)
- Survival Instinct in Offline Reinforcement Learning [28.319886852612672]
Offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels.
We demonstrate that this surprising property is attributable to an interplay between the notion of pessimism in offline RL algorithms and certain implicit biases in common data collection practices.
Our empirical and theoretical results suggest a new paradigm for RL, whereby an agent is nudged to learn a desirable behavior with imperfect reward but purposely biased data coverage.
arXiv Detail & Related papers (2023-06-05T22:15:39Z)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
- The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning [25.647624787936028]
We propose a novel, Provable Data Sharing algorithm (PDS) to utilize reward-free data for offline reinforcement learning.
PDS significantly improves the performance of offline RL algorithms with reward-free data.
arXiv Detail & Related papers (2023-02-27T03:35:02Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL (a minimal sketch of this relabeling step appears after this list).
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
- Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning [119.49182500071288]
We introduce benchmarks specifically designed for the offline setting, guided by key properties of datasets relevant to real-world applications of offline RL.
By moving beyond simple benchmark tasks and data collected by partially-trained RL agents, we reveal important and unappreciated deficiencies of existing algorithms.
arXiv Detail & Related papers (2020-04-15T17:18:19Z)
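For the ExORL entry above, the relabel-then-train pipeline is straightforward to sketch. The snippet below is an illustration under assumptions, not the ExORL codebase: `relabel`, `reward_fn`, and the offline learner are hypothetical names, and the data structures are placeholders.

```python
# A minimal sketch of ExORL-style relabeling (assumptions, not the ExORL code):
# reward-free exploratory transitions are labeled with a downstream task reward
# and then handed to any offline RL learner.
from typing import Any, Callable, Iterable, List, Tuple

Transition = Tuple[Any, Any, Any]      # (state, action, next_state), reward-free
Labeled = Tuple[Any, Any, float, Any]  # (state, action, reward, next_state)

def relabel(dataset: Iterable[Transition],
            reward_fn: Callable[[Any, Any, Any], float]) -> List[Labeled]:
    """Attach the downstream task reward to every reward-free transition."""
    return [(s, a, reward_fn(s, a, s_next), s_next) for (s, a, s_next) in dataset]

# Hypothetical usage: `exploration_data` comes from unsupervised, reward-free
# exploration; `task_reward` is the downstream task's reward function.
# labeled_data = relabel(exploration_data, task_reward)
# policy = train_offline_rl(labeled_data)  # placeholder for the chosen learner
```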