Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery
- URL: http://arxiv.org/abs/2212.01105v1
- Date: Fri, 2 Dec 2022 11:35:51 GMT
- Title: Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery
- Authors: Yiqin Yang, Hao Hu, Wenzhe Li, Siyuan Li, Jun Yang, Qianchuan Zhao, Chongjie Zhang
- Abstract summary: Offline reinforcement learning (RL) enables the agent to learn effectively from logged data.
We show that our method represents policies faithfully and achieves superior performance in most tasks.
- Score: 31.49638957903016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) enables the agent to effectively learn
from logged data, which significantly extends the applicability of RL
algorithms in real-world scenarios where exploration can be expensive or
unsafe. Previous works have shown that extracting primitive skills from the
recurring and temporally extended structures in the logged data yields better
learning. However, these methods suffer greatly when the primitives have
insufficient representational capacity to recover the original policy space, especially
in offline settings. In this paper, we give a quantitative characterization of
the performance of offline hierarchical learning and highlight the importance
of learning lossless primitives. To this end, we propose to use a
\emph{flow}-based structure as the representation for low-level policies. This
allows us to represent the behaviors in the dataset faithfully while retaining
the expressive power to recover the whole policy space. We show that such
lossless primitives can drastically improve the performance of hierarchical
policies. The experimental results and extensive ablation studies on the
standard D4RL benchmark show that our method represents policies faithfully
and achieves superior performance in most tasks.
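As a concrete illustration of the flow-based low-level policy described in the abstract, the sketch below builds a conditional normalizing flow pi(a | s, z) from affine coupling layers. This is a minimal sketch under assumed names, dimensions, and architecture choices, not the paper's actual implementation: the point is only that an invertible map with a tractable Jacobian yields exact log-likelihoods for fitting logged actions while remaining expressive enough to cover the full action space.

```python
# Minimal sketch (not the authors' code) of a conditional flow as the low-level
# policy pi(a | s, z). The affine-coupling design, layer count, and dimensions
# are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionalAffineCoupling(nn.Module):
    """One affine coupling layer: half of the action dims are rescaled and
    shifted by a network that sees the other half plus the conditioning (s, z)."""

    def __init__(self, act_dim, cond_dim, hidden=128):
        super().__init__()
        self.d = act_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (act_dim - self.d)),
        )

    def transform(self, x, cond):
        # noise -> action direction; also return the forward log|det Jacobian|
        x1, x2 = x[:, :self.d], x[:, self.d:]
        scale, shift = self.net(torch.cat([x1, cond], -1)).chunk(2, -1)
        scale = torch.tanh(scale)              # keep the map well-conditioned
        return torch.cat([x1, x2 * scale.exp() + shift], -1), scale.sum(-1)

    def invert(self, y, cond):
        # action -> noise direction; returns the same forward log|det Jacobian|
        y1, y2 = y[:, :self.d], y[:, self.d:]
        scale, shift = self.net(torch.cat([y1, cond], -1)).chunk(2, -1)
        scale = torch.tanh(scale)
        return torch.cat([y1, (y2 - shift) * (-scale).exp()], -1), scale.sum(-1)


class FlowPolicy(nn.Module):
    """Low-level policy pi(a | s, z): a stack of coupling layers mapping a
    standard-normal base sample to an action, conditioned on state s and skill z."""

    def __init__(self, act_dim, state_dim, skill_dim, n_layers=4):
        super().__init__()
        self.act_dim = act_dim
        cond_dim = state_dim + skill_dim
        self.layers = nn.ModuleList(
            [ConditionalAffineCoupling(act_dim, cond_dim) for _ in range(n_layers)]
        )

    def log_prob(self, action, state, skill):
        """Exact log pi(a | s, z), usable for maximum-likelihood behavior cloning."""
        cond = torch.cat([state, skill], -1)
        x, log_det = action, 0.0
        for layer in reversed(self.layers):
            x = torch.flip(x, dims=[-1])       # undo the permutation between layers
            x, ld = layer.invert(x, cond)
            log_det = log_det + ld
        base = torch.distributions.Normal(torch.zeros_like(x), torch.ones_like(x))
        return base.log_prob(x).sum(-1) - log_det

    @torch.no_grad()
    def sample(self, state, skill):
        """Draw an action by pushing base noise through the flow."""
        cond = torch.cat([state, skill], -1)
        x = torch.randn(state.shape[0], self.act_dim)
        for layer in self.layers:
            x, _ = layer.transform(x, cond)
            x = torch.flip(x, dims=[-1])       # permute so every dim gets updated
        return x
```

Training such a policy would then amount to minimizing -log_prob on logged (s, a) pairs, with z supplied by whatever high-level policy or skill encoder sits above it; exact likelihood is what makes the primitive "lossless" relative to a fixed-capacity Gaussian decoder.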
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z) - Iteratively Refined Behavior Regularization for Offline Reinforcement
Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z) - Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
Offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets.
We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged.
We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time (a hedged sketch of this resampling idea appears after this list).
arXiv Detail & Related papers (2022-10-17T16:34:01Z) - A Policy-Guided Imitation Approach for Offline Reinforcement Learning [9.195775740684248]
We introduce Policy-guided Offline RL (POR).
POR demonstrates state-of-the-art performance on D4RL, a standard benchmark for offline RL.
arXiv Detail & Related papers (2022-10-15T15:54:28Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Representation Matters: Offline Pretraining for Sequential Decision
Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z) - OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement
Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)