Prioritized Trajectory Replay: A Replay Memory for Data-driven
Reinforcement Learning
- URL: http://arxiv.org/abs/2306.15503v1
- Date: Tue, 27 Jun 2023 14:29:44 GMT
- Title: Prioritized Trajectory Replay: A Replay Memory for Data-driven
Reinforcement Learning
- Authors: Jinyi Liu, Yi Ma, Jianye Hao, Yujing Hu, Yan Zheng, Tangjie Lv,
Changjie Fan
- Abstract summary: We propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories.
TR enhances learning efficiency by sampling trajectories backward, making better use of subsequent state information.
We demonstrate the benefits of integrating TR and PTR with existing offline RL algorithms on D4RL.
- Score: 52.49786369812919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, data-driven reinforcement learning (RL), also known as
offline RL, has gained significant attention. However, the role of data
sampling techniques in offline RL has been overlooked, despite their potential
to enhance online RL performance. Recent research suggests that applying
sampling techniques directly to state transitions does not consistently
improve performance in offline RL. Therefore, in this study, we propose a
memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the
sampling perspective to trajectories for more comprehensive information
extraction from limited data. TR enhances learning efficiency by sampling
trajectories backward, which makes better use of subsequent state information.
Building on TR, we introduce a weighted critic target that avoids sampling
unseen actions in offline training, and Prioritized Trajectory Replay (PTR),
which enables more efficient trajectory sampling, prioritized by various
trajectory priority metrics. We demonstrate the benefits of integrating TR and
PTR with existing offline RL algorithms on D4RL. In summary, our research
emphasizes the significance of trajectory-based data sampling techniques in
enhancing the efficiency and performance of offline RL algorithms.
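To make the mechanism concrete, here is a minimal sketch of what a trajectory-level replay memory with priority-based selection and backward replay could look like. This is not the authors' implementation: the class and method names are invented for illustration, and episode return stands in for just one of the paper's trajectory priority metrics.

```python
import random


class PrioritizedTrajectoryReplay:
    """Toy trajectory-level replay memory (illustrative, not the paper's code).

    Stores whole trajectories, samples one with probability proportional to a
    trajectory priority (here: episode return), then replays its transitions
    backward so bootstrapped targets for earlier states use freshly updated
    values of later states.
    """

    def __init__(self):
        self.trajectories = []  # each entry: list of (s, a, r, s_next, done)
        self.priorities = []    # one scalar priority per trajectory

    def add_trajectory(self, transitions):
        self.trajectories.append(list(transitions))
        # Priority metric: undiscounted episode return (one possible choice).
        self.priorities.append(sum(t[2] for t in transitions))

    def sample_backward(self):
        # Shift priorities to be strictly positive, then sample proportionally.
        lo = min(self.priorities)
        weights = [p - lo + 1e-6 for p in self.priorities]
        idx = random.choices(range(len(self.trajectories)), weights=weights)[0]
        # Backward replay: yield transitions from episode end toward its start.
        yield from reversed(self.trajectories[idx])


buffer = PrioritizedTrajectoryReplay()
buffer.add_trajectory([(0, 0, 0.0, 1, False), (1, 1, 1.0, 2, True)])
buffer.add_trajectory([(0, 1, 0.5, 3, False), (3, 0, 2.0, 4, True)])
for s, a, r, s_next, done in buffer.sample_backward():
    pass  # e.g. update a critic on (s, a, r, s_next, done)
```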
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching [21.263554926053178]
In offline reinforcement learning (RL), the performance of the learned policy highly depends on the quality of offline datasets.
We introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline.
DiffStitch effectively connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories to address the challenges faced by offline RL algorithms (a toy illustration of the stitching step follows this entry).
arXiv Detail & Related papers (2024-02-04T10:30:23Z)
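A diffusion model is beyond a short sketch, so the toy example below illustrates only the stitching idea: joining a low-return trajectory to a high-return one at their closest pair of states, with a plain nearest-neighbor join standing in for DiffStitch's learned, diffusion-generated bridge. All names here are hypothetical.

```python
import numpy as np


def stitch(low_traj, high_traj):
    """Toy trajectory stitching (nearest-neighbor join, NOT diffusion).

    low_traj, high_traj: arrays of states, shape (T, state_dim).
    Returns the prefix of the low-return trajectory joined to the suffix of
    the high-return trajectory at their closest state pair. A diffusion
    model, as in DiffStitch, would instead generate a smooth bridging
    segment between the two join points.
    """
    # Pairwise distances between every low state and every high state.
    d = np.linalg.norm(low_traj[:, None, :] - high_traj[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    return np.concatenate([low_traj[: i + 1], high_traj[j:]], axis=0)


low = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]])
high = np.array([[1.1, 0.3], [1.5, 1.0], [2.0, 2.0]])
print(stitch(low, high))  # low prefix joined to high suffix
```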
- Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms (a simple non-uniform sampler in this spirit is sketched after this entry).
arXiv Detail & Related papers (2023-10-06T17:58:14Z)
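The summary above does not spell out the paper's exact sampling strategy, so the sketch below shows one simple non-uniform scheme in the same spirit: drawing trajectory indices via a softmax over episode returns, as a drop-in replacement for uniform sampling. The function name and temperature parameter are invented for illustration, not the paper's realization.

```python
import numpy as np


def return_weighted_indices(returns, batch, temperature=1.0, rng=None):
    """Sample trajectory indices non-uniformly by episode return.

    A stand-in for the paper's sampling strategy (assumption: the exact
    realization differs): trajectories with higher return are drawn more
    often via a softmax over returns, so a few good episodes are not
    drowned out by many suboptimal ones.
    """
    rng = rng or np.random.default_rng()
    z = np.asarray(returns, dtype=float) / temperature
    p = np.exp(z - z.max())  # numerically stable softmax
    p /= p.sum()
    return rng.choice(len(p), size=batch, p=p)


# Drop-in replacement for uniform sampling in an offline RL training loop:
idx = return_weighted_indices(returns=[1.0, 10.0, 2.0], batch=4)
```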
- Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL (the relabeling step is sketched after this entry).
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z)
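As a concrete but hypothetical rendering of the pipeline's middle step, the sketch below relabels reward-free exploratory transitions with a downstream task reward; the function names and toy goal reward are assumptions, not ExORL's code.

```python
import numpy as np


def relabel(dataset, reward_fn):
    """ExORL-style reward relabeling (sketch; names are illustrative).

    `dataset` holds reward-free exploratory transitions; each one is
    stamped with the downstream task's reward before offline RL training.
    """
    return [(s, a, reward_fn(s, a, s2), s2, done)
            for (s, a, _, s2, done) in dataset]


# Exploratory data collected without any reward signal:
data = [(np.zeros(2), 0, None, np.ones(2), False)]
# Hypothetical downstream task: reach the point (1, 1).
goal_reward = lambda s, a, s2: -float(np.linalg.norm(s2 - np.ones(2)))
labeled = relabel(data, goal_reward)  # now train any offline RL method
```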
- POAR: Efficient Policy Optimization via Online Abstract State Representation Learning [6.171331561029968]
State Representation Learning (SRL) is proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states.
We introduce a new SRL prior called domain resemblance to leverage expert demonstration to improve SRL interpretations.
We empirically verify that POAR efficiently handles high-dimensional tasks and facilitates training real-life robots directly from scratch.
arXiv Detail & Related papers (2021-09-17T16:52:03Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- Representation Matters: Offline Pretraining for Sequential Decision Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR); a sketch of its weighted-cloning core follows this entry.
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
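CRR's core idea fits in a few lines: train the policy by behavioral cloning, but weight each dataset action by how favorably the critic scores it. Below is a hedged sketch of that policy loss; the tensor names, baseline choice, and clipping constant are illustrative rather than the paper's exact configuration.

```python
import torch


def crr_policy_loss(logp, q_sa, v_s, beta=1.0, mode="exp"):
    """Sketch of the CRR policy objective (weighted behavioral cloning).

    logp: log pi(a|s) for dataset actions, shape (B,)
    q_sa: critic estimate Q(s, a) for dataset actions, shape (B,)
    v_s:  a baseline, e.g. mean of Q(s, a') over actions a' ~ pi, shape (B,)
    """
    adv = q_sa - v_s  # estimated advantage of the dataset actions
    if mode == "exp":
        # "exp" variant: exponentiated advantage, clipped for stability.
        w = torch.exp(adv / beta).clamp(max=20.0)
    else:
        # "binary" variant: clone only actions the critic judges improving.
        w = (adv > 0).float()
    return -(w.detach() * logp).mean()


# Toy call with fake tensors standing in for network outputs:
loss = crr_policy_loss(torch.randn(8), torch.randn(8), torch.randn(8))
```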
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.