DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- URL: http://arxiv.org/abs/2402.02439v2
- Date: Thu, 22 Feb 2024 00:05:12 GMT
- Title: DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- Authors: Guanghe Li, Yixiang Shan, Zhengbang Zhu, Ting Long, Weinan Zhang
- Abstract summary: In offline reinforcement learning (RL), the performance of the learned policy highly depends on the quality of offline datasets.
We introduce Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data augmentation pipeline.
DiffStitch effectively connects low-reward trajectories with high-reward trajectories, forming globally optimal trajectories to address the challenges faced by offline RL algorithms.
- Score: 21.263554926053178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline reinforcement learning (RL), the performance of the learned policy
highly depends on the quality of offline datasets. However, in many cases, the
offline dataset contains very limited optimal trajectories, which poses a
challenge for offline RL algorithms as agents must acquire the ability to
transition to high-reward regions. To address this issue, we introduce
Diffusion-based Trajectory Stitching (DiffStitch), a novel diffusion-based data
augmentation pipeline that systematically generates stitching transitions
between trajectories. DiffStitch effectively connects low-reward trajectories
with high-reward trajectories, forming globally optimal trajectories to address
the challenges faced by offline RL algorithms. Empirical experiments conducted
on D4RL datasets demonstrate the effectiveness of DiffStitch across RL
methodologies. Notably, DiffStitch demonstrates substantial enhancements in the
performance of one-step methods (IQL), imitation learning methods (TD3+BC), and
trajectory optimization methods (DT).
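To make the stitching idea concrete, here is a minimal Python sketch of one stitching step under assumed interfaces: `state_diffusion.sample`, `inverse_dynamics`, and `reward_model` are hypothetical placeholders rather than the authors' API, and the actual DiffStitch pipeline (model architectures, training objectives, and trajectory-pair selection) is described in the paper.

```python
# Hypothetical sketch of diffusion-based trajectory stitching (not the authors' code).
import numpy as np

def stitch_trajectories(low_traj, high_traj, state_diffusion,
                        inverse_dynamics, reward_model, horizon=8):
    """Bridge the end of a low-reward trajectory to the start of a high-reward
    one by generating `horizon` intermediate states, then labeling them."""
    s_start = low_traj["states"][-1]      # last state of the low-reward trajectory
    s_goal = high_traj["states"][0]       # first state of the high-reward trajectory

    # Conditional diffusion model samples a short state sequence from s_start to s_goal.
    bridge = state_diffusion.sample(cond_start=s_start, cond_goal=s_goal,
                                    length=horizon)             # (horizon, state_dim)
    path = np.concatenate([s_start[None], bridge, s_goal[None]], axis=0)

    # Label the generated transitions with actions (inverse dynamics) and rewards.
    actions = np.stack([inverse_dynamics(path[t], path[t + 1])
                        for t in range(len(path) - 1)])
    rewards = np.array([reward_model(path[t], actions[t])
                        for t in range(len(actions))])

    # Appending this segment to the dataset links the two trajectories into one
    # longer, higher-return trajectory for the offline RL learner.
    return {"states": path[:-1], "actions": actions,
            "rewards": rewards, "next_states": path[1:]}
```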
Related papers
- Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization [1.631115063641726]
We propose a framework that enhances PPO algorithms by incorporating a diffusion model to generate high-quality virtual trajectories for offline datasets.
Our contributions are threefold: we explore the potential of diffusion models in RL, particularly for offline datasets, extend the application of online RL to offline environments, and experimentally validate the performance improvements of PPO with diffusion models.
arXiv Detail & Related papers (2024-09-02T19:10:32Z) - D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling [34.547551367941246]
Real-world data collected from sensors or humans often contains noise and errors.
Traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption.
We propose Robust Decision Transformer (RDT) by incorporating several robust techniques.
arXiv Detail & Related papers (2024-07-05T06:34:32Z) - ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories [27.5648276335047]
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL).
We propose a novel approach that leverages offline data to learn a generative diffusion model, coined Adaptive Trajectory Diffuser (ATraDiff).
ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings.
arXiv Detail & Related papers (2024-06-06T17:58:15Z) - Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets [53.8218145723718]
Offline policy learning aims to learn decision-making policies from existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
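The summary does not spell out the sampling strategy itself; purely as an illustrative assumption, the sketch below shows one plausible plug-and-play realization, return-weighted (softmax) trajectory sampling, which may differ from the paper's actual method.

```python
# Assumed return-weighted sampler for imbalanced offline datasets (illustrative only).
import numpy as np

def sample_transition_indices(trajectories, batch_size, temperature=1.0, rng=None):
    """Sample (trajectory, timestep) pairs with probability increasing in trajectory
    return, so rare high-return trajectories are seen more often than under uniform sampling."""
    rng = rng or np.random.default_rng()
    returns = np.array([traj["rewards"].sum() for traj in trajectories])

    # Softmax over standardized returns -> per-trajectory sampling weights.
    z = (returns - returns.mean()) / (returns.std() + 1e-8)
    weights = np.exp(z / temperature)
    probs = weights / weights.sum()

    traj_ids = rng.choice(len(trajectories), size=batch_size, p=probs)
    steps = [int(rng.integers(len(trajectories[i]["rewards"]))) for i in traj_ids]
    return list(zip(traj_ids.tolist(), steps))
```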
arXiv Detail & Related papers (2023-10-06T17:58:14Z) - Prioritized Trajectory Replay: A Replay Memory for Data-driven Reinforcement Learning [52.49786369812919]
We propose a memory technique, (Prioritized) Trajectory Replay (TR/PTR), which extends the sampling perspective to trajectories.
TR enhances learning efficiency through backward sampling of trajectories, which optimizes the use of subsequent state information.
We demonstrate the benefits of integrating TR and PTR with existing offline RL algorithms on D4RL.
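As a rough illustration of trajectory-level replay with backward sampling, the sketch below draws a trajectory and yields its transitions in reverse time order; the uniform priorities are a placeholder, not PTR's actual prioritization scheme.

```python
# Illustrative trajectory replay with backward transition ordering (not the paper's code).
import numpy as np

class TrajectoryReplay:
    def __init__(self, trajectories, rng=None):
        self.trajectories = trajectories                 # list of dicts of arrays
        self.rng = rng or np.random.default_rng()
        self.priorities = np.ones(len(trajectories))     # placeholder: uniform priorities

    def sample_trajectory_backward(self):
        """Pick a trajectory (priority-weighted) and yield its transitions from the
        last timestep to the first, so TD targets see freshly updated successor values."""
        probs = self.priorities / self.priorities.sum()
        idx = self.rng.choice(len(self.trajectories), p=probs)
        traj = self.trajectories[idx]
        for t in reversed(range(len(traj["rewards"]))):
            yield (traj["states"][t], traj["actions"][t],
                   traj["rewards"][t], traj["next_states"][t])
```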
arXiv Detail & Related papers (2023-06-27T14:29:44Z) - Efficient Diffusion Policies for Offline Reinforcement Learning [85.73757789282212]
Diffusion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model.
We propose efficient diffusion policy (EDP) to overcome these two challenges.
EDP constructs actions from corrupted ones during training to avoid running the full sampling chain.
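A minimal sketch of the "construct actions from corrupted ones" idea, assuming a standard DDPM-style noising process and a noise-prediction denoiser with signature `denoiser(a_t, t, state)`; EDP's actual action approximation and training objective are given in the paper.

```python
# Sketch: recover an approximate action with a single denoiser call instead of
# running the full reverse diffusion chain at every training step (assumed interfaces).
import torch

def approx_action(denoiser, state, dataset_action, alpha_bar):
    """One-step action reconstruction for training-time policy evaluation."""
    t = torch.randint(len(alpha_bar), (dataset_action.shape[0],))  # random diffusion step
    a_bar = alpha_bar[t].unsqueeze(-1)                             # cumulative alpha, (B, 1)
    noise = torch.randn_like(dataset_action)

    # Forward (noising) process: a_t = sqrt(a_bar) * a_0 + sqrt(1 - a_bar) * eps
    a_t = a_bar.sqrt() * dataset_action + (1 - a_bar).sqrt() * noise

    # Single denoiser call; invert the noising equation to estimate the clean action.
    eps_hat = denoiser(a_t, t, state)
    a0_hat = (a_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
    return a0_hat.clamp(-1.0, 1.0)
```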
arXiv Detail & Related papers (2023-05-31T17:55:21Z) - Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories [37.14064734165109]
Natural agents can learn from multiple data sources that differ in size, quality, and types of measurements.
We study this in the context of offline reinforcement learning (RL) by introducing a new, practically motivated semi-supervised setting.
arXiv Detail & Related papers (2022-10-12T18:22:23Z) - Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
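As a minimal sketch of the relabel-then-train step described above (the dataset layout and `task_reward` function are assumptions for illustration):

```python
# Illustrative ExORL-style reward relabeling of reward-free exploratory data.
import numpy as np

def relabel_dataset(dataset, task_reward):
    """Replace exploration-time (or absent) rewards with the downstream task reward."""
    rewards = np.array([task_reward(s, a, s_next)
                        for s, a, s_next in zip(dataset["states"],
                                                dataset["actions"],
                                                dataset["next_states"])])
    return {**dataset, "rewards": rewards}

# Usage: relabeled = relabel_dataset(exploration_data, task_reward=my_goal_reward),
# then train any standard off-policy RL algorithm (e.g., TD3) on `relabeled`.
```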
arXiv Detail & Related papers (2022-01-31T18:39:27Z) - Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.