Distilling Conditional Diffusion Models for Offline Reinforcement
Learning through Trajectory Stitching
- URL: http://arxiv.org/abs/2402.00807v1
- Date: Thu, 1 Feb 2024 17:44:11 GMT
- Title: Distilling Conditional Diffusion Models for Offline Reinforcement
Learning through Trajectory Stitching
- Authors: Shangzhe Li and Xinhua Zhang
- Abstract summary: We propose a knowledge distillation method based on data augmentation.
High-return trajectories are generated from a conditional diffusion model, and they are blended with the original trajectories through a novel stitching algorithm.
Applying the resulting dataset to behavioral cloning, the learned shallow policy, whose size is much smaller, outperforms or nearly matches deep generative planners on several D4RL benchmarks.
- Score: 14.295558685860941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have recently emerged as an effective approach to
offline reinforcement learning. However, their large model size poses
challenges in computation. We address this issue by proposing a knowledge
distillation method based on data augmentation. In particular, high-return
trajectories are generated from a conditional diffusion model, and they are
blended with the original trajectories through a novel stitching algorithm that
leverages a new reward generator. Applying the resulting dataset to behavioral
cloning, the learned shallow policy, whose size is much smaller, outperforms or
nearly matches deep generative planners on several D4RL benchmarks.
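To make the pipeline described in the abstract concrete (generate high-return trajectories with a conditional diffusion model, stitch them into the original data, then distill a small policy via behavioral cloning), here is a minimal PyTorch sketch. The sampler interface, the stitching rule, the dimensions, and all names below are placeholder assumptions for illustration, not the authors' implementation; in particular the learned reward generator used by the paper's stitching algorithm is omitted.

```python
# Minimal sketch of the distillation-by-augmentation flow: generate -> stitch -> behavioral cloning.
# All components are stubbed; only the overall structure follows the abstract.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 17, 6  # assumed D4RL-like dimensions

def sample_high_return_trajectories(n, horizon, target_return):
    """Placeholder for a return-conditioned diffusion sampler (assumed interface).
    A trained model would condition on target_return; here we return random data."""
    states = torch.randn(n, horizon, STATE_DIM)
    actions = torch.randn(n, horizon, ACTION_DIM)
    return states, actions

def stitch(orig_s, orig_a, gen_s, gen_a):
    """Placeholder stitching: simply pool original and generated transitions.
    The paper additionally uses a reward generator to decide splice points."""
    states = torch.cat([orig_s, gen_s], dim=0).reshape(-1, STATE_DIM)
    actions = torch.cat([orig_a, gen_a], dim=0).reshape(-1, ACTION_DIM)
    return states, actions

# Small ("shallow") policy to be distilled by behavioral cloning.
policy = nn.Sequential(nn.Linear(STATE_DIM, 256), nn.ReLU(), nn.Linear(256, ACTION_DIM))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

# Stand-in for the original offline dataset (random here, D4RL in practice).
orig_s, orig_a = torch.randn(32, 100, STATE_DIM), torch.randn(32, 100, ACTION_DIM)
gen_s, gen_a = sample_high_return_trajectories(32, 100, target_return=1.0)
states, actions = stitch(orig_s, orig_a, gen_s, gen_a)

for _ in range(1000):  # behavioral cloning: regress actions from states on the blended data
    idx = torch.randint(0, states.shape[0], (256,))
    loss = ((policy(states[idx]) - actions[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

In the paper's setting the sampler would be a diffusion model trained on D4RL trajectories and the stitch step would splice generated segments into original trajectories; the sketch only shows where each component plugs into the distillation loop.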
Related papers
- Hybrid Reinforcement Learning from Offline Observation Alone [19.14864618744221]
We consider the hybrid reinforcement learning setting where the agent has access to both offline data and online interactive access.
We propose the first algorithm in the trace model setting that provably matches the performance of algorithms that leverage a reset model.
arXiv Detail & Related papers (2024-06-11T13:34:05Z)
- ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories [27.5648276335047]
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL).
We propose a novel approach that leverages offline data to learn a generative diffusion model, coined the Adaptive Trajectory Diffuser (ATraDiff).
ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- Causal Action Influence Aware Counterfactual Data Augmentation [23.949113120847507]
We propose CAIAC, a data augmentation method that can create synthetic transitions from a fixed dataset without having access to online environment interactions.
By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping action-unaffected parts of the state space.
This leads to a substantial increase in robustness of offline learning algorithms against distributional shift.
arXiv Detail & Related papers (2024-05-29T09:19:50Z)
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model Based Augmentation [59.899714450049494]
Offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z)
- Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning [23.638418776700522]
We propose a new model named the causal decision transformer for recommender systems (CDT4Rec).
CDT4Rec is an offline reinforcement learning system that can learn from a dataset rather than from online interaction.
To demonstrate the feasibility and superiority of our model, we have conducted experiments on six real-world offline datasets and one online simulator.
arXiv Detail & Related papers (2023-04-17T00:05:52Z)
- Synthetic Experience Replay [48.601879260071655]
We propose Synthetic Experience Replay (SynthER), a diffusion-based approach to flexibly upsample an agent's collected experience.
We show that SynthER is an effective method for training RL agents across offline and online settings.
We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.
arXiv Detail & Related papers (2023-03-12T09:10:45Z)
- Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z)
- PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning [84.30765628008207]
We propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
arXiv Detail & Related papers (2021-06-08T07:37:37Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)