Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay
- URL: http://arxiv.org/abs/2404.10662v2
- Date: Thu, 18 Apr 2024 04:49:02 GMT
- Title: Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay
- Authors: Jinmei Liu, Wenbin Li, Xiangyu Yue, Shilin Zhang, Chunlin Chen, Zhi Wang
- Abstract summary: We study continual offline reinforcement learning, a practical paradigm that facilitates forward transfer and mitigates catastrophic forgetting when tackling sequential offline tasks.
We propose a dual generative replay framework that retains previous knowledge by concurrent replay of generated pseudo-data.
- Score: 16.269591842495892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study continual offline reinforcement learning, a practical paradigm that facilitates forward transfer and mitigates catastrophic forgetting to tackle sequential offline tasks. We propose a dual generative replay framework that retains previous knowledge by concurrent replay of generated pseudo-data. First, we decouple the continual learning policy into a diffusion-based generative behavior model and a multi-head action evaluation model, allowing the policy to inherit distributional expressivity for encompassing a progressive range of diverse behaviors. Second, we train a task-conditioned diffusion model to mimic state distributions of past tasks. Generated states are paired with corresponding responses from the behavior generator to represent old tasks with high-fidelity replayed samples. Finally, by interleaving pseudo samples with real ones of the new task, we continually update the state and behavior generators to model progressively diverse behaviors, and regularize the multi-head critic via behavior cloning to mitigate forgetting. Experiments demonstrate that our method achieves better forward transfer with less forgetting, and closely approximates the results of using previous ground-truth data due to its high-fidelity replay of the sample space. Our code is available at https://github.com/NJU-RL/CuGRO.
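The abstract outlines a concrete replay recipe: a task-conditioned state generator and a state-conditioned behavior generator produce pseudo-data for past tasks, which is interleaved with real data from the new task. Below is a minimal sketch of that loop, not the CuGRO implementation: the class names, network sizes, and simple MLP generators (standing in for the diffusion models) are illustrative assumptions, the training losses are crude placeholders, and the multi-head critic with its behavior-cloning regularizer is omitted for brevity.

```python
# Minimal sketch of dual generative replay (illustrative placeholders only).
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, NUM_TASKS = 8, 2, 3


class StateGenerator(nn.Module):
    """Task-conditioned generator mimicking the state distributions of past tasks.
    A stand-in for the task-conditioned diffusion model described in the abstract."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + NUM_TASKS, 128),
                                 nn.ReLU(), nn.Linear(128, STATE_DIM))

    def sample(self, n, task_id):
        noise = torch.randn(n, STATE_DIM)
        cond = F.one_hot(torch.full((n,), task_id), NUM_TASKS).float()
        return self.net(torch.cat([noise, cond], dim=-1))


class BehaviorGenerator(nn.Module):
    """State-conditioned behavior model; pairs generated states with actions.
    A stand-in for the diffusion-based generative behavior model."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, ACTION_DIM))

    def forward(self, states):
        return self.net(states)


def replay_batch(state_gen, behavior_gen, past_tasks, n_per_task=64):
    """Generate pseudo (state, action, task) samples that represent old tasks."""
    batches = []
    with torch.no_grad():
        for t in past_tasks:
            states = state_gen.sample(n_per_task, t)
            actions = behavior_gen(states)  # pair generated states with behavior responses
            batches.append((states, actions, t))
    return batches


def train_new_task_step(task_id, real_states, real_actions,
                        state_gen, behavior_gen, optimizer):
    """One illustrative update: interleave pseudo and real samples, then fit the
    generators on the mixture so they keep modelling both old and new behaviors."""
    mixed = replay_batch(state_gen, behavior_gen, list(range(task_id)))
    mixed.append((real_states, real_actions, task_id))

    loss = torch.zeros(())
    for states, actions, t in mixed:
        # Crude moment-matching placeholder for the diffusion training objective.
        loss = loss + F.mse_loss(state_gen.sample(len(states), t).mean(0), states.mean(0))
        # Behavior cloning of the (real or replayed) state-action pairs.
        loss = loss + F.mse_loss(behavior_gen(states), actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


state_gen, behavior_gen = StateGenerator(), BehaviorGenerator()
opt = torch.optim.Adam(list(state_gen.parameters()) + list(behavior_gen.parameters()), lr=3e-4)
# Example: a step on (hypothetical) data from task 2, replaying tasks 0 and 1.
train_new_task_step(2, torch.randn(64, STATE_DIM), torch.randn(64, ACTION_DIM),
                    state_gen, behavior_gen, opt)
```

The sketch only illustrates how pseudo and real batches are interleaved; in the actual method both generators are diffusion models, and the multi-head critic is additionally regularized on the replayed pairs.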
Related papers
- Stable Continual Reinforcement Learning via Diffusion-based Trajectory Replay [28.033367285923465]
Reinforcement Learning (RL) aims to equip the agent with the capability to address a series of sequentially presented decision-making tasks.
This paper introduces DISTR, a novel continual RL algorithm that employs a diffusion model to memorize the high-return trajectory distribution of each encountered task.
Since replaying all past data at every step is impractical, a prioritization mechanism is introduced to focus trajectory replay on pivotal tasks.
arXiv Detail & Related papers (2024-11-16T14:03:23Z) - Prioritized Generative Replay [121.83947140497655]
We propose a prioritized, parametric version of an agent's memory, using generative models to capture online experience.
This paradigm enables densification of past experience, with new generations that benefit from the generative model's generalization capacity.
We show this recipe can be instantiated using conditional diffusion models and simple relevance functions (a minimal relevance-weighted sampling sketch appears after this list).
arXiv Detail & Related papers (2024-10-23T17:59:52Z) - Diffusing States and Matching Scores: A New Framework for Imitation Learning [16.941612670582522]
Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function.
In recent years, diffusion models have emerged as a non-adversarial alternative to GANs.
We show our approach outperforms GAN-style imitation learning baselines across various continuous control problems.
arXiv Detail & Related papers (2024-10-17T17:59:25Z) - Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal [54.93261535899478]
In real-world reinforcement learning applications, such as robotic control, tasks change and new tasks arise in sequential order.
This situation poses the challenge of a plasticity-stability trade-off: training an agent that can adapt to task changes while retaining acquired knowledge.
We propose a rehearsal-based continual diffusion model, called Continual Diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability).
arXiv Detail & Related papers (2024-09-04T08:21:47Z) - Enhancing Consistency and Mitigating Bias: A Data Replay Approach for Incremental Learning [100.7407460674153]
Deep learning systems are prone to catastrophic forgetting when learning from a sequence of tasks.
To mitigate the problem, a line of methods proposes to replay data from previously experienced tasks when learning new tasks.
However, storing real data is often impractical due to memory constraints or data privacy concerns.
As a replacement, data-free replay methods synthesize pseudo samples by inverting the classification model.
arXiv Detail & Related papers (2024-01-12T12:51:12Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - Connective Reconstruction-based Novelty Detection [3.7706789983985303]
Deep learning has enabled us to analyze real-world data, which often contain unexplained (novel) samples.
GAN-based approaches have been widely used to address this problem due to their ability to perform distribution fitting.
We propose a simple yet efficient reconstruction-based method that avoids adding complexities to compensate for the limitations of GAN models.
arXiv Detail & Related papers (2022-10-25T11:09:39Z) - Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue [80.05509768165135]
Generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples.
Most existing generative replay methods use only a single task-specific token to control their models.
We propose a novel method, prompt conditioned VAE for lifelong learning, to enhance generative replay by incorporating tasks' statistics.
arXiv Detail & Related papers (2022-10-14T13:12:14Z) - Outcome-Guided Counterfactuals for Reinforcement Learning Agents from a Jointly Trained Generative Latent Space [0.0]
We present a novel generative method for producing unseen and plausible counterfactual examples for reinforcement learning (RL) agents.
Our approach uses a variational autoencoder to train a latent space that jointly encodes information about the observations and outcome variables pertaining to an agent's behavior.
arXiv Detail & Related papers (2022-07-15T19:09:54Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead, the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z) - Generative Feature Replay with Orthogonal Weight Modification for Continual Learning [20.8966035274874]
Generative replay is a promising strategy that generates and replays pseudo data for previous tasks to alleviate catastrophic forgetting.
We propose to 1) replay penultimate-layer features with a generative model, and 2) leverage a self-supervised auxiliary task to further enhance feature stability.
Empirical results on several datasets show our method consistently achieves substantial improvements over the powerful OWM baseline.
arXiv Detail & Related papers (2020-05-07T13:56:22Z)
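As a companion to the Prioritized Generative Replay entry above, here is a minimal sketch of relevance-weighted sampling from a conditional generative model. The relevance scores, conditioning codes, and the stand-in linear generator are illustrative assumptions, not that paper's implementation.

```python
# Minimal sketch of relevance-weighted (prioritized) generative replay sampling.
import torch


def prioritized_replay(generator, conditions, relevance, n_samples):
    """Sample conditioning codes in proportion to their relevance scores, then
    densify past experience by querying the conditional generator."""
    probs = torch.softmax(relevance, dim=0)                 # relevance -> sampling weights
    idx = torch.multinomial(probs, n_samples, replacement=True)
    return generator(conditions[idx])                       # generated pseudo-experience


# Example usage with a stand-in generator (a conditional diffusion model in the paper).
cond_dim, out_dim = 4, 8
stub_generator = torch.nn.Linear(cond_dim, out_dim)         # placeholder for a diffusion sampler
conditions = torch.randn(16, cond_dim)                      # per-experience conditioning codes
relevance = torch.rand(16)                                  # e.g. curiosity- or TD-error-style scores
batch = prioritized_replay(stub_generator, conditions, relevance, n_samples=32)
print(batch.shape)                                          # torch.Size([32, 8])
```

The key design choice is that relevance only reweights which conditions get sampled; the generative model itself is shared, so rarely replayed experience is not lost, only deprioritized.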