P2DT: Mitigating Forgetting in Task-Incremental Learning with Progressive Prompt Decision Transformer
- URL: http://arxiv.org/abs/2401.11666v1
- Date: Mon, 22 Jan 2024 02:58:53 GMT
- Title: P2DT: Mitigating Forgetting in Task-Incremental Learning with Progressive Prompt Decision Transformer
- Authors: Zhiyuan Wang, Xiaoyang Qu, Jing Xiao, Bokui Chen, Jianzong Wang
- Abstract summary: Catastrophic forgetting poses a substantial challenge for managing intelligent agents controlled by a large model.
We propose a novel solution, the Progressive Prompt Decision Transformer (P2DT).
This method enhances a transformer-based model by dynamically appending decision tokens during new task training, thus fostering task-specific policies.
- Score: 39.16560969128012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Catastrophic forgetting poses a substantial challenge for managing
intelligent agents controlled by a large model, causing performance degradation
when these agents face new tasks. In our work, we propose a novel solution -
the Progressive Prompt Decision Transformer (P2DT). This method enhances a
transformer-based model by dynamically appending decision tokens during new
task training, thus fostering task-specific policies. Our approach mitigates
forgetting in continual and offline reinforcement learning scenarios. Moreover,
P2DT leverages trajectories collected via traditional reinforcement learning
from all tasks and generates new task-specific tokens during training, thereby
retaining knowledge from previously learned tasks. Preliminary results demonstrate that
our model effectively alleviates catastrophic forgetting and scales well with
increasing task environments.
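To make the mechanism concrete, the progressive-prompt idea can be sketched in a few lines of PyTorch: keep a growing list of learnable decision tokens, append a fresh set when a new task arrives, and freeze the sets belonging to earlier tasks. Everything below (class names, the plain nn.TransformerEncoder backbone, all sizes) is an illustrative assumption, not the paper's released implementation.

```python
import torch
import torch.nn as nn

class ProgressivePromptDT(nn.Module):
    """Minimal sketch of the progressive-prompt idea: a transformer policy
    that holds one set of learnable decision tokens per task. Architecture,
    sizes, and names are illustrative, not the paper's implementation."""

    def __init__(self, d_model: int = 128, n_prompt_tokens: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.task_prompts = nn.ParameterList()  # grows by one entry per task
        self.n_prompt_tokens = n_prompt_tokens
        self.d_model = d_model

    def add_task(self) -> int:
        """Append fresh decision tokens for a new task and freeze the tokens
        of earlier tasks so their policies are left untouched."""
        for p in self.task_prompts:
            p.requires_grad_(False)
        self.task_prompts.append(
            nn.Parameter(0.02 * torch.randn(self.n_prompt_tokens, self.d_model))
        )
        return len(self.task_prompts) - 1  # id of the new task

    def forward(self, traj_tokens: torch.Tensor, task_id: int) -> torch.Tensor:
        # traj_tokens: (batch, seq_len, d_model) embedded trajectory tokens
        batch = traj_tokens.size(0)
        prompt = self.task_prompts[task_id].unsqueeze(0).expand(batch, -1, -1)
        out = self.backbone(torch.cat([prompt, traj_tokens], dim=1))
        return out[:, self.n_prompt_tokens:]  # drop the prompt positions

model = ProgressivePromptDT()
tid = model.add_task()
print(model(torch.randn(2, 10, 128), tid).shape)  # torch.Size([2, 10, 128])
```

Freezing the earlier tokens is what keeps old task policies intact: on a new task, only the newly appended tokens (and, depending on the variant, the backbone) receive gradients.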
Related papers
- Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal [54.93261535899478]
In real-world applications of reinforcement learning, such as robotic control, tasks change and new ones arise in sequential order.
This situation poses the challenge of a plasticity-stability trade-off: training an agent that can adapt to task changes while retaining previously acquired knowledge.
We propose a rehearsal-based continual diffusion model, called the Continual Diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability).
arXiv Detail & Related papers (2024-09-04T08:21:47Z)
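The experience-rehearsal component that CoD builds on can be illustrated independently of the diffusion model: a buffer retains trajectories from past tasks and mixes some into each new-task batch. A minimal Python sketch, with made-up capacities and ratios:

```python
import random

class RehearsalBuffer:
    """Toy experience-rehearsal buffer: keeps a capped set of trajectories per
    past task and mixes a fraction of them into each new-task batch. Capacity
    and mixing ratio are made-up defaults, not CoD's settings."""

    def __init__(self, per_task_capacity: int = 100):
        self.per_task_capacity = per_task_capacity
        self.store = {}  # task_id -> list of stored trajectories

    def add(self, task_id, trajectory):
        buf = self.store.setdefault(task_id, [])
        if len(buf) < self.per_task_capacity:
            buf.append(trajectory)
        else:
            # once full, overwrite a random slot to keep memory bounded
            buf[random.randrange(self.per_task_capacity)] = trajectory

    def mixed_batch(self, current_batch, rehearsal_fraction=0.5):
        old = [t for trajs in self.store.values() for t in trajs]
        k = min(len(old), int(len(current_batch) * rehearsal_fraction))
        return list(current_batch) + random.sample(old, k)

buf = RehearsalBuffer(per_task_capacity=2)
for i in range(5):
    buf.add(task_id=0, trajectory=f"traj-{i}")
print(buf.mixed_batch(["new-a", "new-b"]))  # e.g. ['new-a', 'new-b', 'traj-4']
```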
- Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer [10.338170161831496]
Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks.
We introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA).
Our approach integrates pre-trained language models with RL tasks seamlessly.
arXiv Detail & Related papers (2024-08-02T17:25:34Z)
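LoRA itself is a general recipe that can be sketched in isolation: freeze the pre-trained weights and train only a low-rank additive update. A minimal PyTorch sketch (the standard recipe, not LPDT's code; rank and scaling values are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Standard LoRA recipe (not LPDT's code): freeze a pre-trained linear
    layer and learn a low-rank additive update scaled by alpha / r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # pre-trained weights stay fixed
        self.A = nn.Parameter(0.01 * torch.randn(r, base.in_features))
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r                 # zero-init B: no-op at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(128, 128))
print(layer(torch.randn(2, 128)).shape)  # torch.Size([2, 128])
```

Zero-initializing B means the adapted layer exactly matches the pre-trained one at the start of fine-tuning, which is what makes the update safe to bolt onto a frozen model.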
- Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method [19.751735234229972]
Domain incremental learning (DIL) poses a significant challenge in real-world scenarios.
Mitigating representation drift, which refers to the phenomenon of learned representations undergoing changes as the model adapts to new tasks, can help alleviate catastrophic forgetting.
We propose a novel DIL method named DARE, featuring a three-stage training process: Divergence, Adaptation, and REfinement.
arXiv Detail & Related papers (2024-06-23T22:05:52Z)
- Generalization to New Sequential Decision Making Tasks with In-Context Learning [23.36106067650874]
Training autonomous agents that can learn new tasks from only a handful of demonstrations is a long-standing problem in machine learning.
In this paper, we show that naively applying transformers to sequential decision making problems does not enable in-context learning of new tasks.
We investigate different design choices and find that larger model and dataset sizes, as well as more task diversity, environment stochasticity, and trajectory burstiness, all result in better in-context learning of new out-of-distribution tasks.
arXiv Detail & Related papers (2023-12-06T15:19:28Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering its practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Learning to Modulate pre-trained Models in RL [22.812215561012874]
Fine-tuning a pre-trained model often suffers from catastrophic forgetting.
Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly.
We propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model.
arXiv Detail & Related papers (2023-06-26T17:53:05Z)
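The general idea of modulating a frozen model can be sketched generically: the pre-trained block's parameters stay fixed, and only a small per-feature gate on its output is trained. The zero-initialized tanh gate below is a stand-in assumption, not L2M's actual modulation mechanism:

```python
import torch
import torch.nn as nn

class ModulatedBlock(nn.Module):
    """Generic sketch of modulating a frozen pre-trained block: all original
    parameters stay frozen and only a small per-feature gate is trained.
    The zero-initialized tanh gate makes this an identity wrapper at init,
    so pre-trained behavior is preserved before any task training."""

    def __init__(self, frozen_block: nn.Module, d_model: int):
        super().__init__()
        self.block = frozen_block
        for p in self.block.parameters():
            p.requires_grad_(False)      # pre-trained skills stay intact
        self.gate = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.block(x)
        return h * (1.0 + torch.tanh(self.gate))  # learned feature rescaling

block = ModulatedBlock(nn.Linear(64, 64), d_model=64)
print(block(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```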
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Self-Supervised Reinforcement Learning that Transfers using Random Features [41.00256493388967]
We propose a self-supervised reinforcement learning method that enables the transfer of behaviors across tasks with different rewards.
Our method is self-supervised in that it can be trained on offline datasets without reward labels, but can then be quickly deployed on new tasks.
arXiv Detail & Related papers (2023-05-26T20:37:06Z)
- Prompting Decision Transformer for Few-Shot Policy Generalization [98.0914217850999]
We propose a Prompt-based Decision Transformer (Prompt-DT) to achieve few-shot adaptation in offline RL.
Prompt-DT is a strong few-shot learner without any extra finetuning on unseen target tasks.
arXiv Detail & Related papers (2022-06-27T17:59:17Z)
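The prompting mechanism itself is easy to sketch: a short embedded demonstration segment from the target task is concatenated in front of the trajectory the model conditions on, so the transformer can identify the task in-context. Tensor shapes below are assumptions for illustration:

```python
import torch

def build_prompted_input(prompt_traj: torch.Tensor,
                         query_traj: torch.Tensor) -> torch.Tensor:
    """Prompt-DT-style input assembly: a short demonstration segment from the
    target task is prepended to the trajectory being predicted, letting the
    transformer infer the task in-context. The (batch, timesteps, d_model)
    layout is an assumption for this sketch."""
    return torch.cat([prompt_traj, query_traj], dim=1)

prompt = torch.randn(1, 5, 64)   # 5 embedded prompt timesteps
query = torch.randn(1, 20, 64)   # current trajectory
print(build_prompted_input(prompt, query).shape)  # torch.Size([1, 25, 64])
```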
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.