Solving Continual Offline Reinforcement Learning with Decision Transformer
- URL: http://arxiv.org/abs/2401.08478v2
- Date: Sun, 7 Apr 2024 11:29:37 GMT
- Title: Solving Continual Offline Reinforcement Learning with Decision Transformer
- Authors: Kaixin Huang, Li Shen, Chen Zhao, Chun Yuan, Dacheng Tao
- Abstract summary: Continual offline reinforcement learning (CORL) combines continual and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
- Score: 78.59473797783673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual offline reinforcement learning (CORL) combines continual and offline reinforcement learning, enabling agents to learn multiple tasks from static datasets without forgetting prior tasks. However, CORL faces challenges in balancing stability and plasticity. Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing. We aim to investigate whether Decision Transformer (DT), another offline RL paradigm, can serve as a more suitable offline continual learner to address these issues. We first compare AC-based offline algorithms with DT in the CORL framework. DT offers advantages in learning efficiency, distribution shift mitigation, and zero-shot generalization but exacerbates the forgetting problem during supervised parameter updates. We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem. MH-DT stores task-specific knowledge using multiple heads, facilitating knowledge sharing with common components. It employs distillation and selective rehearsal to enhance current task learning when a replay buffer is available. In buffer-unavailable scenarios, LoRA-DT merges less influential weights and fine-tunes DT's decisive MLP layer to adapt to the current task. Extensive experiments on MuJoCo and Meta-World benchmarks demonstrate that our methods outperform SOTA CORL baselines and showcase enhanced learning capabilities and superior memory efficiency.
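The low-rank adaptation idea behind LoRA-DT can be illustrated with a minimal sketch: a frozen base weight plus trainable low-rank factors whose update is later folded ("merged") into the base. This is a simplified stand-in, not the authors' implementation; the class and parameter names are illustrative.

```python
import numpy as np

class LoRALinear:
    """Linear layer with a frozen base weight plus a trainable low-rank
    update, in the spirit of LoRA-DT's fine-tuning of the decisive MLP
    layer. All names here are illustrative assumptions."""

    def __init__(self, in_dim, out_dim, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.02  # frozen base
        self.A = rng.standard_normal((rank, in_dim)) * 0.01     # trainable
        self.B = np.zeros((out_dim, rank))                      # zero-init, trainable
        self.scale = alpha / rank

    def forward(self, x):
        # y = x (W + scale * B A)^T, without materializing the summed matrix
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale

    def merge(self):
        # Fold the low-rank update into the base weight, e.g. at a task
        # boundary, so later tasks adapt on top of the merged weights.
        self.W = self.W + self.scale * (self.B @ self.A)
        self.B = np.zeros_like(self.B)
```

Because `B` starts at zero, the layer initially behaves exactly like the frozen base; only the small `A`/`B` factors need to be trained and stored per task.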
Related papers
- N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs [42.446740732573296]
In-context learning allows models like transformers to adapt to new tasks without updating their weights.
Existing in-context RL methods, such as Algorithm Distillation (AD), demand large, carefully curated datasets.
In this work we integrated the n-gram induction heads into transformers for in-context RL.
arXiv Detail & Related papers (2024-11-04T10:31:03Z) - Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal [54.93261535899478]
In real-world applications, such as robotic control of reinforcement learning, the tasks are changing, and new tasks arise in a sequential order.
This situation poses the new challenge of plasticity-stability trade-off for training an agent who can adapt to task changes and retain acquired knowledge.
We propose a rehearsal-based continual diffusion model, called Continual Diffuser (CoD), to endow the diffuser with the capabilities of quick adaptation (plasticity) and lasting retention (stability).
arXiv Detail & Related papers (2024-09-04T08:21:47Z) - Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer [10.338170161831496]
Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks.
We introduce the Language model-initialized Prompt Decision Transformer (LPDT), which leverages pre-trained language models for meta-RL tasks and fine-tunes the model using Low-rank Adaptation (LoRA).
Our approach integrates pre-trained language models and RL tasks seamlessly.
arXiv Detail & Related papers (2024-08-02T17:25:34Z) - SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios.
In the early route, intermediate outputs are consolidated via an anti-redundancy operation.
In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z) - Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling [34.547551367941246]
Real-world data collected from sensors or humans often contains noise and errors.
Traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption.
We propose Robust Decision Transformer (RDT) by incorporating several robust techniques.
arXiv Detail & Related papers (2024-07-05T06:34:32Z) - In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought [13.034968416139826]
We propose an In-context Decision Transformer (IDT) to achieve self-improvement in a high-level trial-and-error manner.
IDT is inspired by the efficient hierarchical structure of human decision-making.
IDT achieves state-of-the-art in long-horizon tasks over current in-context RL methods.
arXiv Detail & Related papers (2024-05-31T08:38:25Z) - Offline Reinforcement Learning from Datasets with Structured Non-Stationarity [50.35634234137108]
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode.
We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation.
arXiv Detail & Related papers (2024-05-23T02:41:36Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
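The adaptive discretization described above can be sketched with a simple data-driven codebook: cluster the dataset's continuous actions and map each action to its nearest center. Plain k-means is used here purely as an illustration of the idea; the paper's learned scheme differs, and all function names are assumptions.

```python
import numpy as np

def fit_action_codebook(actions, n_bins=8, iters=20, seed=0):
    """Fit a codebook of discrete action centers to a dataset of
    continuous actions via plain k-means -- a simplified stand-in
    for an adaptive, dataset-dependent quantization scheme."""
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), n_bins, replace=False)]
    for _ in range(iters):
        # Assign each continuous action to its nearest center.
        dists = np.linalg.norm(actions[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned actions.
        for k in range(n_bins):
            if (labels == k).any():
                centers[k] = actions[labels == k].mean(axis=0)
    return centers

def quantize(action, centers):
    """Map a continuous action to the index of its nearest codebook entry."""
    return int(np.linalg.norm(centers - action, axis=-1).argmin())
```

A discrete offline RL method (e.g. one maximizing over actions) can then operate over the codebook indices instead of the raw continuous action space.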
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Generalized Decision Transformer for Offline Hindsight Information Matching [16.7594941269479]
We present Generalized Decision Transformer (GDT) for solving any hindsight information matching (HIM) problem.
We show how different choices for the feature function and the anti-causal aggregator lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future.
arXiv Detail & Related papers (2021-11-19T18:56:13Z) - Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp [119.69304125647785]
This paper introduces a concise yet powerful method to construct Continuous Transition.
Specifically, we propose to synthesize new transitions for training by linearly interpolating the consecutive transitions.
To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically.
arXiv Detail & Related papers (2020-11-30T01:20:23Z)
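The linear interpolation of consecutive transitions described in the last entry can be sketched in a few lines. This is a minimal illustration of the MixUp-style construction only; the discriminator that keeps synthesized transitions authentic is omitted, and the function name is an assumption.

```python
import numpy as np

def continuous_transition(t0, t1, lam):
    """Synthesize a new training transition by linearly interpolating two
    consecutive transitions t0 = (s, a, r, s') and t1, with mixing weight
    lam in [0, 1] (lam = 1 recovers t0, lam = 0 recovers t1)."""
    s0, a0, r0, ns0 = t0
    s1, a1, r1, ns1 = t1
    return (lam * s0 + (1 - lam) * s1,   # interpolated state
            lam * a0 + (1 - lam) * a1,   # interpolated action
            lam * r0 + (1 - lam) * r1,   # interpolated reward
            lam * ns0 + (1 - lam) * ns1) # interpolated next state
```

Sampling `lam` per synthesized transition (e.g. from a Beta distribution, as is common in MixUp variants) densifies the replay buffer for continuous-control training.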
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.