Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling
- URL: http://arxiv.org/abs/2407.04285v1
- Date: Fri, 5 Jul 2024 06:34:32 GMT
- Title: Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling
- Authors: Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han,
- Abstract summary: Real-world data collected from sensors or humans often contains noise and errors.
Traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption.
We propose Robust Decision Transformer (RDT) by incorporating several robust techniques.
- Score: 34.547551367941246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods. Our study indicates that traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption, especially when the amount of data is limited. This suggests the potential of sequential modeling for tackling data corruption in offline RL. To further unleash the potential of sequence modeling methods, we propose Robust Decision Transformer (RDT) by incorporating several robust techniques. Specifically, we introduce Gaussian weighted learning and iterative data correction to reduce the effect of corrupted data. Additionally, we leverage embedding dropout to enhance the model's resistance to erroneous inputs. Extensive experiments on MoJoCo, KitChen, and Adroit tasks demonstrate RDT's superior performance under diverse data corruption compared to previous methods. Moreover, RDT exhibits remarkable robustness in a challenging setting that combines training-time data corruption with testing-time observation perturbations. These results highlight the potential of robust sequence modeling for learning from noisy or corrupted offline datasets, thereby promoting the reliable application of offline RL in real-world tasks.
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - SRTFD: Scalable Real-Time Fault Diagnosis through Online Continual Learning [8.016378373626084]
Modern industrial environments demand FD methods that can handle new fault types, dynamic conditions, large-scale data, and provide real-time responses with minimal prior information.
We propose SRTFD, a scalable real-time fault diagnosis framework that enhances online continual learning (OCL) with three critical methods.
Experiments on a real-world dataset and two public simulated datasets demonstrate SRTFD's effectiveness and potential for providing advanced, scalable, and precise fault diagnosis in modern industrial systems.
arXiv Detail & Related papers (2024-08-11T03:26:22Z) - Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows [30.926243761581624]
Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training.
CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation.
Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
arXiv Detail & Related papers (2024-05-06T22:44:32Z) - Solving Continual Offline Reinforcement Learning with Decision Transformer [78.59473797783673]
Continuous offline reinforcement learning (CORL) combines continuous and offline reinforcement learning.
Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and weak knowledge-sharing.
We introduce multi-head DT (MH-DT) and low-rank adaptation DT (LoRA-DT) to mitigate DT's forgetting problem.
arXiv Detail & Related papers (2024-01-16T16:28:32Z) - Reasoning with Latent Diffusion in Offline Reinforcement Learning [11.349356866928547]
offline reinforcement learning holds promise as a means to learn high-reward policies from a static dataset.
Key challenge in offline RL lies in effectively stitching portions of suboptimal trajectories from the static dataset.
We propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills.
arXiv Detail & Related papers (2023-09-12T20:58:21Z) - Causal Deep Reinforcement Learning Using Observational Data [11.790171301328158]
We propose two deconfounding methods in deep reinforcement learning (DRL)
The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function.
We prove the effectiveness of our deconfounding methods and validate them experimentally.
arXiv Detail & Related papers (2022-11-28T14:34:39Z) - Adaptive Behavior Cloning Regularization for Stable Offline-to-Online
Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z) - Robust Offline Reinforcement Learning with Gradient Penalty and
Constraint Relaxation [38.95482624075353]
We introduce gradient penalty over the learned value function to tackle the exploding Q-functions.
We then relax the closeness constraints towards non-optimal actions with critic weighted constraint relaxation.
Experimental results show that the proposed techniques effectively tame the non-optimal trajectories for policy constraint offline RL methods.
arXiv Detail & Related papers (2022-10-19T11:22:36Z) - Don't Change the Algorithm, Change the Data: Exploratory Data for
Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL.
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR)
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z) - Provably Efficient Causal Reinforcement Learning with Confounded
Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.