Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
- URL: http://arxiv.org/abs/2407.04285v3
- Date: Thu, 13 Feb 2025 03:51:06 GMT
- Title: Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
- Authors: Jiawei Xu, Rui Yang, Shuang Qiu, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han
- Abstract summary: Offline reinforcement learning holds promise for scaling data-driven decision-making.
However, real-world data collected from sensors or humans often contains noise and errors.
Our study reveals that prior research falls short under data corruption when the dataset is limited.
- Score: 35.2859997591196
- Abstract: Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making while avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods, particularly when the real-world data is limited. Our study reveals that prior research focusing on adapting predominant offline RL methods based on temporal difference learning still falls short under data corruption when the dataset is limited. In contrast, we discover that vanilla sequence modeling methods, such as Decision Transformer, exhibit robustness against data corruption, even without specialized modifications. To unlock the full potential of sequence modeling, we propose Robust Decision Transformer (RDT) by incorporating three simple yet effective robust techniques: embedding dropout to improve the model's robustness against erroneous inputs, Gaussian weighted learning to mitigate the effects of corrupted labels, and iterative data correction to eliminate corrupted data from the source. Extensive experiments on MuJoCo, Kitchen, and Adroit tasks demonstrate RDT's superior performance under various data corruption scenarios compared to prior methods. Furthermore, RDT exhibits remarkable robustness in a more challenging setting that combines training-time data corruption with test-time observation perturbations. These results highlight the potential of sequence modeling for learning from noisy or corrupted offline datasets, thereby promoting the reliable application of offline RL in real-world scenarios. Our code is available at https://github.com/jiawei415/RobustDecisionTransformer.
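The abstract names Gaussian weighted learning as a way to mitigate corrupted labels but does not give its exact form. As a rough illustration only, the sketch below assumes each sample's squared error is scaled by a weight exp(-beta * error), so that samples with very large prediction errors (likely corrupted labels) contribute little to the loss. The function names, the `beta` parameter, and the weighting formula are all assumptions made for this example, not the authors' implementation.

```python
import math


def squared_error(pred, target):
    """Sum of squared differences between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def gaussian_weight(pred, target, beta=1.0):
    """Hypothetical Gaussian weight: 1.0 for a perfect fit, near 0 for large errors."""
    return math.exp(-beta * squared_error(pred, target))


def gaussian_weighted_loss(preds, targets, beta=1.0):
    """Mean of per-sample squared errors, each scaled by its Gaussian weight.

    In a gradient-based trainer the weight would normally be treated as a
    constant (no gradient through it); that detail is an assumption here.
    """
    total = 0.0
    for pred, target in zip(preds, targets):
        err = squared_error(pred, target)
        total += math.exp(-beta * err) * err
    return total / len(preds)


# Three predictions of the zero action; the third label is grossly off,
# standing in for a corrupted action label.
preds = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
targets = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]]

plain_loss = sum(squared_error(p, t) for p, t in zip(preds, targets)) / len(preds)
weighted_loss = gaussian_weighted_loss(preds, targets, beta=1.0)
```

With these toy values the outlier dominates the plain mean squared error, while its weight in the weighted loss is close to zero, so the weighted loss stays small. A larger `beta` discounts high-error samples more aggressively.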
Related papers
- Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches [4.364595470673757]
Portfolio Beam Search (PBS) is a simple yet effective alternative to Beam Search (BS).
We develop an uncertainty-aware diversification mechanism, which we integrate into a sequential decoding algorithm at inference time.
We empirically demonstrate the effectiveness of PBS on the D4RL benchmark, where it achieves higher returns and significantly reduces outcome variability.
arXiv Detail & Related papers (2025-02-13T15:51:46Z) - What Really Matters for Learning-based LiDAR-Camera Calibration [50.2608502974106]
This paper revisits the development of learning-based LiDAR-Camera calibration.
We identify the critical limitations of regression-based methods with the widely used data generation pipeline.
We also investigate how the input data format and preprocessing operations impact network performance.
arXiv Detail & Related papers (2025-01-28T14:12:32Z) - Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions [8.666879925570331]
Real-world offline datasets are often subject to data corruptions due to sensor failures or malicious attacks.
Existing methods struggle to learn robust agents under high uncertainty caused by corrupted data.
We propose a novel robust variational Bayesian inference method for offline RL (TRACER).
arXiv Detail & Related papers (2024-11-01T09:28:24Z) - D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - SRTFD: Scalable Real-Time Fault Diagnosis through Online Continual Learning [8.016378373626084]
Modern industrial environments demand FD methods that can handle new fault types, dynamic conditions, large-scale data, and provide real-time responses with minimal prior information.
We propose SRTFD, a scalable real-time fault diagnosis framework that enhances online continual learning (OCL) with three critical methods.
Experiments on a real-world dataset and two public simulated datasets demonstrate SRTFD's effectiveness and potential for providing advanced, scalable, and precise fault diagnosis in modern industrial systems.
arXiv Detail & Related papers (2024-08-11T03:26:22Z) - Causal Deep Reinforcement Learning Using Observational Data [11.790171301328158]
We propose two deconfounding methods in deep reinforcement learning (DRL).
The methods first calculate the importance degree of different samples based on the causal inference technique, and then adjust the impact of different samples on the loss function.
We prove the effectiveness of our deconfounding methods and validate them experimentally.
arXiv Detail & Related papers (2022-11-28T14:34:39Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation [38.95482624075353]
We introduce gradient penalty over the learned value function to tackle the exploding Q-functions.
We then relax the closeness constraints towards non-optimal actions with critic weighted constraint relaxation.
Experimental results show that the proposed techniques effectively tame the non-optimal trajectories for policy constraint offline RL methods.
arXiv Detail & Related papers (2022-10-19T11:22:36Z) - Online Coreset Selection for Rehearsal-based Continual Learning [65.85595842458882]
In continual learning, we store a subset of training examples (coreset) to be replayed later to alleviate catastrophic forgetting.
We propose Online Coreset Selection (OCS), a simple yet effective method that selects the most representative and informative coreset at each iteration.
Our proposed method maximizes the model's adaptation to a target dataset while selecting high-affinity samples to past tasks, which directly inhibits catastrophic forgetting.
arXiv Detail & Related papers (2021-06-02T11:39:25Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z) - Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)