Towards Robust Offline Reinforcement Learning under Diverse Data
Corruption
- URL: http://arxiv.org/abs/2310.12955v3
- Date: Sat, 9 Mar 2024 17:46:44 GMT
- Title: Towards Robust Offline Reinforcement Learning under Diverse Data
Corruption
- Authors: Rui Yang, Han Zhong, Jiawei Xu, Amy Zhang, Chongjie Zhang, Lei Han,
Tong Zhang
- Abstract summary: We show that implicit Q-learning (IQL) demonstrates remarkable resilience to data corruption among various offline RL algorithms.
We propose a more robust offline RL approach named Robust IQL (RIQL).
- Score: 46.16052026620402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) presents a promising approach for
learning reinforced policies from offline datasets without the need for costly
or unsafe interactions with the environment. However, datasets collected by
humans in real-world environments are often noisy and may even be maliciously
corrupted, which can significantly degrade the performance of offline RL. In
this work, we first investigate the performance of current offline RL
algorithms under comprehensive data corruption, including states, actions,
rewards, and dynamics. Our extensive experiments reveal that implicit
Q-learning (IQL) demonstrates remarkable resilience to data corruption among
various offline RL algorithms. Furthermore, we conduct both empirical and
theoretical analyses to understand IQL's robust performance, identifying its
supervised policy learning scheme as the key factor. Despite its relative
robustness, IQL still suffers from heavy-tailed targets of Q functions under
dynamics corruption. To tackle this challenge, we draw inspiration from robust
statistics to employ the Huber loss to handle the heavy-tailedness and utilize
quantile estimators to balance penalization for corrupted data and learning
stability. By incorporating these simple yet effective modifications into IQL,
we propose a more robust offline RL approach named Robust IQL (RIQL). Extensive
experiments demonstrate that RIQL exhibits highly robust performance when
subjected to diverse data corruption scenarios.
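To make the two modifications concrete, below is a minimal PyTorch sketch of an ensemble critic update in the spirit of RIQL. This is not the authors' released code: it omits IQL's expectile-based value learning and the policy extraction step, and the function and argument names are assumptions for illustration. It only shows where the two ingredients slot in: the Huber loss bounds the influence of heavy-tailed TD targets, and an alpha-quantile over the ensemble of target estimates interpolates between the pessimistic minimum (alpha near 0) and the median (alpha = 0.5).

```python
import torch
import torch.nn.functional as F

def ensemble_critic_loss(q_ensemble, target_q_ensemble, policy, batch,
                         gamma=0.99, huber_delta=1.0, quantile_alpha=0.25):
    """Illustrative sketch (not the paper's official code): a TD update for an
    ensemble of Q-networks combining the two ingredients named in the abstract,
    a Huber loss against heavy-tailed targets and an alpha-quantile over the
    ensemble of target estimates instead of the usual minimum."""
    s, a, r, s_next, done = (batch[k] for k in
                             ("obs", "act", "rew", "next_obs", "done"))
    with torch.no_grad():
        a_next = policy(s_next)  # next actions from the current policy (assumed interface)
        # Stack target Q estimates from the ensemble: shape (n_ensemble, batch)
        target_qs = torch.stack(
            [tq(s_next, a_next).squeeze(-1) for tq in target_q_ensemble], dim=0)
        # Alpha-quantile over the ensemble: values near 0 recover the pessimistic
        # minimum, 0.5 the median; intermediate values trade off penalizing
        # corrupted transitions against learning stability.
        target_q = torch.quantile(target_qs, quantile_alpha, dim=0)
        td_target = r + gamma * (1.0 - done) * target_q
    loss = 0.0
    for q in q_ensemble:
        pred = q(s, a).squeeze(-1)
        # Huber loss: quadratic near zero, linear in the tails, so a few
        # corrupted (heavy-tailed) targets cannot dominate the gradient.
        loss = loss + F.huber_loss(pred, td_target, delta=huber_delta)
    return loss / len(q_ensemble)
```

In IQL-style training, the same quantile estimate would also feed the advantage used for supervised, weighted-regression policy extraction, the component the analysis above credits for IQL's baseline robustness.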
Related papers
- Towards Sample-Efficient and Stable Reinforcement Learning for LLM-based Recommendation [56.92367609590823]
Long Chain-of-Thought (Long CoT) reasoning has shown promise in Large Language Models (LLMs). We argue that Long CoT is inherently ill-suited for the sequential recommendation domain. We propose RISER, a novel Reinforced Item Space Exploration framework for Recommendation.
arXiv Detail & Related papers (2026-01-31T10:02:43Z) - Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling [35.2859997591196]
Offline reinforcement learning holds promise for scaling data-driven decision-making.
However, real-world data collected from sensors or humans often contains noise and errors.
Our study reveals that prior research falls short under data corruption when the dataset is limited.
arXiv Detail & Related papers (2024-07-05T06:34:32Z) - Equivariant Offline Reinforcement Learning [7.822389399560674]
We investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations.
Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts.
arXiv Detail & Related papers (2024-06-20T03:02:49Z) - Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z) - Offline Reinforcement Learning with Imbalanced Datasets [23.454333727200623]
A real-world offline reinforcement learning (RL) dataset is often imbalanced over the state space due to the challenge of exploration or safety considerations.
We show that typical offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset.
Inspired by natural intelligence, we propose a novel offline RL method that augments CQL with a retrieval process to recall past related experiences.
arXiv Detail & Related papers (2023-07-06T03:22:19Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
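The summary does not spell out how "impact on the value function" is measured, so the sketch below uses absolute TD error as a stand-in proxy: transitions are drawn with probability proportional to that proxy, and each draw carries an importance weight so the Q-update stays unbiased. Everything here (the proxy, the function name, the normalization) is an assumption for illustration, not ImRE itself.

```python
import numpy as np

def impact_weighted_sample(td_errors, batch_size, rng=None):
    """Hypothetical sketch: draw transitions with probability proportional to a
    proxy for their impact on the value function (here, absolute TD error),
    and return importance weights that keep the resulting Q-update unbiased.
    This stands in for ImRE only at a high level; the paper's actual impact
    measure and correction are not reproduced here."""
    rng = np.random.default_rng() if rng is None else rng
    impact = np.abs(np.asarray(td_errors)) + 1e-6   # avoid zero-probability entries
    probs = impact / impact.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    # Importance weights correct for sampling from `probs` instead of uniformly.
    weights = 1.0 / (len(probs) * probs[idx])
    weights = weights / weights.max()               # normalize for stable step sizes
    return idx, weights
```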
arXiv Detail & Related papers (2022-09-28T19:49:39Z) - Robust Reinforcement Learning using Offline Data [23.260211453437055]
We propose a robust reinforcement learning algorithm called Robust Fitted Q-Iteration (RFQI).
RFQI uses only an offline dataset to learn the optimal robust policy.
We prove that RFQI learns a near-optimal robust policy under standard assumptions.
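For reference, here is a minimal sketch of plain (non-robust) fitted Q-iteration over a fixed offline dataset with discrete actions; RFQI modifies such a loop by replacing the Bellman target with a worst-case target over an uncertainty set of transition models, which is not shown here. The regressor choice and data layout are assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(dataset, n_actions, n_iters=50, gamma=0.99):
    """Plain fitted Q-iteration from a fixed offline dataset (a reference point
    for RFQI, not RFQI itself). `dataset` is assumed to be a tuple of arrays
    (states, actions, rewards, next_states, done) with discrete actions."""
    s, a, r, s_next, done = dataset
    X = np.concatenate([s, a[:, None]], axis=1)   # regress Q on (state, action)
    q_model = None
    for _ in range(n_iters):
        if q_model is None:
            target = r                            # first iteration: Q_0 = immediate reward
        else:
            # Evaluate the current Q on every next state-action pair, take the max.
            next_qs = np.stack([
                q_model.predict(np.concatenate(
                    [s_next, np.full((len(s_next), 1), act)], axis=1))
                for act in range(n_actions)], axis=1)
            target = r + gamma * (1.0 - done) * next_qs.max(axis=1)
        # Re-fit the regressor to the new Bellman targets (the "fitted" step).
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, target)
    return q_model
```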
arXiv Detail & Related papers (2022-08-10T03:47:45Z) - Constraints Penalized Q-Learning for Safe Offline Reinforcement Learning [15.841609263723575]
We study the problem of safe offline reinforcement learning (RL).
The goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment.
We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions.
arXiv Detail & Related papers (2021-07-19T16:30:14Z) - Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
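A rough sketch of such uncertainty weighting is shown below, assuming epistemic uncertainty is estimated with stochastic forward passes (e.g. dropout kept active at evaluation time); the weighting function, interfaces, and constants are assumptions rather than the paper's exact objective.

```python
import torch

def uncertainty_weighted_td_loss(q_net, target_q_net, policy, batch,
                                 gamma=0.99, n_dropout_samples=10, beta=1.0):
    """Illustrative uncertainty-weighted critic loss: estimate the variance of
    the bootstrapped target with stochastic forward passes, then down-weight
    transitions whose target is uncertain (likely OOD). Not the paper's code."""
    s, a, r, s_next, done = (batch[k] for k in
                             ("obs", "act", "rew", "next_obs", "done"))
    with torch.no_grad():
        a_next = policy(s_next)
        target_q_net.train()  # keep dropout active so repeated passes differ
        samples = torch.stack([target_q_net(s_next, a_next).squeeze(-1)
                               for _ in range(n_dropout_samples)], dim=0)
        target_q = samples.mean(dim=0)
        variance = samples.var(dim=0)
        # High target variance (OOD bootstraps) -> small weight on that transition.
        weights = beta / (variance + beta)
        td_target = r + gamma * (1.0 - done) * target_q
    td_error = q_net(s, a).squeeze(-1) - td_target
    return (weights * td_error.pow(2)).mean()
```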
arXiv Detail & Related papers (2021-05-17T20:16:46Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
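The core of critic-regularized regression can be sketched as behaviour cloning on dataset actions, reweighted by the critic's advantage estimate. The snippet below assumes a policy object exposing `sample` and `log_prob`; the weighting constants and the Monte-Carlo baseline are illustrative choices, not the paper's exact configuration.

```python
import torch

def crr_policy_loss(policy, q_net, batch, n_value_samples=4, beta=1.0, mode="exp"):
    """Illustrative critic-regularized regression: clone dataset actions, but
    weight each log-likelihood term by a function of the critic's advantage."""
    s, a = batch["obs"], batch["act"]
    with torch.no_grad():
        q_sa = q_net(s, a).squeeze(-1)
        # Monte-Carlo value baseline: average Q over actions sampled from the policy.
        v_s = torch.stack([q_net(s, policy.sample(s)).squeeze(-1)
                           for _ in range(n_value_samples)], dim=0).mean(dim=0)
        advantage = q_sa - v_s
        if mode == "exp":
            # Exponential advantage weighting, clipped for numerical stability.
            weight = torch.clamp(torch.exp(advantage / beta), max=20.0)
        else:
            # "binary": only clone actions the critic judges better than average.
            weight = (advantage > 0).float()
    log_prob = policy.log_prob(s, a)
    return -(weight * log_prob).mean()
```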
arXiv Detail & Related papers (2020-06-26T17:50:26Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
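A minimal discrete-action sketch of the conservative objective follows: the usual TD error plus a penalty that pushes Q down on all actions (via a log-sum-exp) and up on dataset actions, which is what yields the lower-bound property mentioned above. Continuous-action CQL instead approximates the log-sum-exp by sampling actions; network interfaces here are assumed.

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0):
    """Minimal discrete-action sketch of conservative Q-learning.
    `q_net(s)` is assumed to return Q-values for every action."""
    s, a, r, s_next, done = (batch[k] for k in
                             ("obs", "act", "rew", "next_obs", "done"))
    q_all = q_net(s)                                          # (batch, n_actions)
    q_data = q_all.gather(1, a.long().unsqueeze(-1)).squeeze(-1)
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * target_q_net(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_data, td_target)
    # Conservative regularizer: penalize large Q on arbitrary actions, reward
    # large Q on dataset actions; this term produces the value lower bound.
    conservative = (torch.logsumexp(q_all, dim=1) - q_data).mean()
    return td_loss + alpha * conservative
```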
arXiv Detail & Related papers (2020-06-08T17:53:42Z)