Exclusively Penalized Q-learning for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2405.14082v2
- Date: Thu, 24 Oct 2024 07:56:23 GMT
- Title: Exclusively Penalized Q-learning for Offline Reinforcement Learning
- Authors: Junghyuk Yeom, Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han
- Abstract summary: Constraint-based offline reinforcement learning (RL) mitigates overestimation errors caused by distributional shift by constraining the policy or imposing penalties on the value function.
This paper focuses on a limitation of existing offline RL methods with penalized value functions: unnecessary penalties can introduce underestimation bias into the value function.
We propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors.
- Score: 4.916646834691489
- Abstract: Constraint-based offline reinforcement learning (RL) mitigates overestimation errors caused by distributional shift by constraining the policy or imposing penalties on the value function. This paper focuses on a limitation of existing offline RL methods with penalized value functions: unnecessary penalties can introduce underestimation bias into the value function. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.
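The selective-penalty idea can be sketched in a few lines of PyTorch. Everything below is an assumption for exposition: the trigger for penalization (low behavior-policy log-likelihood of the sampled action) and the names behavior_log_prob, penalty_coef, and log_prob_threshold stand in for EPQ's actual criterion and notation.
```python
import torch

def epq_style_target(reward, not_done, q_next, behavior_log_prob,
                     gamma=0.99, penalty_coef=1.0, log_prob_threshold=-5.0):
    """Bellman target with a selective penalty (illustrative sketch)."""
    # Flag only states whose sampled action is unlikely under the behavior
    # policy (assumed criterion): 1.0 where error-prone, 0.0 elsewhere.
    penalize = (behavior_log_prob < log_prob_threshold).float()
    # Unflagged states receive the ordinary Bellman target, so no
    # unnecessary bias is introduced there.
    return reward + gamma * not_done * (q_next - penalty_coef * penalize)
```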
Related papers
- Strategically Conservative Q-Learning [89.17906766703763]
Offline reinforcement learning (RL) is a compelling paradigm for extending RL's practical utility.
The major difficulty in offline RL is mitigating the impact of approximation errors when encountering out-of-distribution (OOD) actions.
We propose a novel framework called Strategically Conservative Q-Learning (SCQ) that distinguishes between OOD data that are easy and those that are hard to estimate.
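A minimal sketch of that distinction, assuming ensemble disagreement as the easy/hard criterion (SCQ's actual test may differ); q_ensemble and both thresholds are illustrative names:
```python
import torch

def scq_style_value(q_ensemble, hard_threshold=0.5, penalty_coef=1.0):
    """Be conservative only on hard-to-estimate OOD actions (sketch).

    q_ensemble: (n_critics, batch) Q-estimates for the same state-action
    pairs; disagreement across critics is the assumed 'hard' proxy.
    """
    q_mean = q_ensemble.mean(dim=0)
    q_std = q_ensemble.std(dim=0)             # epistemic disagreement
    hard = (q_std > hard_threshold).float()   # flag hard-to-estimate actions
    # Penalize only the flagged actions; easy-to-estimate ones are untouched.
    return q_mean - penalty_coef * hard * q_std
```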
arXiv Detail & Related papers (2024-06-06T22:09:46Z)
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning [54.48409201256968]
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of an offline pretrained policy using only a few online samples.
Most O2O methods focus on the balance between the RL objective and pessimism, or on the utilization of offline and online samples.
arXiv Detail & Related papers (2023-12-12T19:24:35Z)
- Robust Offline Reinforcement Learning with Gradient Penalty and Constraint Relaxation [38.95482624075353]
We introduce a gradient penalty over the learned value function to tackle exploding Q-functions.
We then relax the closeness constraint toward non-optimal actions via critic-weighted constraint relaxation.
Experimental results show that the proposed techniques effectively tame non-optimal trajectories for policy-constrained offline RL methods.
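A hedged sketch of a gradient penalty on a learned critic follows; penalizing the norm of the critic's action-gradient is one common form, and critic, gp_coef, and the mean-squared TD loss are illustrative assumptions rather than the paper's exact formulation:
```python
import torch
import torch.nn.functional as F

def critic_loss_with_gradient_penalty(critic, state, action, td_target,
                                      gp_coef=10.0):
    """TD loss plus a penalty on the critic's action-gradient (sketch)."""
    action = action.detach().requires_grad_(True)
    q = critic(state, action)
    td_loss = F.mse_loss(q, td_target)
    # Norm of dQ/da; large gradients accompany exploding Q-surfaces.
    grad = torch.autograd.grad(q.sum(), action, create_graph=True)[0]
    gradient_penalty = grad.pow(2).sum(dim=-1).mean()
    return td_loss + gp_coef * gradient_penalty
```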
arXiv Detail & Related papers (2022-10-19T11:22:36Z)
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning [125.8224674893018]
Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment.
Applying off-policy algorithms to offline RL usually fails due to extrapolation error caused by out-of-distribution (OOD) actions.
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
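A minimal sketch of an uncertainty-driven pessimistic bootstrap target, assuming ensemble standard deviation as the uncertainty measure; the coefficient beta and the aggregation are illustrative choices:
```python
import torch

def pbrl_style_target(reward, not_done, q_next_ensemble,
                      gamma=0.99, beta=1.0):
    """Uncertainty-driven pessimistic bootstrap target (sketch).

    q_next_ensemble: (n_critics, batch) bootstrapped next-state values.
    """
    q_mean = q_next_ensemble.mean(dim=0)
    q_std = q_next_ensemble.std(dim=0)        # epistemic uncertainty proxy
    pessimistic_next = q_mean - beta * q_std  # penalize only uncertain values
    return reward + gamma * not_done * pessimistic_next
```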
arXiv Detail & Related papers (2022-02-23T15:27:16Z)
- BRAC+: Improved Behavior Regularized Actor Critic for Offline Reinforcement Learning [14.432131909590824]
Offline Reinforcement Learning aims to train effective policies using previously collected datasets.
Standard off-policy RL algorithms are prone to overestimations of the values of out-of-distribution (less explored) actions.
We improve behavior-regularized offline reinforcement learning and propose BRAC+.
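For orientation, a sketch of generic BRAC-style behavior regularization (BRAC+'s specific improvements are not reproduced here); the log-ratio KL estimate and coefficient alpha are illustrative:
```python
import torch

def brac_style_actor_loss(q_value, policy_log_prob, behavior_log_prob,
                          alpha=0.1):
    """Generic behavior-regularized actor loss (sketch).

    Actions are assumed sampled from the current policy, so the mean
    log-ratio is a simple Monte Carlo estimate of KL(policy || behavior).
    """
    kl_estimate = (policy_log_prob - behavior_log_prob).mean()
    # Maximize Q while staying close to the behavior policy.
    return -q_value.mean() + alpha * kl_estimate
```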
arXiv Detail & Related papers (2021-10-02T23:55:49Z)
- Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
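A minimal sketch of such down-weighting, assuming an inverse-variance weight on the per-sample Bellman error; UWAC itself derives uncertainty from Monte Carlo dropout, and beta and the clipping are illustrative:
```python
import torch

def uwac_style_critic_loss(q, td_target, target_variance, beta=1.0):
    """Bellman loss with uncertainty-based down-weighting (sketch).

    target_variance: per-sample epistemic uncertainty of the bootstrapped
    target (any estimator works for this sketch).
    """
    weight = beta / (target_variance + 1e-6)   # shrink with uncertainty
    weight = weight.clamp(max=1.0).detach()    # bounded, no gradient
    per_sample_error = (q - td_target).pow(2)
    return (weight * per_sample_error).mean()
```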
arXiv Detail & Related papers (2021-05-17T20:16:46Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
- Continuous Doubly Constrained Batch Reinforcement Learning [93.23842221189658]
We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment.
The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data.
We propose to mitigate this issue via two straightforward penalties: a policy-constraint penalty that reduces divergence from the behavior policy, and a value-constraint penalty that discourages overly optimistic estimates.
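A sketch of those two penalties, with a log-ratio KL estimate for the policy constraint and a hinge on over-optimistic policy-action values for the value constraint; all names and coefficients are illustrative assumptions:
```python
import torch

def cdc_style_penalties(q_pi, q_data, policy_log_prob, behavior_log_prob,
                        kl_coef=0.1, optimism_coef=1.0):
    """The two penalties described above, in sketch form.

    q_pi: critic values of policy actions; q_data: values of dataset actions.
    """
    # Policy constraint: keep the learned policy near the behavior policy.
    policy_penalty = (policy_log_prob - behavior_log_prob).mean()
    # Value constraint: penalize policy actions valued above dataset actions.
    value_penalty = torch.relu(q_pi - q_data).mean()
    actor_loss = -q_pi.mean() + kl_coef * policy_penalty
    return actor_loss, optimism_coef * value_penalty
```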
arXiv Detail & Related papers (2021-02-18T08:54:14Z)