Conservative State Value Estimation for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2302.06884v2
- Date: Sat, 2 Dec 2023 14:08:25 GMT
- Title: Conservative State Value Estimation for Offline Reinforcement Learning
- Authors: Liting Chen, Jie Yan, Zhengdao Shao, Lu Wang, Qingwei Lin, Saravan
Rajmohan, Thomas Moscibroda and Dongmei Zhang
- Abstract summary: Conservative State Value Estimation (CSVE) learns a conservative V-function by directly imposing a penalty on OOD states.
We develop a practical actor-critic algorithm in which the critic performs the conservative value estimation by additionally sampling and penalizing the states around the dataset.
We evaluate on classic continuous control tasks from D4RL, showing that our method performs better than conservative Q-function learning methods and is strongly competitive among recent SOTA methods.
- Score: 36.416504941791224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning faces a significant challenge of value
over-estimation due to the distributional drift between the dataset and the
current learned policy, leading to learning failure in practice. The common
approach is to incorporate a penalty term to reward or value estimation in the
Bellman iterations. Meanwhile, to avoid extrapolation on out-of-distribution
(OOD) states and actions, existing methods focus on conservative Q-function
estimation. In this paper, we propose Conservative State Value Estimation
(CSVE), a new approach that learns a conservative V-function by directly
imposing a penalty on OOD states. Compared to prior work, CSVE allows more
effective state value estimation with conservative guarantees and, in turn,
better policy optimization. Building on CSVE, we develop a practical
actor-critic algorithm in which the critic performs the conservative value
estimation by additionally sampling and penalizing the states around the
dataset, and the actor applies advantage-weighted updates extended with state
exploration to improve the policy. We evaluate on classic continuous control
tasks from D4RL, showing that our method performs better than conservative
Q-function learning methods and is strongly competitive among recent SOTA
methods.
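A minimal sketch of the two pieces the abstract describes, a critic that additionally penalizes the V-function at states sampled around the dataset and an actor trained with advantage-weighted regression, is given below. This is an illustration of the idea rather than the authors' implementation: the perturbation scale sigma, penalty weight beta, and temperature lam are assumed hyperparameters, the policy head is kept deterministic for brevity, and the paper's state-exploration extension of the actor update is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative CSVE-style losses (a sketch under assumptions, not the authors' code).

def make_mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

class CSVESketch:
    def __init__(self, obs_dim, act_dim, sigma=0.1, beta=1.0, lam=3.0, gamma=0.99):
        self.v = make_mlp(obs_dim, 1)            # state-value critic V(s)
        self.actor = make_mlp(obs_dim, act_dim)  # deterministic policy head, for brevity
        self.sigma, self.beta, self.lam, self.gamma = sigma, beta, lam, gamma

    def critic_loss(self, s, r, s_next, done):
        # TD regression on in-dataset transitions.
        with torch.no_grad():
            target = r + self.gamma * (1.0 - done) * self.v(s_next).squeeze(-1)
        td_loss = F.mse_loss(self.v(s).squeeze(-1), target)

        # Conservative term: push V down on states sampled around the dataset
        # (Gaussian perturbations of dataset states) and up on dataset states.
        s_near = s + self.sigma * torch.randn_like(s)
        penalty = self.v(s_near).mean() - self.v(s).mean()
        return td_loss + self.beta * penalty

    def actor_loss(self, s, a, r, s_next, done):
        # Advantage-weighted regression toward dataset actions, with the
        # advantage estimated from the conservative V-function.
        with torch.no_grad():
            adv = (r + self.gamma * (1.0 - done) * self.v(s_next).squeeze(-1)
                   - self.v(s).squeeze(-1))
            w = torch.exp(adv / self.lam).clamp(max=100.0)
        return (w * ((self.actor(s) - a) ** 2).sum(-1)).mean()
```

A training loop would alternate gradient steps on critic_loss and actor_loss over minibatches of dataset transitions (s, a, r, s_next, done).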
Related papers
- Strategically Conservative Q-Learning [89.17906766703763]
Offline reinforcement learning (RL) is a compelling paradigm to extend RL's practical utility.
The major difficulty in offline RL is mitigating the impact of approximation errors when encountering out-of-distribution (OOD) actions.
We propose a novel framework called Strategically Conservative Q-Learning (SCQ) that distinguishes between OOD data that is easy and hard to estimate.
arXiv Detail & Related papers (2024-06-06T22:09:46Z)
- Confidence-Conditioned Value Functions for Offline Reinforcement Learning [86.59173545987984]
We propose a new form of Bellman backup that simultaneously learns Q-values for any degree of confidence with high probability.
We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence.
arXiv Detail & Related papers (2022-12-08T23:56:47Z)
- DCE: Offline Reinforcement Learning With Double Conservative Estimates [20.48354991493888]
We propose a simple conservative estimation method, double conservative estimates (DCE).
Our algorithm introduces a V-function to avoid errors from in-distribution actions while implicitly achieving conservative estimation.
Our experiments separately show how the two conservative estimation methods impact the value estimates of all state-action pairs.
arXiv Detail & Related papers (2022-09-27T03:34:19Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset (see the expectile-regression sketch after this list).
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration (a variance-weighted residual is sketched after this list).
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Reducing Conservativeness Oriented Offline Reinforcement Learning [29.895142928565228]
In offline reinforcement learning, a policy learns to maximize cumulative rewards with a fixed collection of data.
We propose a reducing-conservativeness-oriented offline reinforcement learning method.
Our proposed method is able to tackle the skewed distribution of the provided dataset and derive a value function closer to the expected value function.
arXiv Detail & Related papers (2021-02-27T01:21:01Z)
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
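For the Implicit Q-Learning entry above, the mechanism by which the method avoids evaluating actions outside the dataset is (per the IQL paper) expectile regression: the state-value function is fit to an upper expectile of Q over dataset actions only. The snippet below is a minimal sketch of that loss; the networks q_net and v_net and the expectile tau are illustrative placeholders.

```python
import torch

def expectile_loss(diff, tau=0.7):
    # Asymmetric squared loss: diff = Q(s, a_dataset) - V(s).
    # tau > 0.5 weights positive residuals (Q above V) more heavily,
    # so V is pushed toward an upper expectile of Q over dataset actions.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

# Usage sketch (q_net and v_net are hypothetical networks): both sides are
# evaluated only on state-action pairs drawn from the offline dataset, so no
# unseen action is ever queried.
#   diff = q_net(states, dataset_actions).detach() - v_net(states)
#   v_loss = expectile_loss(diff, tau=0.7)
```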
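For the Variance-Aware Off-Policy Evaluation entry above, the summary states that the Bellman residual in Fitted Q-Iteration is reweighted by an estimated variance of the value function. The sketch below shows only that reweighting idea with inverse-variance weights; the variance estimate var_hat and the next-state values under the target policy are assumed to come from the rest of the algorithm, and the actual VA-OPE estimator uses linear function approximation with its own variance estimator.

```python
import torch

def variance_weighted_residual_loss(q_pred, rewards, q_next_target, var_hat,
                                    gamma=0.99, eps=1e-6):
    # Bellman target for policy evaluation, then a squared residual that is
    # down-weighted where the estimated variance of the target is high.
    target = rewards + gamma * q_next_target
    residual = q_pred - target
    weights = 1.0 / (var_hat + eps)   # inverse-variance weighting (illustrative)
    return (weights * residual.pow(2)).mean()
```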