Contextual Conservative Q-Learning for Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2301.01298v2
- Date: Thu, 5 Jan 2023 17:54:24 GMT
- Title: Contextual Conservative Q-Learning for Offline Reinforcement Learning
- Authors: Ke Jiang, Jiayu Yao, Xiaoyang Tan
- Abstract summary: We propose Contextual Conservative Q-Learning (C-CQL) to learn a robustly reliable policy through the contextual information captured via an inverse dynamics model.
C-CQL achieves state-of-the-art performance in most environments of the offline MuJoCo suite and a noisy MuJoCo setting.
- Score: 15.819356579361843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning learns an effective policy from offline
datasets without online interaction, and it attracts persistent research attention
due to its potential for practical application. However, extrapolation error caused
by distribution shift still leads to overestimation for actions that transition to
out-of-distribution (OOD) states, which degrades the reliability and robustness of
the offline policy. In this paper, we propose Contextual Conservative Q-Learning
(C-CQL) to learn a robustly reliable policy through the contextual information
captured via an inverse dynamics model. Under the supervision of the inverse
dynamics model, C-CQL tends to learn a policy that generates stable transitions at
perturbed states, since perturbed states are a common kind of OOD state. In this
manner, the learnt policy is more likely to generate transitions that land in the
empirical next-state distribution of the offline dataset, i.e., robustly reliable
transitions. Moreover, we theoretically show that C-CQL generalizes both
Conservative Q-Learning (CQL) and the aggressive State Deviation Correction (SDC).
Finally, experimental results demonstrate that C-CQL achieves state-of-the-art
performance in most environments of the offline MuJoCo suite and in a noisy MuJoCo
setting.
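The abstract does not spell out the training objective, but its core mechanism, an inverse dynamics model whose predictions supervise the policy at perturbed states so that transitions still land in the dataset's next-state distribution, can be illustrated with a minimal sketch. This is not the authors' implementation: the network architectures, the Gaussian perturbation scale `sigma`, and the weight `lam` below are assumptions for illustration only.

```python
# Illustrative sketch only (PyTorch): an inverse dynamics model g(s, s') -> a
# trained on the offline data, plus a contextual regularizer that supervises the
# policy at perturbed states. Shapes, architectures, and coefficients are assumed.
import torch
import torch.nn as nn

state_dim, action_dim = 17, 6  # hypothetical MuJoCo-like dimensions

# Inverse dynamics model: predicts the action that connects s to s'.
inv_dyn = nn.Sequential(
    nn.Linear(2 * state_dim, 256), nn.ReLU(),
    nn.Linear(256, action_dim),
)

# Deterministic policy pi(s) -> a; a full agent would also train a conservative critic.
policy = nn.Sequential(
    nn.Linear(state_dim, 256), nn.ReLU(),
    nn.Linear(256, action_dim), nn.Tanh(),
)

def inverse_dynamics_loss(s, a, s_next):
    """Supervised regression on dataset transitions: g(s, s') should recover a."""
    pred_a = inv_dyn(torch.cat([s, s_next], dim=-1))
    return ((pred_a - a) ** 2).mean()

def contextual_regularizer(s, s_next, sigma=0.05, lam=1.0):
    """Assumed form of the 'stable transition' supervision: at a perturbed
    (OOD-like) state, the policy's action should match what the frozen inverse
    dynamics model says would still reach the empirical next state s'."""
    s_pert = s + sigma * torch.randn_like(s)
    with torch.no_grad():
        target_a = inv_dyn(torch.cat([s_pert, s_next], dim=-1))
    return lam * ((policy(s_pert) - target_a) ** 2).mean()
```

In a full algorithm this regularizer would be added to a conservative actor-critic objective; here it only illustrates how an inverse dynamics model can steer the policy at perturbed states back toward the dataset's empirical next-state distribution.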
Related papers
- Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning [5.012314384895537]
In offline reinforcement learning, a policy is learned using a static dataset in the absence of costly feedback from the environment.
We propose Constrained Latent Action Policies (C-LAP) which learns a generative model of the joint distribution of observations and actions.
arXiv Detail & Related papers (2024-11-07T09:35:22Z) - Strategically Conservative Q-Learning [89.17906766703763]
offline reinforcement learning (RL) is a compelling paradigm to extend RL's practical utility.
The major difficulty in offline RL is mitigating the impact of approximation errors when encountering out-of-distribution (OOD) actions.
We propose a novel framework called Strategically Conservative Q-Learning (SCQ) that distinguishes between OOD data that is easy and hard to estimate.
arXiv Detail & Related papers (2024-06-06T22:09:46Z) - Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline
Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also out-performs competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z) - Offline RL With Realistic Datasets: Heteroskedasticity and Support
Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z) - Mildly Conservative Q-Learning for Offline Reinforcement Learning [63.2183622958666]
offline reinforcement learning (RL) defines the task of learning from a static logged dataset without continually interacting with the environment.
Existing approaches, penalizing the unseen actions or regularizing with the behavior policy, are too pessimistic.
We propose Mildly Conservative Q-learning (MCQ), where OOD actions are actively trained by assigning them proper pseudo Q values.
arXiv Detail & Related papers (2022-06-09T19:44:35Z) - Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z) - Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return.
We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)