Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning
- URL: http://arxiv.org/abs/2404.04682v1
- Date: Sat, 6 Apr 2024 17:02:18 GMT
- Title: Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning
- Authors: Yeda Song, Dongwook Lee, Gunhee Kim
- Abstract summary: We propose COmpositional COnservatism with Anchor-seeking (COCOA) for offline reinforcement learning.
We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark.
- Score: 38.48360240082561
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) is a compelling framework for learning optimal policies from past experiences without additional interaction with the environment. Nevertheless, offline RL inevitably faces the problem of distributional shifts, where the states and actions encountered during policy execution may not be in the training dataset distribution. A common solution involves incorporating conservatism into the policy or the value function to safeguard against uncertainties and unknowns. In this work, we focus on achieving the same objectives of conservatism but from a different perspective. We propose COmpositional COnservatism with Anchor-seeking (COCOA) for offline RL, an approach that pursues conservatism in a compositional manner on top of the transductive reparameterization (Netanyahu et al., 2023), which decomposes the input variable (the state in our case) into an anchor and its difference from the original input. Our COCOA seeks both in-distribution anchors and differences by utilizing the learned reverse dynamics model, encouraging conservatism in the compositional input space for the policy or value function. Such compositional conservatism is independent of and agnostic to the prevalent behavioral conservatism in offline RL. We apply COCOA to four state-of-the-art offline RL algorithms and evaluate them on the D4RL benchmark, where COCOA generally improves the performance of each algorithm. The code is available at https://github.com/runamu/compositional-conservatism.
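The decomposition described in the abstract can be pictured with a short sketch. The module names, network shapes, and the single-step use of the reverse dynamics model below are illustrative assumptions, not the authors' implementation (see the linked repository for the actual code); the point is only that the policy consumes an (anchor, difference) pair instead of the raw state.

```python
# Minimal sketch of the compositional (anchor + difference) input described in the
# abstract. ReverseDynamics and CompositionalPolicy are hypothetical names; the real
# anchor-seeking procedure is more involved and is omitted here.
import torch
import torch.nn as nn

class ReverseDynamics(nn.Module):
    """Predicts a plausible predecessor state, used here as a candidate anchor."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

class CompositionalPolicy(nn.Module):
    """Policy that consumes the (anchor, difference) decomposition of the state."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor, anchor: torch.Tensor) -> torch.Tensor:
        delta = state - anchor  # difference between the state and its anchor
        return self.net(torch.cat([anchor, delta], dim=-1))

# Usage: in the paper, anchor seeking steers both the anchor and the difference
# toward in-distribution regions; here we simply take one reverse-dynamics step.
state_dim, action_dim = 17, 6
rev_dyn = ReverseDynamics(state_dim, action_dim)
policy = CompositionalPolicy(state_dim, action_dim)
s = torch.randn(32, state_dim)
a_prev = torch.randn(32, action_dim)
anchor = rev_dyn(s, a_prev)      # candidate in-distribution anchor
action = policy(s, anchor)       # action computed from the compositional input
```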
Related papers
- CROP: Conservative Reward for Model-based Offline Policy Optimization [15.121328040092264]
This paper proposes a novel model-based offline RL algorithm, Conservative Reward for model-based Offline Policy optimization (CROP).
To achieve a conservative reward estimation, CROP simultaneously minimizes the estimation error and the reward of random actions (see the sketch after this list).
Notably, CROP establishes an innovative connection between offline and online RL, highlighting that offline RL problems can be tackled by adopting online RL techniques.
arXiv Detail & Related papers (2023-10-26T08:45:23Z) - Confidence-Conditioned Value Functions for Offline Reinforcement Learning [86.59173545987984]
We propose a new form of Bellman backup that simultaneously learns Q-values for any degree of confidence with high probability.
We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence.
arXiv Detail & Related papers (2022-12-08T23:56:47Z) - Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z) - Offline RL Policies Should be Trained to be Adaptive [89.8580376798065]
We show that acting optimally in offline RL in a Bayesian sense involves solving an implicit POMDP.
As a result, optimal policies for offline RL must be adaptive, depending not just on the current state but on all the transitions seen so far during evaluation.
We present a model-free algorithm for approximating this optimal adaptive policy, and demonstrate the efficacy of learning such adaptive policies in offline RL benchmarks.
arXiv Detail & Related papers (2022-07-05T17:58:33Z) - RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning can exploit the massive amount of offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative for value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z) - Curriculum Offline Imitation Learning [72.1015201041391]
Offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment.
We propose Curriculum Offline Imitation Learning (COIL), which utilizes an experience picking strategy for imitating adaptive neighboring policies with higher returns.
On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids learning a mediocre behavior on mixed datasets but is also competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
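The CROP entry above describes fitting a reward model on the dataset while pushing down the predicted reward of random actions. A minimal sketch of such an objective follows; the network shape, the beta weight, and the uniform action sampling are assumptions for illustration and are not taken from the CROP paper.

```python
# Sketch of a conservative reward objective in the spirit of the CROP summary above:
# minimize the estimation error on dataset transitions while penalizing the predicted
# reward of random actions. All hyperparameters here are illustrative.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def conservative_reward_loss(model: RewardModel, s, a, r, beta: float = 1.0):
    # Estimation error on dataset (s, a, r) tuples.
    fit_loss = nn.functional.mse_loss(model(s, a), r)
    # Push down the predicted reward of random (likely out-of-distribution) actions.
    a_rand = torch.rand_like(a) * 2.0 - 1.0  # uniform in [-1, 1]
    penalty = model(s, a_rand).mean()
    return fit_loss + beta * penalty

# Usage with dummy data.
model = RewardModel(17, 6)
s, a = torch.randn(64, 17), torch.rand(64, 6) * 2 - 1
r = torch.randn(64)
loss = conservative_reward_loss(model, s, a, r, beta=0.5)
loss.backward()
```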