Offline Reinforcement Learning with Reverse Model-based Imagination
- URL: http://arxiv.org/abs/2110.00188v1
- Date: Fri, 1 Oct 2021 03:13:22 GMT
- Title: Offline Reinforcement Learning with Reverse Model-based Imagination
- Authors: Jianhao Wang, Wenzhe Li, Haozhe Jiang, Guangxiang Zhu, Siyuan Li,
Chongjie Zhang
- Abstract summary: In offline reinforcement learning (offline RL), one of the main challenges is to deal with the distributional shift between the learning policy and the given dataset.
Recent offline RL methods attempt to introduce a conservatism bias to encourage learning in high-confidence areas.
We propose a novel model-based offline RL framework called Reverse Offline Model-based Imagination (ROMI).
- Score: 25.376888160137973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In offline reinforcement learning (offline RL), one of the main challenges is
to deal with the distributional shift between the learning policy and the given
dataset. To address this problem, recent offline RL methods attempt to
introduce a conservatism bias to encourage learning in high-confidence areas.
Model-free approaches directly encode such bias into policy or value function
learning using conservative regularizations or special network structures, but
their constrained policy search limits the generalization beyond the offline
dataset. Model-based approaches learn forward dynamics models with conservatism
quantifications and then generate imaginary trajectories to extend the offline
datasets. However, because the offline dataset contains only limited samples, these conservatism
quantifications often overgeneralize in out-of-support regions.
Such unreliable conservatism measures can mislead forward model-based
imagination into undesired areas, leading to overly aggressive behaviors. To
encourage more conservatism, we propose a novel model-based offline RL
framework, called Reverse Offline Model-based Imagination (ROMI). We learn a
reverse dynamics model in conjunction with a novel reverse policy, which can
generate rollouts leading to the target goal states within the offline dataset.
These reverse imaginations provide informed data augmentation for the
model-free policy learning and enable conservative generalization beyond the
offline dataset. ROMI can effectively combine with off-the-shelf model-free
algorithms to enable model-based generalization with proper conservatism.
Empirical results show that our method can generate more conservative behaviors
and achieve state-of-the-art performance on offline RL benchmark tasks.
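The reverse imagination loop described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration only: `reverse_model` and `reverse_policy` are assumed interfaces (a learned predecessor-state predictor and a backward action proposer), not the authors' released code. The point it shows is that rollouts start from states inside the offline dataset and step backward, so every imagined trajectory leads into supported regions.

```python
import numpy as np

def reverse_imagination(dataset_states, reverse_policy, reverse_model,
                        horizon=5, n_rollouts=1000, rng=None):
    """Hedged sketch of reverse model-based imagination (not ROMI's official code).

    dataset_states : np.ndarray of shape (N, state_dim), states from the offline dataset.
    reverse_policy : callable s_next -> a, proposes an action that could have led to s_next.
    reverse_model  : callable (s_next, a) -> (s_prev, r), predicts the predecessor state
                     and the reward of taking a in s_prev (assumed interface).
    Returns forward-oriented transitions (s_prev, a, r, s_next) that all terminate
    inside the offline dataset, ready to augment a model-free learner's buffer.
    """
    rng = rng or np.random.default_rng(0)
    imagined = []
    # Anchor every backward rollout at a state that actually appears in the dataset,
    # so imagined trajectories always lead to supported "goal" states.
    anchors = dataset_states[rng.integers(len(dataset_states), size=n_rollouts)]
    for s_next in anchors:
        for _ in range(horizon):
            a = reverse_policy(s_next)            # which action could have produced s_next?
            s_prev, r = reverse_model(s_next, a)  # which state did it likely come from?
            imagined.append((s_prev, a, r, s_next))  # store in forward order for training
            s_next = s_prev                       # keep stepping backward in time
    return imagined
```

The imagined transitions can then be mixed with the original dataset and fed to an off-the-shelf model-free offline RL algorithm, which is how the abstract describes ROMI being combined with existing methods.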
Related papers
- SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide a theoretical guarantee on model uncertainty and a performance bound for SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
arXiv Detail & Related papers (2024-06-13T15:16:38Z) - Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows [30.926243761581624]
A Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training.
CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation.
Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
arXiv Detail & Related papers (2024-05-06T22:44:32Z) - Let Offline RL Flow: Training Conservative Agents in the Latent Space of
Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded, fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z) - A Unified Framework for Alternating Offline Model Training and Policy
Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z) - Double Check Your State Before Trusting It: Confidence-Aware
Bidirectional Offline Model-Based Imagination [31.805991958408438]
We propose to augment the offline dataset by using trained bidirectional dynamics models and rollout policies with a double-check mechanism.
Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method.
arXiv Detail & Related papers (2022-06-16T08:00:44Z) - RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning can exploit massive amounts of offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative for value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z) - Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z) - MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics, as sketched below.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
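The uncertainty-penalized reward mentioned in the MOPO entry can be illustrated with a short sketch. This is an assumption-laden simplification: it uses ensemble disagreement as a stand-in for the paper's uncertainty estimator, and `reward_model` / `dynamics_ensemble` are hypothetical interfaces, not the authors' implementation.

```python
import numpy as np

def penalized_reward(s, a, reward_model, dynamics_ensemble, lam=1.0):
    """Sketch of an uncertainty-penalized reward in the spirit of MOPO (illustrative only).

    reward_model     : callable (s, a) -> float, learned reward estimate.
    dynamics_ensemble: list of callables (s, a) -> predicted next state (np.ndarray).
    lam              : penalty coefficient trading off return against model error.
    """
    preds = np.stack([m(s, a) for m in dynamics_ensemble])  # (K, state_dim) ensemble predictions
    uncertainty = preds.std(axis=0).max()                   # disagreement as an uncertainty proxy
    return reward_model(s, a) - lam * uncertainty           # pessimistic reward used for training
```

Training on such pessimistic rewards discourages the policy from exploiting regions where the learned dynamics are unreliable.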