Double Check Your State Before Trusting It: Confidence-Aware
Bidirectional Offline Model-Based Imagination
- URL: http://arxiv.org/abs/2206.07989v1
- Date: Thu, 16 Jun 2022 08:00:44 GMT
- Title: Double Check Your State Before Trusting It: Confidence-Aware
Bidirectional Offline Model-Based Imagination
- Authors: Jiafei Lyu, Xiu Li, Zongqing Lu
- Abstract summary: We propose to augment the offline dataset by using trained bidirectional dynamics models and rollout policies with a double check.
Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method.
- Score: 31.805991958408438
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The learned policy of model-free offline reinforcement learning (RL) methods
is often constrained to stay within the support of the dataset to avoid possibly
dangerous out-of-distribution actions or states, making it challenging to
handle out-of-support regions. Model-based RL methods offer a richer dataset and
improve generalization by generating imaginary trajectories with either a trained
forward or reverse dynamics model. However, the imagined transitions may be
inaccurate, thus degrading the performance of the underlying offline RL
method. In this paper, we propose to augment the offline dataset by using
trained bidirectional dynamics models and rollout policies with a double check.
We introduce conservatism by trusting samples that the forward model and
backward model agree on. Our method, confidence-aware bidirectional offline
model-based imagination, generates reliable samples and can be combined with
any model-free offline RL method. Experimental results on the D4RL benchmarks
demonstrate that our method significantly boosts the performance of existing
model-free offline RL algorithms and achieves competitive or better scores
than baseline methods.
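The "double check" described in the abstract can be read as an agreement test between the forward and backward dynamics models. Below is a minimal, illustrative sketch of that idea (not the authors' code): a forward-imagined transition is kept only if the backward model maps the predicted next state back to something close to the starting state. The function names, the distance measure, and the threshold `tol` are assumptions for illustration.

```python
import numpy as np

def double_check_filter(s, a, forward_model, backward_model, tol=0.1):
    """Keep a forward-imagined transition (s, a, s') only when the
    backward model approximately recovers s from s', i.e. the two
    models agree on the transition (an agreement/confidence test).

    forward_model(s, a) -> predicted next state
    backward_model(s_next, a) -> predicted previous state
    tol is an illustrative agreement threshold, not a value from the paper.
    """
    s_next = forward_model(s, a)           # forward imagination
    s_recon = backward_model(s_next, a)    # backward "double check"
    disagreement = np.linalg.norm(s_recon - s)
    return (s, a, s_next) if disagreement <= tol else None
```

Transitions that pass such a check would be appended to the offline dataset and handed to any model-free offline RL learner, which is how the abstract describes combining the method with existing algorithms.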
Related papers
- ROLeR: Effective Reward Shaping in Offline Reinforcement Learning for Recommender Systems [14.74207332728742]
Offline reinforcement learning (RL) is an effective tool for real-world recommender systems.
This paper proposes ROLeR, a novel model-based method for reward shaping and uncertainty estimation in offline RL for recommender systems.
arXiv Detail & Related papers (2024-07-18T05:07:11Z)
- SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide a theoretical guarantee on model uncertainty and a performance bound for SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
arXiv Detail & Related papers (2024-06-13T15:16:38Z)
- Dual Generator Offline Reinforcement Learning [90.05278061564198]
In offline RL, constraining the learned policy to remain close to the data is essential.
In practice, GAN-based offline RL methods have not performed as well as alternative approaches.
We show that having two generators not only enables an effective GAN-based offline RL method, but also approximates a support constraint.
arXiv Detail & Related papers (2022-11-02T20:25:18Z)
- Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts [11.4219428942199]
Traditional model-based reinforcement learning (RL) methods generate forward rollout traces using the learnt dynamics model.
In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework.
BIFRL empowers the agent to reach high-value states and explore from them in a more efficient manner.
arXiv Detail & Related papers (2022-08-04T04:04:05Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- Offline Reinforcement Learning with Reverse Model-based Imagination [25.376888160137973]
In offline reinforcement learning (offline RL), one of the main challenges is to deal with the distributional shift between the learning policy and the given dataset.
Recent offline RL methods attempt to introduce a conservatism bias to encourage learning in high-confidence areas.
We propose a novel model-based offline RL framework, called Reverse Offline Model-based Imagination (ROMI).
arXiv Detail & Related papers (2021-10-01T03:13:22Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Offline Reinforcement Learning from Images with Latent Space Models [60.69745540036375]
Offline reinforcement learning (RL) refers to the problem of learning policies from a static dataset of environment interactions.
We build on recent advances in model-based algorithms for offline RL, and extend them to high-dimensional visual observation spaces.
Our approach is both tractable in practice and corresponds to maximizing a lower bound of the ELBO in the unknown POMDP.
arXiv Detail & Related papers (2020-12-21T18:28:17Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by applying them with rewards artificially penalized by the uncertainty of the dynamics.
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
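The MOPO summary above describes penalizing rewards by the uncertainty of the learned dynamics. The sketch below illustrates one common way to realize such a penalty, r~(s, a) = r(s, a) - lam * u(s, a), using the maximum predictive standard deviation across an ensemble as the uncertainty u(s, a); the ensemble interface (a hypothetical `predict_std` method) and the coefficient `lam` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def uncertainty_penalized_reward(reward, state, action, ensemble_models, lam=1.0):
    """Return r~ = r - lam * u(s, a), where u(s, a) is estimated as the
    largest predictive standard deviation over an ensemble of learned
    dynamics models (an illustrative choice of uncertainty estimator).

    ensemble_models: objects with a hypothetical predict_std(state, action)
    method returning the per-dimension std of the predicted next state.
    """
    stds = [np.asarray(m.predict_std(state, action)) for m in ensemble_models]
    uncertainty = max(float(np.max(s)) for s in stds)  # conservative estimate
    return reward - lam * uncertainty
```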