Planning with Exploration: Addressing Dynamics Bottleneck in Model-based
Reinforcement Learning
- URL: http://arxiv.org/abs/2010.12914v3
- Date: Thu, 24 Jun 2021 16:06:52 GMT
- Title: Planning with Exploration: Addressing Dynamics Bottleneck in Model-based
Reinforcement Learning
- Authors: Xiyao Wang, Junge Zhang, Wenzhen Huang, Qiyue Yin
- Abstract summary: Through theoretical analysis, we find that trajectory reward estimation error is the main cause of the dynamics bottleneck dilemma.
Motivated by this, we propose MOdel-based Progressive Entropy-based Exploration (MOPE2), a model-based control method combined with exploration.
- Score: 25.077671501605746
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) is believed to have higher sample
efficiency than model-free reinforcement learning (MFRL). However, MBRL is
plagued by the dynamics bottleneck dilemma: the phenomenon in which an
algorithm's performance settles into a local optimum rather than continuing to
improve as the number of interaction steps with the environment grows, meaning
that more data does not bring better performance. In this paper, we show
through theoretical analysis that trajectory reward estimation error is the
main cause of the dynamics bottleneck dilemma. We give an upper bound on the
trajectory reward estimation error and point out that increasing the agent's
exploration ability is the key to reducing this error and thereby alleviating
the dilemma. Motivated by this, we propose MOdel-based Progressive
Entropy-based Exploration (MOPE2), a model-based control method combined with
exploration. We conduct experiments on several complex continuous control
benchmark tasks. The results verify that MOPE2 effectively alleviates the
dynamics bottleneck dilemma and achieves higher sample efficiency than previous
MBRL and MFRL algorithms.
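
The abstract does not spell out MOPE2's planning loop, but the ingredients it names (model-based control plus entropy-based exploration) can be illustrated with a toy sketch. Below, a random-shooting MPC planner scores imagined action sequences by model-predicted return plus a k-nearest-neighbor entropy proxy over the visited states. The toy dynamics, the k-NN bonus, and the coefficient beta are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: entropy-regularized random-shooting MPC. The k-NN
# novelty bonus and the linear toy model are assumptions standing in
# for MOPE2's (unspecified) components.
import numpy as np

rng = np.random.default_rng(0)

def model_step(s, a):
    # Stand-in learned dynamics/reward model (assumption): a damped
    # point mass pushed by the action, rewarded for staying near zero.
    s_next = 0.9 * s + 0.1 * a
    return s_next, -np.sum(s_next ** 2)

def knn_bonus(states, k=3):
    # Particle-based entropy proxy: mean distance to the k-th nearest
    # neighbour among states visited along the imagined trajectory.
    d = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, min(k, len(states) - 1)].mean()

def plan(s0, horizon=10, n_candidates=256, beta=0.1):
    best_score, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.normal(size=(horizon, s0.size))
        s, ret, visited = s0, 0.0, []
        for a in actions:
            s, r = model_step(s, a)
            ret += r
            visited.append(s)
        # Exploration-aware trajectory score: return + entropy bonus.
        score = ret + beta * knn_bonus(np.array(visited))
        if score > best_score:
            best_score, best_first_action = score, actions[0]
    return best_first_action

print(plan(np.array([1.0, -0.5])))
```

Annealing or scheduling beta over training is one natural reading of "progressive" exploration, but the schedule MOPE2 actually uses is not given in the abstract.
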
Related papers
- Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation [36.308936312224404]
This paper introduces Adaptive Horizon Actor-Critic (AHAC), a first-order MBRL (FO-MBRL) algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics.
Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.
arXiv Detail & Related papers (2024-05-28T03:28:00Z)
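
The adaptive-horizon idea above can be sketched in isolation: truncate a model-based rollout once the local dynamics become too stiff, since gradients through stiff (e.g. contact) dynamics are unreliable. The stiffness test via a finite-difference Jacobian norm, the threshold, and the toy contact dynamics below are illustrative assumptions, not AHAC's algorithm.

```python
# Hedged sketch of adaptive-horizon rollout truncation under stiffness.
import numpy as np

def step(s, a):
    # Toy dynamics with a "contact": a stiff restoring force below s=0.
    force = a - (1000.0 * s if s < 0 else 0.0)
    return s + 0.01 * force

def jac_norm(s, a, eps=1e-4):
    # Finite-difference estimate of |d step / d s| (local stiffness).
    return abs(step(s + eps, a) - step(s - eps, a)) / (2 * eps)

def adaptive_rollout(s0, actions, stiffness_cap=5.0):
    s, effective_horizon = s0, 0
    for a in actions:
        if jac_norm(s, a) > stiffness_cap:
            break  # truncate: gradient error would blow up past here
        s = step(s, a)
        effective_horizon += 1
    return s, effective_horizon

s, h = adaptive_rollout(0.05, [-1.0] * 20)
print(f"stopped after {h} steps at s={s:.3f}")
```
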
- Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
arXiv Detail & Related papers (2024-03-31T11:39:11Z)
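
As a toy illustration of an intrinsic reward that reuses the predictive model and adds no extra parameters, the sketch below takes the one-step prediction error of a linear model as a "surprise" bonus, which decays as the model improves. The linear model, learning rate, and toy dynamics are assumptions; the paper's derived reward is not reproduced here.

```python
# Hedged sketch: prediction-error ("surprise") intrinsic reward from the
# agent's own one-step model, learned online by SGD on squared error.
import numpy as np

rng = np.random.default_rng(1)
W = np.zeros((4, 2))                     # linear one-step model: [s, a] -> s'

def true_step(s, a):
    return 0.8 * s + 0.2 * a + rng.normal(scale=0.01, size=2)

s = np.zeros(2)
for t in range(200):
    a = rng.normal(size=2)
    x = np.concatenate([s, a])
    s_pred = x @ W
    s_next = true_step(s, a)
    r_int = np.sum((s_next - s_pred) ** 2)    # intrinsic reward: model surprise
    W += 0.05 * np.outer(x, s_next - s_pred)  # model update (SGD on MSE)
    if t % 50 == 0:
        print(f"t={t:3d}  intrinsic reward={r_int:.4f}")
    s = s_next
```
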
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
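
For intuition about bounds relating model shift to performance, the classical "simulation lemma" is a standard illustrative instance; the paper's own constrained-update bounds are more refined and are not reproduced here. Writing J_M(\pi) for the return of policy \pi under model M and R_max for the reward bound (constants vary by derivation):

```latex
% Illustrative only: the classical simulation lemma, not the paper's bound.
\[
\bigl| J_{M_1}(\pi) - J_{M_2}(\pi) \bigr|
\;\le\;
\frac{2\gamma R_{\max}}{(1-\gamma)^2}\,
\sup_{s,a} D_{\mathrm{TV}}\!\bigl( M_1(\cdot \mid s,a),\, M_2(\cdot \mid s,a) \bigr)
\]
```

The takeaway matches the entry above: if a model update shifts the dynamics by only a small total-variation amount, the induced change in returns is bounded, which is what makes constrained model updates a lever for non-decreasing performance.
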
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL).
CDL learns a causal dynamics model, with theoretical guarantees, that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
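
The abstraction step can be sketched with a hand-set causal mask saying which inputs each next-state variable depends on; CDL learns this mask with guarantees, so the factored state, mask, and linear predictors below are purely illustrative assumptions.

```python
# Hedged sketch: per-variable masked prediction plus a derived abstraction.
import numpy as np

STATE_VARS = ["pos", "vel", "decoration"]   # hypothetical factored state
INPUTS = STATE_VARS + ["action"]

# mask[i, j] = 1 iff the next value of STATE_VARS[i] depends on INPUTS[j]
mask = np.array([
    [1, 1, 0, 0],   # pos'        <- pos, vel
    [0, 1, 0, 1],   # vel'        <- vel, action
    [0, 0, 1, 0],   # decoration' <- decoration (untouched by the agent)
])

def masked_predict(weights, s, a, i):
    # One linear predictor per variable, applied only to its parents.
    x = np.concatenate([s, [a]])
    return float(weights[i] @ (mask[i] * x))

w = np.ones((3, 4))
print(masked_predict(w, np.array([1.0, 2.0, 5.0]), 0.5, 0))  # pos' uses pos+vel only

# "decoration" never feeds the controllable variables and is unaffected by
# the action, so a task-independent abstraction can drop it.
reachable = {j for i in (0, 1) for j in np.flatnonzero(mask[i])}
print([v for i, v in enumerate(STATE_VARS) if i in reachable])  # ['pos', 'vel']
```
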
- Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
arXiv Detail & Related papers (2022-04-04T13:28:31Z)
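
The core of value-aware model learning can be sketched by replacing the plain mean-squared error with an error weighted by the value-function gradient, so the model spends capacity where mistakes actually change the value estimate. The quadratic toy value function and synthetic predictions below are assumptions, not the paper's setup.

```python
# Hedged sketch of a value-gradient-weighted model loss vs. plain MSE.
import numpy as np

rng = np.random.default_rng(2)

def value_grad(s):
    # Analytic gradient of a toy value V(s) = -s_0^2 - 10 * s_1^2:
    # dimension 1 matters 10x more to the value than dimension 0.
    return np.array([-2.0 * s[0], -20.0 * s[1]])

s_true = rng.normal(size=(100, 2))                        # "real" next states
s_pred = s_true + rng.normal(scale=0.1, size=(100, 2))    # model predictions

mse = np.mean(np.sum((s_pred - s_true) ** 2, axis=1))
value_weighted = np.mean([(value_grad(st) @ (sp - st)) ** 2
                          for st, sp in zip(s_true, s_pred)])
print(f"MSE loss: {mse:.4f}   value-gradient-weighted loss: {value_weighted:.4f}")
```

Under this loss, identical prediction errors along the two state dimensions contribute very differently, which is exactly the value-awareness the MSE objective lacks.
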
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
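
A minimal sketch of the reweighting idea: down-weight imagined transitions where the model is likely wrong. Here ensemble disagreement stands in for model error and the weight formula is a fixed heuristic, whereas the paper learns its weights adaptively; the toy ensemble and constants are assumptions.

```python
# Hedged sketch: weight imagined transitions by inverse ensemble disagreement.
# The resulting weight would multiply that transition's TD/policy loss.
import numpy as np

rng = np.random.default_rng(3)

def ensemble_predict(s, a, n_models=5):
    # Stand-in ensemble: members share a linear model, with noise that
    # grows away from the data-rich region near s = 0.
    return np.array([0.9 * s + 0.1 * a + rng.normal(scale=0.02 * (1 + abs(s)))
                     for _ in range(n_models)])

def transition_weight(s, a):
    disagreement = ensemble_predict(s, a).std()
    return 1.0 / (1.0 + 20.0 * disagreement)  # shrink weight when models disagree

for s in (0.1, 2.0, 5.0):
    print(f"state {s:4.1f}: weight on imagined transition = "
          f"{transition_weight(s, a=0.0):.2f}")
```
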
- Discriminator Augmented Model-Based Reinforcement Learning [47.094522301093775]
It is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance.
This paper aims to improve planning with an importance sampling framework that accounts for discrepancy between the true and learned dynamics.
arXiv Detail & Related papers (2021-03-24T06:01:55Z)
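
The importance-sampling framework rests on the standard density-ratio trick, shown here in isolation: a classifier trained to separate real from model-generated samples yields D/(1-D) as an estimate of p_real/p_model, usable to reweight planning. The one-dimensional Gaussians and hand-rolled logistic regression are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: density-ratio estimation via a binary discriminator.
import numpy as np

rng = np.random.default_rng(4)
real = rng.normal(loc=0.0, scale=1.0, size=2000)   # samples from the env
fake = rng.normal(loc=1.0, scale=1.0, size=2000)   # samples from the model

# Logistic regression D(x) = sigmoid(a*x + b), full-batch gradient descent.
x = np.concatenate([real, fake])
y = np.concatenate([np.ones(2000), np.zeros(2000)])  # 1 = real, 0 = model
a = b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(a * x + b)))
    a -= 0.1 * np.mean((p - y) * x)
    b -= 0.1 * np.mean(p - y)

def importance_weight(query):
    d = 1.0 / (1.0 + np.exp(-(a * query + b)))
    return d / (1.0 - d)     # approximates p_real(query) / p_model(query)

# Weights exceed 1 where the real data is denser than the model's samples.
print([round(importance_weight(q), 2) for q in (-1.0, 0.0, 1.0)])
```
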
- Imitation with Neural Density Models [98.34503611309256]
We propose a new framework for Imitation Learning (IL) via density estimation of the expert's occupancy measure followed by Imitation Occupancy Entropy Reinforcement Learning (RL) using the density as a reward.
Our approach maximizes a non-adversarial model-free RL objective that provably lower bounds reverse Kullback-Leibler divergence between occupancy measures of the expert and imitator.
arXiv Detail & Related papers (2020-10-19T19:38:36Z)
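
A minimal sketch of the reward construction, with a Gaussian kernel density estimate standing in for the paper's neural density model: fit a density to expert states and use its log-density as the imitator's RL reward (the RL objective then adds policy entropy, giving the reverse-KL lower bound). The expert distribution and kernel bandwidth below are assumptions.

```python
# Hedged sketch: log-density of the expert occupancy as an imitation reward.
import numpy as np

rng = np.random.default_rng(5)
expert_states = rng.normal(loc=3.0, scale=0.5, size=500)  # expert occupancy samples

def log_density(s, data=expert_states, h=0.3):
    # Gaussian kernel density estimate of log rho_E(s).
    k = np.exp(-0.5 * ((s - data) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return np.log(k.mean() + 1e-12)

def reward(s):
    # Imitator's RL reward; the full objective adds policy entropy.
    return log_density(s)

for s in (0.0, 3.0, 6.0):
    print(f"state {s:3.1f}: reward = {reward(s):7.2f}")
```
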