Bootstrapped model learning and error correction for planning with
uncertainty in model-based RL
- URL: http://arxiv.org/abs/2004.07155v1
- Date: Wed, 15 Apr 2020 15:41:21 GMT
- Title: Bootstrapped model learning and error correction for planning with
uncertainty in model-based RL
- Authors: Alvaro Ovalle, Simon M. Lucas
- Abstract summary: A natural aim is to learn a model that accurately reflects the dynamics of the environment.
This paper explores the problem of model misspecification through uncertainty-aware reinforcement learning agents.
We propose a bootstrapped multi-headed neural network that learns the distribution of future states and rewards.
- Score: 1.370633147306388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Having access to a forward model enables the use of planning algorithms such
as Monte Carlo Tree Search and Rolling Horizon Evolution. Where a model is
unavailable, a natural aim is to learn a model that accurately reflects the
dynamics of the environment. In many situations this might not be possible, and
minimal glitches in the model may lead to poor performance and failure. This
paper explores the problem of model misspecification through uncertainty-aware
reinforcement learning agents. We propose a bootstrapped multi-headed neural
network that learns the distribution of future states and rewards. We
experiment with a number of schemes to extract the most likely predictions.
Moreover, we also introduce a global error correction filter that applies
high-level constraints guided by the context provided through the predictive
distribution. We illustrate our approach on Minipacman. The evaluation
demonstrates that when dealing with imperfect models, our methods exhibit
increased performance and stability, both in terms of model accuracy and in its
use within a planning algorithm.
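The abstract names two ingredients: a bootstrapped multi-headed network that predicts a distribution over next states and rewards, and schemes for extracting the most likely prediction from it. The sketch below illustrates the general bootstrapped-heads idea under loud assumptions. It is not the authors' architecture. The shared trunk is a fixed random tanh feature map rather than a trained network, each head is a ridge-regression layer fit on its own bootstrap resample, and `BootstrappedHeads`, `predict_mean` and `majority_vote` are hypothetical names. Mean and standard deviation across heads serve continuous outputs such as rewards. A per-position majority vote serves discrete outputs, such as the contents of each grid cell in a Minipacman frame, and the fraction of agreeing heads gives a rough confidence signal.

```python
import numpy as np


class BootstrappedHeads:
    """Hypothetical multi-headed predictor: K linear heads share one feature trunk,
    and each head is fit on its own bootstrap resample of the transitions."""

    def __init__(self, in_dim, out_dim, n_heads=5, n_features=64, ridge=1e-2, seed=0):
        self.rng = np.random.default_rng(seed)
        # Shared trunk (assumption): a fixed random tanh feature map, not a learned torso.
        self.W = self.rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, n_features))
        self.b = self.rng.normal(size=n_features)
        self.ridge = ridge
        self.heads = np.zeros((n_heads, n_features, out_dim))

    def features(self, x):
        return np.tanh(x @ self.W + self.b)

    def fit(self, x, y):
        phi = self.features(x)
        n, d = phi.shape
        for k in range(self.heads.shape[0]):
            # Bootstrap: draw n indices with replacement, so each head sees different data.
            idx = self.rng.integers(0, n, size=n)
            p, t = phi[idx], y[idx]
            # Ridge-regularized least-squares fit for head k only.
            self.heads[k] = np.linalg.solve(p.T @ p + self.ridge * np.eye(d), p.T @ t)
        return self

    def predict_all(self, x):
        # One prediction per head: shape (n_heads, batch, out_dim).
        return np.einsum("bf,kfo->kbo", self.features(x), self.heads)

    def predict_mean(self, x):
        preds = self.predict_all(x)
        # Mean across heads as a point estimate, standard deviation as head disagreement.
        return preds.mean(axis=0), preds.std(axis=0)


def majority_vote(head_labels):
    """For discrete outputs of shape (n_heads, ...), return the most common label
    at each position (ties go to the lowest label) and the fraction of heads agreeing."""
    head_labels = np.asarray(head_labels)
    n_classes = int(head_labels.max()) + 1
    counts = np.stack([(head_labels == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0), counts.max(axis=0) / head_labels.shape[0]
```

For instance, `majority_vote([[0, 1], [0, 2], [1, 2]])` treats each row as one head's labels for two positions and returns labels `[0, 2]`, each with agreement 2/3. Positions where agreement is low are natural candidates for the kind of context-guided correction the abstract describes, though the paper's actual filter is not reproduced here.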
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
(2024-11-02T07:06:53Z)
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
(2024-02-12T16:15:25Z)
- Deep autoregressive density nets vs neural ensembles for model-based
offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
(2024-02-05T10:18:15Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically
for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration.
COPlanner is a planning-driven framework for model-based methods that addresses the problem of inaccurately learned dynamics models.
(2023-10-11T06:10:07Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
(2023-02-08T07:37:51Z)
- Plan To Predict: Learning an Uncertainty-Foreseeing Model for
Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision-making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
(2023-01-20T10:17:22Z)
- Monitoring Model Deterioration with Explainable Uncertainty Estimation
via Non-parametric Bootstrap [0.0]
Monitoring machine learning models once they are deployed is challenging.
It is even more challenging to decide when to retrain models in real-world scenarios where labeled data is beyond reach.
In this work, we use non-parametric bootstrapped uncertainty estimates and SHAP values to provide explainable uncertainty estimation.
(2022-01-27T17:23:04Z)
- Sufficiently Accurate Model Learning for Planning [119.80502738709937]
This paper introduces the constrained Sufficiently Accurate model learning approach.
It provides examples of such problems, and presents a theorem on how close some approximate solutions can be.
The approximate solution quality will depend on the function parameterization, loss and constraint function smoothness, and the number of samples in model learning.
(2021-02-11T16:27:31Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the γ-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
(2020-10-27T17:54:12Z)
- Prediction-Centric Learning of Independent Cascade Dynamics from Partial
Observations [13.680949377743392]
We address the problem of learning a spreading model such that the predictions generated from this model are accurate.
We introduce a computationally efficient algorithm, based on a scalable dynamic message-passing approach.
We show that tractable inference from the learned model generates a better prediction of marginal probabilities compared to the original model.
(2020-07-13T17:58:21Z)
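The Predictive Churn entry in the list above concerns disagreement among near-optimal models. One common way to quantify churn between two classifiers is the fraction of inputs on which their predicted labels differ. The sketch below assumes that definition, plus a simple loss-threshold stand-in for the Rashomon set: all models whose loss is within `epsilon` of the best. It is not that paper's method. The function names are hypothetical, and the paper's exact definitions and theoretical results may differ.

```python
import numpy as np


def churn(preds_a, preds_b):
    """Fraction of inputs on which two models' predicted labels differ."""
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))


def pairwise_churn(predictions):
    """predictions: (n_models, n_inputs) labels -> (n_models, n_models) churn matrix."""
    p = np.asarray(predictions)
    # Broadcast every model against every other model and average disagreements.
    return (p[:, None, :] != p[None, :, :]).mean(axis=2)


def rashomon_max_churn(predictions, losses, epsilon):
    """Largest pairwise churn among models whose loss is within epsilon of the best
    (an assumed, simplified notion of the set of near-optimal models)."""
    losses = np.asarray(losses, dtype=float)
    keep = losses <= losses.min() + epsilon
    return float(pairwise_churn(np.asarray(predictions)[keep]).max())
```

For example, `churn([0, 1, 1, 0], [0, 1, 0, 0])` is 0.25, because the two models disagree on one of four inputs.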