Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?
- URL: http://arxiv.org/abs/2107.11587v1
- Date: Sat, 24 Jul 2021 11:38:25 GMT
- Title: Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?
- Authors: Balázs Kégl, Gabriel Hurtado, Albert Thomas
- Abstract summary: We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models.
We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin.
We also find that deterministic models are on par with their probabilistic counterparts; in fact, they consistently (although non-significantly) outperform them.
- Score: 0.2836066255205732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We contribute to micro-data model-based reinforcement learning (MBRL) by
rigorously comparing popular generative models using a fixed (random shooting)
control agent. We find that on an environment that requires multimodal
posterior predictives, mixture density nets outperform all other models by a
large margin. When multimodality is not required, our surprising finding is
that we do not need probabilistic posterior predictives: deterministic models
are on par; in fact, they consistently (although non-significantly) outperform
their probabilistic counterparts. We also find that heteroscedasticity at
training time, perhaps acting as a regularizer, improves predictions at longer
horizons. On the methodological side, we design metrics and an experimental
protocol which can be used to evaluate the various models, predicting their
asymptotic performance when using them on the control problem. Using this
framework, we improve the state-of-the-art sample complexity of MBRL on Acrobot
by a factor of two to four, using an aggressive training schedule that lies
outside the hyperparameter interval usually considered.
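For concreteness, here is a minimal sketch of a random shooting control agent of the kind the paper holds fixed across model comparisons; the `model(states, actions)` and `reward_fn` interfaces, the action bounds, and all hyperparameter values are illustrative assumptions rather than the paper's code.
```python
import numpy as np

def random_shooting_action(model, reward_fn, state, action_dim,
                           horizon=20, n_candidates=1000, rng=None):
    """Return the first action of the best of n_candidates random action
    sequences, scored by rolling out a learned one-step dynamics model."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidate action sequences uniformly (bounds are an assumption).
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        states = model(states, actions[:, t])   # one step of the learned model
        returns += reward_fn(states, actions[:, t])
    return actions[np.argmax(returns), 0]       # execute only the first action
```
With a probabilistic model such as a mixture density net, `model` would sample the next state from the posterior predictive instead of returning a point estimate.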
Related papers
- Predictive Churn with the Set of Good Models [64.05949860750235]
We study the effect of conflicting predictions over the set of near-optimal machine learning models.
We present theoretical results on the expected churn between models within the Rashomon set.
We show how our approach can be used to better anticipate, reduce, and avoid churn in consumer-facing applications.
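As a minimal illustration of the quantity under study, churn between two classifiers is the fraction of examples on which their predictions conflict; the helpers below are a generic sketch, not the paper's implementation.
```python
import numpy as np
from itertools import combinations

def churn(preds_a, preds_b):
    """Empirical churn: fraction of examples where two models disagree."""
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))

def expected_churn(prediction_sets):
    """Average pairwise churn over a set of near-optimal (Rashomon) models."""
    pairs = list(combinations(prediction_sets, 2))
    return sum(churn(a, b) for a, b in pairs) / len(pairs)
```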
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Multi-timestep models for Model-based Reinforcement Learning [10.940666275830052]
In model-based reinforcement learning (MBRL), most algorithms rely on simulating trajectories from one-step dynamics models learned on data; prediction errors in such models compound over long rollouts.
We tackle this issue by using a multi-timestep objective to train one-step models.
We find that exponentially decaying weights lead to models that significantly improve the long-horizon R² score.
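A rough sketch of such a multi-timestep objective, assuming a `model(states, actions) -> next_states` interface and the exponentially decaying weights the summary refers to:
```python
import numpy as np

def multi_timestep_loss(model, states, actions, horizon=5, decay=0.8):
    """Weighted sum of h-step open-loop prediction errors of a one-step model.
    states: (N, d) array of consecutive states; actions: (N, a) array.
    The decay**h weight on the h-step error is an illustrative choice."""
    T = len(states) - horizon
    loss, norm = 0.0, 0.0
    for h in range(1, horizon + 1):
        pred = states[:T]
        for t in range(h):                       # roll the model forward h steps
            pred = model(pred, actions[t:T + t])
        w = decay ** h
        loss += w * np.mean((pred - states[h:T + h]) ** 2)
        norm += w
    return loss / norm
```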
arXiv Detail & Related papers (2023-10-09T12:42:39Z) - Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective [0.7373617024876725]
Kernel density estimation (KDE) based models are popular choices for density estimation, but they fail to adapt to data regions with varying densities.
An adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth.
A modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably.
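The leave-one-out objective itself is easy to state; below is a sketch for a one-dimensional adaptive Gaussian KDE with one bandwidth per kernel (the paper's modified expectation-maximization training is not reproduced here).
```python
import numpy as np

def loo_log_likelihood(x, bandwidths):
    """Leave-one-out log-likelihood of an adaptive Gaussian KDE.
    x: (n,) data points; bandwidths: (n,) individual kernel bandwidths."""
    n = len(x)
    diff = x[:, None] - x[None, :]               # pairwise differences
    k = np.exp(-0.5 * (diff / bandwidths[None, :]) ** 2)
    k /= np.sqrt(2.0 * np.pi) * bandwidths[None, :]
    np.fill_diagonal(k, 0.0)                     # leave one out: drop the self-kernel
    density = k.sum(axis=1) / (n - 1)
    return float(np.log(density).sum())
```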
arXiv Detail & Related papers (2023-10-05T14:08:42Z) - Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with distributionally robust optimization (DRO) using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
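One reading of "parametric likelihood ratios" is that a small adversary model scores each training example and the scores are normalized into mean-one importance weights; the sketch below shows that reweighting as an assumed simplification, not the paper's exact method.
```python
import numpy as np

def likelihood_ratio_weights(adversary_scores, temperature=1.0):
    """Turn adversary scores into mean-one likelihood ratios (an assumption)."""
    z = np.asarray(adversary_scores) / temperature
    z -= z.max()                                 # numerical stability
    w = np.exp(z)
    return w / w.mean()                          # mean-one ratios under the data

def dro_objective(per_example_losses, adversary_scores, temperature=1.0):
    """Expected loss under the adversarially reweighted training distribution."""
    w = likelihood_ratio_weights(adversary_scores, temperature)
    return float(np.mean(w * per_example_losses))
```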
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Stochastic Parameterizations: Better Modelling of Temporal Correlations
using Probabilistic Machine Learning [1.5293427903448025]
We show that by using a physically informed recurrent neural network within a probabilistic framework, our model for the Lorenz 96 atmospheric simulation is competitive.
This is due to a superior ability to model temporal correlations compared to standard first-order autoregressive schemes.
We evaluate across a number of metrics from the literature, but also discuss how the probabilistic metric of likelihood may be a unifying choice for future climate models.
arXiv Detail & Related papers (2022-03-28T14:51:42Z) - Anomaly Detection of Time Series with Smoothness-Inducing Sequential
Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
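As a simplified stand-in for those per-time-stamp Gaussian outputs combined with a smoothness-inducing penalty (the full variational terms of the SISVAE are omitted), one could write:
```python
import numpy as np

def gaussian_nll_with_smoothness(x, mu, log_var, smooth_weight=1.0):
    """Per-time-stamp Gaussian negative log-likelihood plus a penalty on
    consecutive means; a rough sketch of the smoothness-inducing idea.
    x, mu, log_var: (T, d) arrays produced by the decoder networks."""
    nll = 0.5 * np.sum(log_var + (x - mu) ** 2 / np.exp(log_var)
                       + np.log(2.0 * np.pi))
    smoothness = np.sum((mu[1:] - mu[:-1]) ** 2)
    return float(nll + smooth_weight * smoothness)
```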
arXiv Detail & Related papers (2021-02-02T06:15:15Z) - Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual
Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration, and how some models that score lower on standard benchmarks perform on par with the best-performing models when trained on the same data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z) - Nonparametric Estimation in the Dynamic Bradley-Terry Model [69.70604365861121]
We develop a novel estimator that relies on kernel smoothing to pre-process the pairwise comparisons over time.
We derive time-varying oracle bounds for both the estimation error and the excess risk in the model-agnostic setting.
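A plausible minimal version of this pipeline, with Gaussian kernel smoothing over comparison times followed by the standard Bradley-Terry MM fit (names and arguments are illustrative, and every item is assumed to win at least once near t0):
```python
import numpy as np

def dynamic_bt_scores(times, winners, losers, t0, n_items, bw=1.0, n_iter=100):
    """Bradley-Terry scores at time t0 from kernel-weighted pairwise comparisons."""
    w = np.exp(-0.5 * ((np.asarray(times, dtype=float) - t0) / bw) ** 2)
    wins = np.zeros((n_items, n_items))          # wins[i, j]: weight of i beating j
    for k, (i, j) in enumerate(zip(winners, losers)):
        wins[i, j] += w[k]
    games = wins + wins.T
    p = np.ones(n_items)
    for _ in range(n_iter):                      # minorization-maximization updates
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p = wins.sum(axis=1) / np.maximum(denom, 1e-12)
        p = np.maximum(p / p.sum(), 1e-12)       # fix the scale; keep scores positive
    return p
```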
arXiv Detail & Related papers (2020-02-28T21:52:49Z)