Autoregressive Dynamics Models for Offline Policy Evaluation and
Optimization
- URL: http://arxiv.org/abs/2104.13877v1
- Date: Wed, 28 Apr 2021 16:48:44 GMT
- Title: Autoregressive Dynamics Models for Offline Policy Evaluation and
Optimization
- Authors: Michael R. Zhang, Tom Le Paine, Ofir Nachum, Cosmin Paduraru, George
Tucker, Ziyu Wang, Mohammad Norouzi
- Abstract summary: We propose expressive autoregressive dynamics models that generate different dimensions of the next state and reward sequentially, conditioned on previous dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
- Score: 60.73540999409032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard dynamics models for continuous control make use of feedforward
computation to predict the conditional distribution of next state and reward
given current state and action using a multivariate Gaussian with a diagonal
covariance structure. This modeling choice assumes that different dimensions of
the next state and reward are conditionally independent given the current state
and action and may be driven by the fact that fully observable physics-based
simulation environments entail deterministic transition dynamics. In this
paper, we challenge this conditional independence assumption and propose a
family of expressive autoregressive dynamics models that generate different
dimensions of the next state and reward sequentially conditioned on previous
dimensions. We demonstrate that autoregressive dynamics models indeed
outperform standard feedforward models in log-likelihood on heldout
transitions. Furthermore, we compare different model-based and model-free
off-policy evaluation (OPE) methods on RL Unplugged, a suite of offline MuJoCo
datasets, and find that autoregressive dynamics models consistently outperform
all baselines, achieving a new state-of-the-art. Finally, we show that
autoregressive dynamics models are useful for offline policy optimization by
serving as a way to enrich the replay buffer through data augmentation and
improving performance using model-based planning.
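As a concrete illustration of the two model families contrasted in the abstract (not the authors' code), the sketch below shows a standard feedforward model that outputs a single diagonal Gaussian over all next-state and reward dimensions at once, alongside an autoregressive model that predicts one output dimension at a time, each conditioned on the dimensions already generated. All class, function, and parameter names (FeedforwardDynamics, AutoregressiveDynamics, state_dim, hidden, etc.) are illustrative assumptions.

```python
# Hedged sketch: diagonal-Gaussian feedforward dynamics model vs. an autoregressive one.
# Names and architecture sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn
from torch.distributions import Normal

class FeedforwardDynamics(nn.Module):
    """p(s', r | s, a) as one diagonal Gaussian: output dims are conditionally independent."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        out_dim = state_dim + 1  # next-state dimensions plus reward
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * out_dim),  # mean and log-std for every output dim
        )

    def forward(self, s, a):
        mean, log_std = self.net(torch.cat([s, a], -1)).chunk(2, -1)
        return Normal(mean, log_std.exp())

class AutoregressiveDynamics(nn.Module):
    """p(s', r | s, a) factored as prod_i p(y_i | s, a, y_<i), one dimension at a time."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.out_dim = state_dim + 1
        # One small head per output dimension, conditioned on (s, a) and the previous dims.
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim + i, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),  # mean and log-std of dimension i
            )
            for i in range(self.out_dim)
        ])

    def log_prob(self, s, a, y):
        """Exact log-likelihood of a target vector y = [s', r]."""
        ctx, total = torch.cat([s, a], -1), 0.0
        for i, head in enumerate(self.heads):
            inp = torch.cat([ctx, y[..., :i]], -1) if i > 0 else ctx
            mean, log_std = head(inp).unbind(-1)
            total = total + Normal(mean, log_std.exp()).log_prob(y[..., i])
        return total

    def sample(self, s, a):
        """Generate dimensions sequentially, each conditioned on the previous ones."""
        ctx, dims = torch.cat([s, a], -1), []
        for head in self.heads:
            inp = torch.cat([ctx, torch.stack(dims, -1)], -1) if dims else ctx
            mean, log_std = head(inp).unbind(-1)
            dims.append(Normal(mean, log_std.exp()).sample())
        y = torch.stack(dims, -1)
        return y[..., :-1], y[..., -1]  # next state, reward
```

The autoregressive factorization still gives an exact likelihood while capturing correlations among output dimensions; the trade-off is that sampling proceeds one dimension at a time. The paper's actual architectures, dimension orderings, and training details may differ from this sketch.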
Related papers
- Amortized Control of Continuous State Space Feynman-Kac Model for Irregular Time Series [14.400596021890863]
Many real-world datasets, such as healthcare, climate, and economics, are often collected as irregular time series.
We propose the Amortized Control of continuous State Space Model (ACSSM) for continuous dynamical modeling of time series.
arXiv Detail & Related papers (2024-10-08T01:27:46Z)
- Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces a novel family of deep dynamical models designed to represent continuous-time sequence data.
We train the model using maximum likelihood estimation with Markov chain Monte Carlo.
Experiments on oscillating systems, videos and real-world state sequences (MuJoCo) illustrate that ODEs with the learnable energy-based prior outperform existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z)
- Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods [8.654571696634825]
This paper compares State Space Model (SSM) and Koopman-based deep learning methods for modelling the dynamics of both linear and non-linear stiff strings.
Our findings indicate that our proposed Koopman-based model performs as well as or better than other existing approaches in non-linear cases for long-sequence modelling.
This research contributes insights into the physical modelling of dynamical systems by offering a comparative overview of these and previous methods and introducing innovative strategies for model improvement.
arXiv Detail & Related papers (2024-08-29T15:55:27Z)
- Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models [54.132297393662654]
We introduce a hybrid method that fine-tunes cutting-edge diffusion models by optimizing reward models through RL.
We demonstrate the capability of our approach to outperform the best designs in offline data, leveraging the extrapolation capabilities of reward models.
arXiv Detail & Related papers (2024-05-30T03:57:29Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief [3.0036519884678894]
Model-based offline reinforcement learning (RL) aims to find a highly rewarding policy by leveraging a previously collected static dataset and a dynamics model.
In this work, we maintain a belief distribution over dynamics, and evaluate/optimize policy through biased sampling from the belief.
We show that the biased sampling naturally induces an updated dynamics belief with policy-dependent reweighting factor, termed Pessimism-Modulated Dynamics Belief.
arXiv Detail & Related papers (2022-10-13T03:14:36Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
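To make COMBO's idea of regularizing the value function on out-of-support state-actions concrete, here is a heavily simplified sketch of a COMBO-style conservative critic loss. The function name, batch layout, and the single-term Bellman backup are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of a COMBO-style conservative critic loss (illustrative, not official code).
import torch
import torch.nn.functional as F

def combo_critic_loss(q_net, target_q_net, policy, dataset_batch, model_batch,
                      gamma=0.99, beta=1.0):
    """dataset_batch: (s, a, r, s_next) tensors from the offline dataset.
    model_batch: (s_m, a_m) state-action pairs gathered from synthetic model rollouts."""
    s, a, r, s_next = dataset_batch
    s_m, a_m = model_batch

    # Standard Bellman error (COMBO backs up over a mix of real and model data;
    # only the real-data part is shown here for brevity).
    with torch.no_grad():
        target = r + gamma * target_q_net(s_next, policy(s_next))
    bellman = F.mse_loss(q_net(s, a), target)

    # Conservative regularizer: push Q down on state-actions reached by model rollouts
    # (potentially out-of-support) and up on state-actions from the dataset.
    conservative = q_net(s_m, a_m).mean() - q_net(s, a).mean()

    return bellman + beta * conservative
```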
- Improving Sequential Latent Variable Models with Autoregressive Flows [30.053464816814348]
We propose an approach for improving sequence modeling based on autoregressive normalizing flows.
Results are presented on three benchmark video datasets, where autoregressive flow-based dynamics improve log-likelihood performance.
arXiv Detail & Related papers (2020-10-07T05:14:37Z)
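As in the main paper, the key ingredient of the autoregressive-flow work above is an autoregressive transformation. Below is a minimal, hedged sketch of a single affine autoregressive flow step in the MAF style, where each dimension is transformed conditioned only on the preceding ones so the Jacobian stays triangular; class and parameter names are illustrative assumptions, not the cited paper's code.

```python
# Hedged sketch of one affine autoregressive flow step (MAF-style); illustrative only.
import torch
import torch.nn as nn

class AffineAutoregressiveFlow(nn.Module):
    """y_i = x_i * exp(s_i(x_<i)) + t_i(x_<i); the Jacobian is triangular,
    so log|det| = sum_i s_i, giving a tractable change of variables."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        # One conditioner per dimension, each seeing only the preceding dimensions.
        self.conditioners = nn.ModuleList([
            nn.Sequential(nn.Linear(max(i, 1), hidden), nn.Tanh(), nn.Linear(hidden, 2))
            for i in range(dim)
        ])

    def forward(self, x):
        """Transform x -> y and return log|det Jacobian| for the density computation."""
        ys, logdet = [], 0.0
        for i, cond in enumerate(self.conditioners):
            inp = x[..., :i] if i > 0 else torch.zeros_like(x[..., :1])
            s, t = cond(inp).unbind(-1)
            ys.append(x[..., i] * torch.exp(s) + t)
            logdet = logdet + s
        return torch.stack(ys, -1), logdet
```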
- Reinforcement Learning based dynamic weighing of Ensemble Models for Time Series Forecasting [0.8399688944263843]
It is known that if the models selected for data modelling are distinct (linear/non-linear, static/dynamic) and independent (minimally correlated), the accuracy of the predictions is improved.
Various approaches suggested in the literature to weigh the ensemble models use a static set of weights.
To address this issue, a Reinforcement Learning (RL) approach is proposed to dynamically assign and update the weights of each model at different time instants.
arXiv Detail & Related papers (2020-08-20T10:40:42Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)