Model-Augmented Actor-Critic: Backpropagating through Paths
- URL: http://arxiv.org/abs/2005.08068v1
- Date: Sat, 16 May 2020 19:18:10 GMT
- Title: Model-Augmented Actor-Critic: Backpropagating through Paths
- Authors: Ignasi Clavera, Violet Fu, Pieter Abbeel
- Abstract summary: Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
- Score: 81.86992776864729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current model-based reinforcement learning approaches use the model simply as
a learned black-box simulator to augment the data for policy optimization or
value function learning. In this paper, we show how to make more effective use
of the model by exploiting its differentiability. We construct a policy
optimization algorithm that uses the pathwise derivative of the learned model
and policy across future timesteps. Instabilities of learning across many
timesteps are prevented by using a terminal value function, learning the policy
in an actor-critic fashion. Furthermore, we present a derivation on the
monotonic improvement of our objective in terms of the gradient error in the
model and value function. We show that our approach (i) is consistently more
sample efficient than existing state-of-the-art model-based algorithms, (ii)
matches the asymptotic performance of model-free algorithms, and (iii) scales
to long horizons, a regime where typically past model-based approaches have
struggled.
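As a rough illustration of the mechanism the abstract describes, the sketch below rolls a policy through a learned, differentiable dynamics model for a short horizon, appends a discounted terminal value estimate, and backpropagates through the entire path. PyTorch, the module names (policy, dynamics, reward_fn, value_fn), and a deterministic or reparameterized policy are assumptions made for illustration, not details taken from the paper.

```python
import torch

def pathwise_objective(policy, dynamics, reward_fn, value_fn,
                       init_states, horizon=10, gamma=0.99):
    """Sketch of an H-step model-based objective: predicted rewards along a
    differentiable model rollout plus a discounted terminal value, with
    gradients flowing through the learned model and the policy at every step."""
    s = init_states                                   # (batch, state_dim)
    ret = torch.zeros(s.shape[0], device=s.device)
    discount = 1.0
    for _ in range(horizon):
        a = policy(s)                                 # reparameterized action
        ret = ret + discount * reward_fn(s, a)        # predicted reward, (batch,)
        s = dynamics(s, a)                            # differentiable model step
        discount *= gamma
    ret = ret + discount * value_fn(s).squeeze(-1)    # terminal critic estimate
    return ret.mean()

# Actor update: ascend the objective by backpropagating through the whole path.
# loss = -pathwise_objective(policy, dynamics, reward_fn, value_fn, s0)
# loss.backward(); actor_optimizer.step()
```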
Related papers
- Model-based Policy Optimization using Symbolic World Model [46.42871544295734]
The application of learning-based control methods in robotics presents significant challenges.
One challenge is that model-free reinforcement learning algorithms use observation data with low sample efficiency.
We suggest approximating transition dynamics with symbolic expressions, which are generated via symbolic regression.
arXiv Detail & Related papers (2024-07-18T13:49:21Z)
- The Virtues of Laziness in Model-based RL: A Unified Objective and Algorithms [37.025378882978714]
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL).
Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy.
We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains.
arXiv Detail & Related papers (2023-03-01T17:42:26Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Model-based Reinforcement Learning with Multi-step Plan Value Estimation [4.158979444110977]
We introduce multi-step plans to replace multi-step actions for model-based RL.
The new model-based reinforcement learning algorithm, MPPVE, makes better use of the learned model and achieves better sample efficiency than state-of-the-art model-based RL approaches.
arXiv Detail & Related papers (2022-09-12T18:22:11Z)
- Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation [11.219641045667055]
We propose an end-to-end approach for model learning which directly optimizes the expected returns using implicit differentiation.
We provide theoretical and empirical evidence highlighting the benefits of our approach in the model misspecification regime compared to likelihood-based methods.
arXiv Detail & Related papers (2021-06-06T23:15:49Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions (a minimal sketch of this style of regularizer appears after this list).
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models [40.08137765886609]
We show that our model, called a graph structured surrogate model (GSSM), outperforms state-of-the-art methods in predicting environment dynamics.
Our approach is able to obtain high returns, while allowing fast execution during deployment by avoiding test time policy gradient optimization.
arXiv Detail & Related papers (2021-02-16T17:21:55Z)
- Distilling Interpretable Models into Human-Readable Code [71.11328360614479]
Human-readability is an important and desirable standard for machine-learned model interpretability.
We propose to train interpretable models using conventional methods, and then distill them into concise, human-readable code.
We describe a piecewise-linear curve-fitting algorithm that produces high-quality results efficiently and reliably across a broad range of use cases.
arXiv Detail & Related papers (2021-01-21T01:46:36Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
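The COMBO entry above mentions regularizing the value function on out-of-support state-actions. The following minimal sketch (an assumed form for illustration, not the paper's exact objective) shows one such conservative penalty: Q-values on model-generated state-action pairs, which may fall outside the offline dataset's support, are pushed down while Q-values on dataset pairs are pushed up, and the term is added to the usual Bellman-error loss with a weight beta.

```python
def conservative_value_penalty(q_net, data_s, data_a, model_s, model_a, beta=1.0):
    """Illustrative COMBO-style regularizer (assumed form, not the exact loss):
    penalize high Q-values on model-rollout state-actions (potentially
    out-of-support) and encourage high Q-values on offline-dataset pairs."""
    q_model = q_net(model_s, model_a).mean()   # Q on model-generated pairs
    q_data = q_net(data_s, data_a).mean()      # Q on offline dataset pairs
    return beta * (q_model - q_data)

# Hypothetical critic loss: standard Bellman error plus the conservative term.
# critic_loss = bellman_error + conservative_value_penalty(q_net, ds, da, ms, ma)
```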
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.