On the model-based stochastic value gradient for continuous
reinforcement learning
- URL: http://arxiv.org/abs/2008.12775v3
- Date: Thu, 27 May 2021 17:59:15 GMT
- Title: On the model-based stochastic value gradient for continuous
reinforcement learning
- Authors: Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson
- Abstract summary: We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
- Score: 50.085645237597056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For over a decade, model-based reinforcement learning has been seen as a way
to leverage control-based domain knowledge to improve the sample-efficiency of
reinforcement learning agents. While model-based agents are conceptually
appealing, their policies tend to lag behind those of model-free agents in
terms of final reward, especially in non-trivial environments. In response,
researchers have proposed model-based agents with increasingly complex
components, from ensembles of probabilistic dynamics models, to heuristics for
mitigating model error. In a reversal of this trend, we show that simple
model-based agents can be derived from existing ideas that not only match, but
outperform state-of-the-art model-free agents in terms of both
sample-efficiency and final reward. We find that a model-free soft value
estimate for policy evaluation and a model-based stochastic value gradient for
policy improvement is an effective combination, achieving state-of-the-art
results on a high-dimensional humanoid control task, which most model-based
agents are unable to solve. Our findings suggest that model-based policy
evaluation deserves closer attention.
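The combination the abstract describes, a short imagined rollout through a learned dynamics model whose return is terminated with a model-free soft value estimate and differentiated through a reparameterized policy, can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the authors' implementation: the module names (`dynamics`, `reward_fn`, `soft_q`, `policy_mean`), network sizes, horizon, and coefficients below are assumptions made for the example.

```python
# Minimal sketch (assumed names/architectures, not the paper's code) of a
# model-based stochastic value gradient terminated with a soft value estimate.
import torch
import torch.nn as nn

obs_dim, act_dim, horizon, gamma, alpha = 17, 6, 3, 0.99, 0.1

# Learned components; in a full agent these are trained from replay data.
dynamics = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                         nn.Linear(256, obs_dim))       # s' = f(s, a), deterministic here
reward_fn = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                          nn.Linear(256, 1))            # r(s, a)
soft_q = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))               # model-free soft Q estimate

# Gaussian policy with a reparameterizable sample.
policy_mean = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                            nn.Linear(256, act_dim))
log_std = nn.Parameter(torch.zeros(act_dim))
policy_opt = torch.optim.Adam(list(policy_mean.parameters()) + [log_std], lr=3e-4)


def svg_policy_loss(obs):
    """Stochastic value gradient: differentiate an imagined short-horizon soft
    return (entropy-regularized rewards plus a terminal soft value) w.r.t. the policy."""
    s = obs
    ret, discount = torch.zeros(obs.shape[0], 1), 1.0
    for _ in range(horizon):
        dist = torch.distributions.Normal(policy_mean(s), log_std.exp())
        a = dist.rsample()                               # reparameterized action sample
        sa = torch.cat([s, a], dim=-1)
        ret = ret + discount * (reward_fn(sa)
                                + alpha * dist.entropy().sum(-1, keepdim=True))
        s = dynamics(sa)                                 # imagined next state
        discount *= gamma
    # Terminate with the model-free soft value V(s) ~ Q(s, a) - alpha * log pi(a|s).
    dist = torch.distributions.Normal(policy_mean(s), log_std.exp())
    a = dist.rsample()
    v = soft_q(torch.cat([s, a], dim=-1)) - alpha * dist.log_prob(a).sum(-1, keepdim=True)
    ret = ret + discount * v
    return -ret.mean()                                   # gradient ascent on the soft return


# One illustrative policy-improvement step on a batch of starting states.
obs_batch = torch.randn(32, obs_dim)
loss = svg_policy_loss(obs_batch)
policy_opt.zero_grad()
loss.backward()
policy_opt.step()
```

The sketch only shows how the policy-improvement gradient flows through the imagined rollout into the terminal soft value; the dynamics, reward, and soft Q models would themselves be fit to real transitions, and policy evaluation of the Q-function remains model-free as in the abstract.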
Related papers
- Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort [31.992947353231564]
Concept Bottleneck Models (CBMs) can provide a principled way of disclosing and guiding model behaviors through human-understandable concepts.
We propose a novel framework designed to exploit pre-trained models while being immune to these biases, thereby reducing vulnerability to spurious correlations.
We evaluate the proposed method on multiple datasets, and the results demonstrate its effectiveness in reducing model reliance on spurious correlations while preserving its interpretability.
arXiv Detail & Related papers (2024-07-12T03:07:28Z)
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to the policy exploiting model errors, which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [32.752633250862694]
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data.
We introduce a new framework, Reward rAnked FineTuning, designed to align generative models effectively.
arXiv Detail & Related papers (2023-04-13T18:22:40Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme that provides a non-decreasing performance guarantee for model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning [21.931580762349096]
We introduce an algorithm that computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model.
We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
arXiv Detail & Related papers (2022-06-04T23:36:38Z)
- Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are typically more sample efficient than their model-free counterparts.
We propose Conservative Model-Based Actor-Critic (CMBAC), a novel approach that achieves high sample efficiency without strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)