Revisiting Model-based Value Expansion
- URL: http://arxiv.org/abs/2203.14660v1
- Date: Mon, 28 Mar 2022 11:21:49 GMT
- Title: Revisiting Model-based Value Expansion
- Authors: Daniel Palenicek, Michael Lutter, Jan Peters
- Abstract summary: Model-based value expansion methods promise to improve the quality of value function targets and the effectiveness of value function learning.
However, to date, these methods are being outperformed by Dyna-style algorithms with conceptually simpler 1-step value function targets.
We provide a thorough empirical study to shed light on the causes of failure of value expansion methods in practice.
- Score: 35.55280687116388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based value expansion methods promise to improve the quality of value
function targets and, thereby, the effectiveness of value function learning.
However, to date, these methods are being outperformed by Dyna-style algorithms
with conceptually simpler 1-step value function targets. This shows that in
practice, the theoretical justification of value expansion does not seem to
hold. We provide a thorough empirical study to shed light on the failure of
value expansion methods in practice, which is commonly attributed to
compounding model error. By leveraging GPU-based physics simulators, we can
efficiently use the true dynamics for analysis inside the model-based
reinforcement learning loop. Extensive comparisons between true and learned
dynamics shed light on this black box and provide a better understanding of
the actual problems in value expansion. We outline directions for future
research by empirically testing the maximum theoretical performance of
current approaches.
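To make the construction concrete, the sketch below shows an H-step model-based value expansion target of the kind discussed above: a dynamics model (learned, or the true simulator dynamics used for analysis in this paper) is rolled out under the current policy, and the discounted predicted rewards plus a bootstrapped terminal value estimate form the value function target. The interfaces (`policy`, `model_step`, `value_fn`) are illustrative assumptions, not the authors' implementation.

```python
def value_expansion_target(state, policy, model_step, value_fn,
                           horizon=5, gamma=0.99):
    """H-step value expansion target (illustrative sketch).

    Computes  sum_{h=0}^{H-1} gamma^h * r_{t+h}  +  gamma^H * V(s_{t+H}),
    where rewards and states come from rolling out a dynamics model
    under the current policy. All callables are assumed interfaces.
    """
    target, discount, s = 0.0, 1.0, state
    for _ in range(horizon):
        a = policy(s)              # action from the current policy
        s, r = model_step(s, a)    # model-predicted next state and reward
        target += discount * r     # accumulate discounted reward
        discount *= gamma
    return target + discount * value_fn(s)  # bootstrap at the final state
```

The conceptually simpler 1-step value function targets mentioned in the abstract correspond to the special case horizon = 1, i.e. a single reward plus the bootstrapped value estimate.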
Related papers
- On Stateful Value Factorization in Multi-Agent Reinforcement Learning [19.342676562701794]
We introduce Duelmix, a factorization algorithm that learns distinct per-agent utility estimators to improve performance.
Experiments on StarCraft II micromanagement and Box Pushing tasks demonstrate the benefits of our intuitions.
arXiv Detail & Related papers (2024-08-27T19:45:26Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
Offline reinforcement learning aims to train agents from pre-collected datasets; however, this comes with the added challenge of estimating the value of behaviors not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model (a generic sketch of this rollout pattern appears after this list).
However, existing model-based methods surprisingly fail completely if the learned dynamics model is replaced by the true error-free dynamics, exposing the edge-of-reach problem.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined future state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
Experiments demonstrate that our method achieves new state-of-the-art performance among search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
arXiv Detail & Related papers (2022-04-04T13:28:31Z) - Model-free and Bayesian Ensembling Model-based Deep Reinforcement Learning for Particle Accelerator Control Demonstrated on the FERMI FEL [0.0]
This paper shows how reinforcement learning can be used on an operational level on accelerator physics problems.
We compare purely model-based to model-free reinforcement learning applied to the intensity optimisation on the FERMI FEL system.
We find that the model-based approach demonstrates higher representational power and sample efficiency, while the model-free method performs slightly better.
arXiv Detail & Related papers (2020-12-17T16:57:27Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z) - Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
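As a companion to the Dyna-style rollout mechanism referenced in the Edge-of-Reach entry above, the sketch below shows the generic pattern of generating short synthetic rollouts from a learned dynamics model and collecting the imagined transitions for training. It is a hedged illustration of the general technique under assumed interfaces (`policy`, `model_step`), not code from any of the listed papers.

```python
from typing import Callable, List, Tuple

# (state, action, reward, next_state): a generic transition record
Transition = Tuple[object, object, float, object]

def generate_synthetic_rollouts(
    start_states: List[object],
    policy: Callable[[object], object],
    model_step: Callable[[object, object], Tuple[object, float]],
    rollout_length: int = 5,
) -> List[Transition]:
    """Roll the learned model forward from real start states and collect
    imagined transitions, Dyna-style. All interfaces are illustrative
    assumptions, not a specific paper's API."""
    synthetic: List[Transition] = []
    for s in start_states:
        for _ in range(rollout_length):
            a = policy(s)                 # action proposed by the policy
            s_next, r = model_step(s, a)  # model-predicted outcome
            synthetic.append((s, a, r, s_next))
            s = s_next                    # continue the imagined trajectory
    return synthetic
```

The synthetic transitions would typically be mixed with real data in a replay buffer; keeping the rollout length short limits how far compounding model error can propagate.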