Self-Consistent Models and Values
- URL: http://arxiv.org/abs/2110.12840v1
- Date: Mon, 25 Oct 2021 12:09:42 GMT
- Title: Self-Consistent Models and Values
- Authors: Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo
Hessel, Hado van Hasselt, David Silver
- Abstract summary: Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment.
In this work, we investigate a way of augmenting model-based RL, by additionally encouraging a learned model and value function to be jointly self-consistent.
Our approach differs from classic planning methods such as Dyna, which only update values to be consistent with the model.
- Score: 42.53364554418915
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learned models of the environment provide reinforcement learning (RL) agents
with flexible ways of making predictions about the environment. In particular,
models enable planning, i.e. using more computation to improve value functions
or policies, without requiring additional environment interactions. In this
work, we investigate a way of augmenting model-based RL, by additionally
encouraging a learned model and value function to be jointly
\emph{self-consistent}. Our approach differs from classic planning methods such
as Dyna, which only update values to be consistent with the model. We propose
multiple self-consistency updates, evaluate these in both tabular and function
approximation settings, and find that, with appropriate choices,
self-consistency helps both policy evaluation and control.
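To make the contrast with Dyna concrete, here is a minimal tabular sketch (not the paper's exact algorithm; the update names and the choice of adjusting only the reward model are illustrative assumptions). A Dyna-style update moves the value function toward the model's one-step target, while a joint self-consistency update also uses the same disagreement term to adjust the model, so that model and value are encouraged to agree:

```python
import numpy as np

# Tabular setting: S states, learned reward model r_hat[s], learned transition
# model p_hat[s, s'] (rows sum to 1), and value estimates v[s].
S, gamma, lr = 5, 0.9, 0.1
rng = np.random.default_rng(0)
r_hat = rng.normal(size=S)
p_hat = rng.dirichlet(np.ones(S), size=S)
v = np.zeros(S)

def dyna_style_update(v, r_hat, p_hat):
    """Classic planning: move values toward the model's one-step target only."""
    target = r_hat + gamma * p_hat @ v
    return v + lr * (target - v), r_hat, p_hat   # the model is left untouched

def self_consistency_update(v, r_hat, p_hat):
    """Jointly shrink the model-value disagreement delta = (r_hat + gamma*P_hat v) - v.

    The same error signal also adjusts the model (here, only its reward head),
    so model and value move toward agreement -- a simplified illustration.
    """
    delta = r_hat + gamma * p_hat @ v - v
    return v + lr * delta, r_hat - lr * delta, p_hat

for _ in range(100):
    v, r_hat, p_hat = self_consistency_update(v, r_hat, p_hat)
```

In the function-approximation setting the analogous construction would treat the disagreement as a loss differentiated with respect to both the model and the value parameters; the paper evaluates several such update variants.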
Related papers
- Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning [39.53836535326121]
We propose Distillation for In-Context Planning (DICP), an in-context model-based RL framework where Transformers simultaneously learn environment dynamics and improve policy in-context.
Our results show that DICP achieves state-of-the-art performance while requiring significantly fewer environment interactions than baselines.
arXiv Detail & Related papers (2025-02-26T10:16:57Z)
- Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [31.509112804985133]
Reinforcement learning (RL) learns policies through trial and error, while optimal control plans actions using a learned or known dynamics model.
We systematically analyze the performance of different RL and control-based methods under datasets of varying quality.
Our results show that model-free RL excels when abundant, high-quality data is available, while model-based planning excels in generalization to novel environment layouts, trajectory stitching, and data-efficiency.
arXiv Detail & Related papers (2025-02-20T18:39:41Z)
- AMUSE: Adaptive Model Updating using a Simulated Environment [1.6124402884077915]
Prediction models frequently face the challenge of concept drift, in which the underlying data distribution changes over time, weakening performance.
We present AMUSE, a novel method leveraging reinforcement learning trained within a simulated data generating environment.
As a result, AMUSE proactively recommends updates based on estimated performance improvements.
arXiv Detail & Related papers (2024-12-13T13:04:46Z)
- COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL [50.385005413810084]
Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real-environment exploration.
COPlanner is a planning-driven framework for model-based methods that addresses the problem of an inaccurately learned dynamics model.
arXiv Detail & Related papers (2023-10-11T06:10:07Z)
- Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL [39.58890668062184]
We frame the problem of tuning the rollout length as a meta-level sequential decision-making problem.
We use model-free deep reinforcement learning to solve the meta-level decision problem.
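As a toy illustration of treating the rollout length itself as a decision variable, one could keep a simple epsilon-greedy bandit over candidate lengths, rewarded by the observed improvement in evaluation return; the paper instead solves the full meta-level problem with model-free deep RL, and the class below (`RolloutLengthBandit`) is an invented simplification:

```python
import random

class RolloutLengthBandit:
    """Epsilon-greedy bandit over candidate rollout lengths (illustrative only)."""

    def __init__(self, candidates=(1, 3, 5, 10), epsilon=0.1):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.counts = {k: 0 for k in self.candidates}
        self.values = {k: 0.0 for k in self.candidates}  # running mean of improvement

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.candidates)
        return max(self.candidates, key=lambda k: self.values[k])

    def update(self, length, improvement):
        # improvement: e.g. change in evaluation return after training with this length
        self.counts[length] += 1
        self.values[length] += (improvement - self.values[length]) / self.counts[length]
```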
arXiv Detail & Related papers (2022-06-06T06:25:11Z)
- Model-Value Inconsistency as a Signal for Epistemic Uncertainty [22.492926703232015]
Self-inconsistency is useful (i) as a signal for exploration, (ii) for acting safely under distribution shifts, and (iii) for robustifying value-based planning with a model.
We show that, unlike prior work, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.
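A minimal sketch of that disagreement signal, assuming a tabular reward model `r_hat`, transition model `p_hat`, and value table `v` (function names invented; the paper's construction from a single model and value function may differ in detail): the k-step model-predicted values for several k form an implicit ensemble, and their spread can be read as epistemic uncertainty.

```python
import numpy as np

def model_value_disagreement(v, r_hat, p_hat, gamma=0.9, depths=(0, 1, 2, 3)):
    """Per-state std of k-step model-predicted values for k in `depths`.

    v:     (S,)   current value estimates
    r_hat: (S,)   learned reward model
    p_hat: (S, S) learned transition model (rows sum to 1)
    """
    estimates = []
    for k in depths:
        vk = v.copy()
        for _ in range(k):                  # apply the model's Bellman backup k times
            vk = r_hat + gamma * p_hat @ vk
        estimates.append(vk)
    return np.stack(estimates).std(axis=0)  # high spread -> model and value disagree
```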
arXiv Detail & Related papers (2021-12-08T07:53:41Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
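The regularization idea above can be illustrated with a schematic critic loss that, in addition to a standard Bellman error on offline data, pushes Q-values down on model-generated state-action pairs and up on dataset pairs; this is only a sketch of the conservative term under assumed tensor shapes, not COMBO's full objective:

```python
import torch

def conservative_critic_loss(q_net, real_batch, model_batch, gamma=0.99, beta=1.0):
    """Schematic COMBO-style critic loss (illustrative, not the paper's exact objective).

    real_batch:  (s, a, r, s2) tensors sampled from the offline dataset
    model_batch: (s, a) tensors generated by rolling out the learned model
    """
    s, a, r, s2 = real_batch
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * q_net(s2).max(dim=1).values
    bellman = ((q - target) ** 2).mean()

    ms, ma = model_batch
    q_model = q_net(ms).gather(1, ma.unsqueeze(1)).squeeze(1)
    # Conservative term: lower Q on (possibly out-of-support) model samples,
    # raise Q on dataset samples.
    conservative = q_model.mean() - q.mean()
    return bellman + beta * conservative
```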
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
arXiv Detail & Related papers (2021-01-11T13:25:13Z)
- The Value Equivalence Principle for Model-Based Reinforcement Learning [29.368870568214007]
We argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning.
We show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks.
We argue that the principle of value equivalence underlies a number of recent empirical successes in RL.
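The principle can be made concrete with a small tabular check (names invented, a sketch rather than the paper's formalism): a model is value-equivalent to the environment with respect to a set of policies and a set of functions if both produce the same one-step Bellman backups for every such pair, and enlarging either set rules out more candidate models.

```python
import numpy as np

def bellman_backup(r, p, pi, v, gamma=0.9):
    """One-step Bellman operator T^pi applied to v.

    r: (S,A) rewards, p: (S,A,S) transitions, pi: (S,A) policy, v: (S,) function.
    """
    q = r + gamma * np.einsum("sax,x->sa", p, v)
    return np.einsum("sa,sa->s", pi, q)

def is_value_equivalent(env, model, policies, functions, tol=1e-6):
    """Check T^pi_model v == T^pi_env v for all pi in `policies`, v in `functions`."""
    (r_e, p_e), (r_m, p_m) = env, model
    return all(
        np.allclose(bellman_backup(r_e, p_e, pi, v),
                    bellman_backup(r_m, p_m, pi, v), atol=tol)
        for pi in policies
        for v in functions
    )
```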
arXiv Detail & Related papers (2020-11-06T18:25:54Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.
We propose an objective function as a variational lower bound of the log-likelihood, used to jointly learn and improve the model and policy.
Our experiments on a number of continuous control tasks show that, despite being more complex, our model-based (E-step) algorithm, called variational model-based policy optimization (VMBPO), is more sample-efficient.
arXiv Detail & Related papers (2020-06-09T18:30:15Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
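One way to read "exploiting its differentiability" is a pathwise objective: roll the learned model forward under the policy and backpropagate the discounted return through the whole path. The sketch below is a generic illustration of that idea under assumed differentiable `policy`, `model`, and `reward_fn` callables, not the paper's exact estimator:

```python
import torch

def pathwise_policy_loss(policy, model, reward_fn, s0, horizon=5, gamma=0.99):
    """Negative model-predicted return, differentiable w.r.t. policy parameters.

    s0: a batch of start states (tensor); policy, model, reward_fn return tensors.
    """
    s, ret, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)                     # differentiable policy output
        ret = ret + discount * reward_fn(s, a)
        s = model(s, a)                   # differentiable learned dynamics
        discount *= gamma
    return -ret.mean()
```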
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
- Policy-Aware Model Learning for Policy Gradient Methods [29.129883702165774]
This paper considers the problem of learning a model in model-based reinforcement learning (MBRL).
We propose that the model learning module should incorporate the way the planner is going to use the model.
We call this approach Policy-Aware Model Learning (PAML).
arXiv Detail & Related papers (2020-02-28T19:18:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.