The Value Equivalence Principle for Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2011.03506v1
- Date: Fri, 6 Nov 2020 18:25:54 GMT
- Title: The Value Equivalence Principle for Model-Based Reinforcement Learning
- Authors: Christopher Grimm, André Barreto, Satinder Singh, David Silver
- Abstract summary: We argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning.
We show that, as we augment the set of policies and functions considered, the class of value equivalent models shrinks.
We argue that the principle of value equivalence underlies a number of recent empirical successes in RL.
- Score: 29.368870568214007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning models of the environment from data is often viewed as an essential
component to building intelligent reinforcement learning (RL) agents. The
common practice is to separate the learning of the model from its use, by
constructing a model of the environment's dynamics that correctly predicts the
observed state transitions. In this paper we argue that the limited
representational resources of model-based RL agents are better used to build
models that are directly useful for value-based planning. As our main
contribution, we introduce the principle of value equivalence: two models are
value equivalent with respect to a set of functions and policies if they yield
the same Bellman updates. We propose a formulation of the model learning
problem based on the value equivalence principle and analyze how the set of
feasible solutions is impacted by the choice of policies and functions.
Specifically, we show that, as we augment the set of policies and functions
considered, the class of value equivalent models shrinks, until eventually
collapsing to a single point corresponding to a model that perfectly describes
the environment. In many problems, directly modelling state-to-state
transitions may be both difficult and unnecessary. By leveraging the
value-equivalence principle one may find simpler models without compromising
performance, saving computation and memory. We illustrate the benefits of
value-equivalent model learning with experiments comparing it against more
traditional counterparts like maximum likelihood estimation. More generally, we
argue that the principle of value equivalence underlies a number of recent
empirical successes in RL, such as Value Iteration Networks, the Predictron,
Value Prediction Networks, TreeQN, and MuZero, and provides a first theoretical
underpinning of those results.
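To make the definition concrete (the notation below is assumed for exposition and paraphrases the abstract rather than quoting the paper), write $\mathcal{T}^m_\pi$ for the Bellman operator induced by a model $m=(r,p)$ and a policy $\pi$:
$$(\mathcal{T}^m_\pi v)(s) = \mathbb{E}_{a \sim \pi(\cdot\mid s)}\big[\, r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot\mid s,a)}[v(s')] \,\big].$$
Two models $m$ and $\tilde m$ are then value equivalent with respect to a set of policies $\Pi$ and a set of functions $\mathcal{V}$ if $\mathcal{T}^m_\pi v = \mathcal{T}^{\tilde m}_\pi v$ for every $\pi \in \Pi$ and $v \in \mathcal{V}$.
The tabular Python sketch below is an illustrative check of this condition (not code from the paper): it applies each model's Bellman update to every (policy, function) pair and reports whether the two models agree.

    import numpy as np

    def bellman_update(P, R, pi, v, gamma=0.9):
        """One application of the Bellman operator T^m_pi to v.

        P:  transitions, shape (S, A, S), P[s, a, s'] = p(s' | s, a)
        R:  rewards, shape (S, A)
        pi: stochastic policy, shape (S, A), rows sum to 1
        v:  value function, shape (S,)
        """
        q = R + gamma * (P @ v)        # (S, A): r(s, a) + gamma * E_{s'}[v(s')]
        return np.sum(pi * q, axis=1)  # expectation over a ~ pi(.|s)

    def value_equivalent(model_a, model_b, policies, functions, gamma=0.9, tol=1e-8):
        """True iff both models yield the same Bellman updates on all pairs."""
        (Pa, Ra), (Pb, Rb) = model_a, model_b
        return all(
            np.allclose(bellman_update(Pa, Ra, pi, v, gamma),
                        bellman_update(Pb, Rb, pi, v, gamma), atol=tol)
            for pi in policies for v in functions
        )

Shrinking the sets `policies` and `functions` enlarges the set of models that pass this check, which is exactly the slack the paper proposes to spend on simpler, planning-relevant models.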
Related papers
- Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning [56.50123642237106]
Common practice in model-based reinforcement learning is to learn models that capture every aspect of the agent's environment.
We argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios.
We propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models".
arXiv Detail & Related papers (2023-01-24T16:40:01Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL)
CDL learns a causal dynamics model, with theoretical guarantees, that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
- Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning [21.931580762349096]
We introduce an algorithm that computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model.
We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
arXiv Detail & Related papers (2022-06-04T23:36:38Z)
- Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning [21.931580762349096]
We introduce an algorithm for synthesizing simple and useful approximations of the environment from which an agent might still recover near-optimal behavior.
We recognize the information-theoretic nature of this lossy environment compression problem and use the appropriate tools of rate-distortion theory to make mathematically precise how value equivalence can lend tractability to otherwise intractable sequential decision-making problems.
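For context, a generic statement of the rate-distortion trade-off invoked here (standard information theory, not the paper's exact formulation) is: among all stochastic compressions $\hat M$ of the environment $M$, pick the one of lowest rate subject to a distortion budget,
$$\mathcal{R}(D) = \min_{p(\hat m \mid m):\, \mathbb{E}[d(M, \hat M)] \le D} I(M; \hat M),$$
where, in this line of work, the distortion $d$ would penalize departures from (approximate) value equivalence; the specific distortion function used in the paper is not reproduced here.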
arXiv Detail & Related papers (2022-06-04T17:09:46Z)
- Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
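The summary does not spell out VaGraM's objective, so the following is only an illustration of what a value-aware model loss can look like (an assumed form for exposition, not necessarily the paper's exact formulation): one-step prediction errors are weighted by the local gradient of the value function, so errors in directions the value function ignores are discounted.

    import numpy as np

    def value_weighted_model_loss(pred_next, true_next, value_grad):
        """Illustrative value-aware loss (assumed form, not VaGraM's exact objective).

        pred_next:  predicted next states, shape (N, D)
        true_next:  observed next states, shape (N, D)
        value_grad: gradient of the value function at each observed next state, (N, D)
        """
        err = pred_next - true_next
        # Penalize only the component of the error that changes the value estimate.
        return np.mean(np.sum(value_grad * err, axis=1) ** 2)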
arXiv Detail & Related papers (2022-04-04T13:28:31Z)
- Model-Advantage Optimization for Model-Based Reinforcement Learning [41.13567626667456]
Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment.
Value-aware model learning, an alternative model-learning paradigm to maximum likelihood, proposes to inform model-learning through the value function of the learnt policy.
We propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.
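For orientation, one standard bound of this kind (given as background; it need not coincide with the paper's objective) follows from the simulation lemma: for a fixed policy $\pi$ and two models $m$, $\hat m$ that share a reward function,
$$\| V^\pi_{m} - V^\pi_{\hat m} \|_\infty \;\le\; \frac{\gamma}{1-\gamma}\, \big\| (P^\pi_{m} - P^\pi_{\hat m})\, V^\pi_{\hat m} \big\|_\infty,$$
where $P^\pi$ is the state-transition matrix induced by $\pi$ under each model; training a model to shrink the right-hand side is value-aware by construction.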
arXiv Detail & Related papers (2021-06-26T20:01:28Z)
- Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon.
We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
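Concretely (paraphrasing the published $\gamma$-model idea with assumed notation), the model predicts not the next state but a geometrically discounted mixture over all future horizons,
$$\mu_\gamma(s_e \mid s, a) \;=\; (1-\gamma) \sum_{t=0}^{\infty} \gamma^{t}\, \Pr(s_{t+1} = s_e \mid s_0 = s, a_0 = a),$$
interpolating between a one-step dynamics model at $\gamma = 0$ and a generative analogue of the successor representation as $\gamma \to 1$; the TD-style training alluded to above bootstraps this distribution from the model's own samples.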
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
- On the model-based stochastic value gradient for continuous reinforcement learning [50.085645237597056]
We show that simple model-based agents can outperform state-of-the-art model-free agents in terms of both sample-efficiency and final reward.
Our findings suggest that model-based policy evaluation deserves closer attention.
arXiv Detail & Related papers (2020-08-28T17:58:29Z)
- Model Embedding Model-Based Reinforcement Learning [4.566180616886624]
Model-based reinforcement learning (MBRL) has shown advantages in sample efficiency over model-free reinforcement learning (MFRL).
Despite the impressive results it achieves, it still faces a trade-off between the ease of data generation and model bias.
We propose a simple and elegant model-embedding model-based reinforcement learning (MEMB) algorithm within the framework of probabilistic reinforcement learning.
arXiv Detail & Related papers (2020-06-16T15:10:28Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
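As a generic sketch of "exploiting differentiability" (the pathwise value-gradient idea in general, not MAAC's specific objective; all module names below are placeholders), one can backpropagate the return of a short imagined rollout through a learned, differentiable model:

    import torch
    import torch.nn as nn

    class MLP(nn.Sequential):
        def __init__(self, inp, out):
            super().__init__(nn.Linear(inp, 64), nn.Tanh(), nn.Linear(64, out))

    state_dim, action_dim, horizon, gamma = 4, 2, 3, 0.99
    policy   = MLP(state_dim, action_dim)               # deterministic policy (assumption)
    dynamics = MLP(state_dim + action_dim, state_dim)   # learned, differentiable transition model
    reward   = MLP(state_dim + action_dim, 1)           # learned reward model
    value_fn = MLP(state_dim, 1)                        # critic used to bootstrap the tail

    def rollout_return(s):
        """Estimated return of an H-step imagined rollout; gradients flow through the model."""
        ret, disc = 0.0, 1.0
        for _ in range(horizon):
            a = policy(s)
            sa = torch.cat([s, a], dim=-1)
            ret = ret + disc * reward(sa).squeeze(-1)
            s = dynamics(sa)                            # differentiable transition
            disc *= gamma
        return (ret + disc * value_fn(s).squeeze(-1)).mean()

    s0 = torch.randn(32, state_dim)                     # batch of start states
    loss = -rollout_return(s0)                          # ascend the estimated return
    loss.backward()                                     # gradients reach the policy parameters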
arXiv Detail & Related papers (2020-05-16T19:18:10Z)