Value-driven Hindsight Modelling
- URL: http://arxiv.org/abs/2002.08329v2
- Date: Tue, 20 Oct 2020 20:18:12 GMT
- Title: Value-driven Hindsight Modelling
- Authors: Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven
Kapturowski, Doina Precup, David Silver, Nicolas Heess
- Abstract summary: Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
- Score: 68.658900923595
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value estimation is a critical component of the reinforcement learning (RL)
paradigm. The question of how to effectively learn value predictors from data
is one of the major problems studied by the RL community, and different
approaches exploit structure in the problem domain in different ways. Model
learning can make use of the rich transition structure present in sequences of
observations, but this approach is usually not sensitive to the reward
function. In contrast, model-free methods directly leverage the quantity of
interest from the future, but receive a potentially weak scalar signal (an
estimate of the return). We develop an approach for representation learning in
RL that sits in between these two extremes: we propose to learn what to model
in a way that can directly help value prediction. To this end, we determine
which features of the future trajectory provide useful information to predict
the associated return. This provides tractable prediction targets that are
directly relevant for a task, and can thus accelerate learning the value
function. The idea can be understood as reasoning, in hindsight, about which
aspects of the future observations could help past value prediction. We show
how this can help dramatically even in simple policy evaluation settings. We
then test our approach at scale in challenging domains, including on 57 Atari
2600 games.
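To make the abstract's idea concrete, below is a minimal sketch of how a hindsight-modelling objective for policy evaluation could be wired up: low-dimensional features of the observed future trajectory are learned by requiring them to improve a hindsight return estimate, a model is trained to predict those features from the current state alone, and the prediction feeds the ordinary value function. All names (`HindsightValueModel`, `phi_hat`), network sizes, and the equal loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of value-driven hindsight modelling for policy evaluation.
# Module names, sizes, and the equal loss weighting are assumptions made for
# illustration; they are not taken from the paper's implementation.
import torch
import torch.nn as nn

class HindsightValueModel(nn.Module):
    def __init__(self, obs_dim, future_dim, phi_dim=8, hidden=64):
        super().__init__()
        # phi: low-dimensional features computed from the *future* trajectory
        # (only available in hindsight, during training).
        self.phi = nn.Sequential(
            nn.Linear(future_dim, hidden), nn.ReLU(), nn.Linear(hidden, phi_dim))
        # v_plus: hindsight value estimate that sees the state and phi;
        # training it forces phi to carry return-relevant information.
        self.v_plus = nn.Sequential(
            nn.Linear(obs_dim + phi_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # phi_hat: model that predicts phi from the current state alone.
        self.phi_hat = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, phi_dim))
        # v: the ordinary (causal) value function, fed the predicted features.
        self.v = nn.Sequential(
            nn.Linear(obs_dim + phi_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def losses(self, obs, future, returns):
        # Hindsight branch: phi must help predict the observed return.
        phi = self.phi(future)
        v_plus = self.v_plus(torch.cat([obs, phi], dim=-1)).squeeze(-1)
        hindsight_loss = (v_plus - returns).pow(2).mean()
        # Model branch: predict phi without access to the future.
        model_loss = (self.phi_hat(obs) - phi.detach()).pow(2).mean()
        # Value branch: the deployable estimate uses only the prediction.
        v_in = torch.cat([obs, self.phi_hat(obs).detach()], dim=-1)
        value_loss = (self.v(v_in).squeeze(-1) - returns).pow(2).mean()
        return hindsight_loss + model_loss + value_loss
```

In this sketch, detaching phi in the model loss and phi_hat in the value loss keeps the hindsight features driven purely by return prediction while keeping the deployable value estimate causal, which mirrors the trade-off the abstract describes: prediction targets that are tractable yet directly relevant to the task.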
Related papers
- An Information Theoretic Approach to Machine Unlearning [45.600917449314444]
A key challenge in unlearning is forgetting the necessary data in a timely manner while preserving model performance.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined future state with the real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values, which are then aligned.
We demonstrate that our method achieves new state-of-the-art performance among search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory
Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance.
However, it remains unclear which features such black-box models actually learn to use for making predictions.
This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z) - Towards Open-World Feature Extrapolation: An Inductive Graph Learning
Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z) - Foresee then Evaluate: Decomposing Value Estimation with Latent Future
Prediction [37.06232589005015]
The value function is the central notion of reinforcement learning (RL).
We propose Value Decomposition with Future Prediction (VDFP).
We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, inducing a way to model latent dynamics and returns separately in value estimation.
arXiv Detail & Related papers (2021-03-03T07:28:56Z) - A framework for predicting, interpreting, and improving Learning
Outcomes [0.0]
We develop an Embibe Score Quotient model (ESQ) to predict test scores based on observed academic, behavioral and test-taking features of a student.
ESQ can be used to predict the future scoring potential of a student as well as offer personalized learning nudges.
arXiv Detail & Related papers (2020-10-06T11:22:27Z) - Accurate and Robust Feature Importance Estimation under Distribution
Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z) - The Value-Improvement Path: Towards Better Representations for
Reinforcement Learning [46.70945548475075]
We argue that the value prediction problems faced by an RL agent should not be addressed in isolation, but as a single, holistic, prediction problem.
An RL algorithm generates a sequence of policies that, at least approximately, improve towards the optimal policy.
We demonstrate that a representation that spans the past value-improvement path will also provide an accurate value approximation for future policy improvements.
arXiv Detail & Related papers (2020-06-03T12:51:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.