Bootstrapped Representations in Reinforcement Learning
- URL: http://arxiv.org/abs/2306.10171v1
- Date: Fri, 16 Jun 2023 20:14:07 GMT
- Title: Bootstrapped Representations in Reinforcement Learning
- Authors: Charline Le Lan, Stephen Tu, Mark Rowland, Anna Harutyunyan, Rishabh
Agarwal, Marc G. Bellemare, Will Dabney
- Abstract summary: In reinforcement learning (RL), state representations are key to dealing with large or continuous state spaces.
We provide a theoretical characterization of the state representation learnt by temporal difference learning.
We describe the efficacy of these representations for policy evaluation, and use our theoretical analysis to design new auxiliary learning rules.
- Score: 44.49675960752777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In reinforcement learning (RL), state representations are key to dealing with
large or continuous state spaces. While one of the promises of deep learning
algorithms is to automatically construct features well-tuned for the task they
try to solve, such a representation might not emerge from end-to-end training
of deep RL agents. To mitigate this issue, auxiliary objectives are often
incorporated into the learning process and help shape the learnt state
representation. Bootstrapping methods are today's method of choice to make
these additional predictions. Yet, it is unclear which features these
algorithms capture and how they relate to those from other auxiliary-task-based
approaches. In this paper, we address this gap and provide a theoretical
characterization of the state representation learnt by temporal difference
learning (Sutton, 1988). Surprisingly, we find that this representation differs
from the features learned by Monte Carlo and residual gradient algorithms for
most transition structures of the environment in the policy evaluation setting.
We describe the efficacy of these representations for policy evaluation, and
use our theoretical analysis to design new auxiliary learning rules. We
complement our theoretical results with an empirical comparison of these
learning rules for different cumulant functions on classic domains such as the
four-room domain (Sutton et al., 1999) and Mountain Car (Moore, 1990).
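As a rough illustration of the algorithms the abstract contrasts, the sketch below runs the expected policy-evaluation updates of TD(0), Monte Carlo regression, and residual gradient with a fixed linear feature map on a small synthetic Markov chain. The transition matrix, rewards, features, step size, and state weighting are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the paper's code): expected policy-evaluation updates for
# TD(0), Monte Carlo, and residual gradient with fixed linear features.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, gamma, lr, n_iters = 20, 5, 0.95, 0.1, 10_000

P = rng.dirichlet(np.ones(n_states), size=n_states)        # policy-induced transition matrix
r = rng.normal(size=n_states)                               # expected reward per state
phi = rng.normal(size=(n_states, n_features))               # fixed linear state features
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)   # exact values, for reference

# On-policy state weighting: the stationary distribution of P (keeps TD(0) stable).
evals, evecs = np.linalg.eig(P.T)
d = np.real(evecs[:, np.argmax(np.real(evals))])
d = np.abs(d) / np.abs(d).sum()
D = np.diag(d)

def expected_update(w, rule):
    """One expected (batch) weight update under each learning rule."""
    v = phi @ w
    delta = r + gamma * P @ v - v                    # expected one-step Bellman error
    if rule == "td":                                 # semi-gradient TD(0): gradient only through v(s)
        return w + lr * phi.T @ D @ delta
    if rule == "mc":                                 # Monte Carlo: regress features onto the true return
        return w + lr * phi.T @ D @ (v_true - v)
    if rule == "rg":                                 # residual gradient: full gradient of the Bellman error
        return w + lr * (phi - gamma * P @ phi).T @ D @ delta
    raise ValueError(rule)

for rule in ("td", "mc", "rg"):
    w = np.zeros(n_features)
    for _ in range(n_iters):
        w = expected_update(w, rule)
    print(f"{rule}: ||phi w - v_true|| = {np.linalg.norm(phi @ w - v_true):.3f}")
```

In the paper's setting the features themselves are shaped by these update rules through auxiliary predictions; the sketch only shows how the three targets differ, which is where the characterized representations diverge.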
Related papers
- Bridging State and History Representations: Understanding Self-Predictive RL [24.772140132462468]
Representations are at the core of all deep reinforcement learning (RL) methods for Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs).
We show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction.
We provide theoretical insights into widely adopted objectives and optimization techniques, such as the stop-gradient trick, for learning self-predictive representations (a minimal sketch of such a stop-gradient loss appears after this list).
arXiv Detail & Related papers (2024-01-17T00:47:43Z)
- Large Language Models can Implement Policy Iteration [18.424558160071808]
In-Context Policy Iteration is an algorithm for performing Reinforcement Learning (RL), in-context, using foundation models.
ICPI learns to perform RL tasks without expert demonstrations or gradients.
ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment.
arXiv Detail & Related papers (2022-10-07T21:18:22Z)
- Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- State Representation Learning for Goal-Conditioned Reinforcement Learning [9.162936410696407]
This paper presents a novel state representation for reward-free Markov decision processes.
The idea is to learn, in a self-supervised manner, an embedding space in which the distance between pairs of embedded states corresponds to the minimum number of actions needed to transition between them.
We show how this representation can be leveraged to learn goal-conditioned policies.
arXiv Detail & Related papers (2022-05-04T09:20:09Z)
- Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey [25.5633960013493]
Representation learning enables us to automatically extract generic feature representations from a dataset and reuse them to solve other machine learning tasks.
Recently, feature representations extracted by a representation learning algorithm, combined with a simple predictor, have exhibited state-of-the-art performance on several machine learning tasks.
arXiv Detail & Related papers (2022-04-18T09:18:47Z)
- How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
The resulting separation result underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z)
- Metrics and continuity in reinforcement learning [34.10996560464196]
We introduce a unified formalism for defining topologies through the lens of metrics.
We establish a hierarchy among these metrics and demonstrate their theoretical implications for Markov decision processes.
We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
arXiv Detail & Related papers (2021-02-02T14:30:41Z)
- Reinforcement Learning as Iterative and Amortised Inference [62.997667081978825]
We use the control as inference framework to outline a novel classification scheme based on amortised and iterative inference.
We show that taking this perspective allows us to identify parts of the algorithmic design space which have been relatively unexplored.
arXiv Detail & Related papers (2020-06-13T16:10:03Z)
- Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
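The stop-gradient objective mentioned in the first related-paper entry above ("Bridging State and History Representations") can be stated concretely. The sketch below is a minimal illustration of a self-predictive latent loss with a stop-gradient on the target encoder output; the network sizes, optimizer, and synthetic batch of transitions are illustrative assumptions, not taken from that paper.

```python
# Minimal sketch of a self-predictive representation loss with a stop-gradient
# target (illustrative assumption; not the related paper's actual code).
import torch
import torch.nn as nn

obs_dim, act_dim, latent_dim = 8, 4, 16

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
# Latent transition model: predicts the next latent state from (latent, action).
dynamics = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(dynamics.parameters()), lr=1e-3)

def self_predictive_loss(obs, act, next_obs):
    z = encoder(obs)
    z_next_pred = dynamics(torch.cat([z, act], dim=-1))
    # Stop-gradient: the target latent is treated as a constant during backprop.
    z_next_target = encoder(next_obs).detach()
    return ((z_next_pred - z_next_target) ** 2).mean()

# One update on a synthetic batch of transitions.
obs, act, next_obs = torch.randn(32, obs_dim), torch.randn(32, act_dim), torch.randn(32, obs_dim)
loss = self_predictive_loss(obs, act, next_obs)
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```

Detaching the target is the stop-gradient technique whose optimization behavior that paper analyzes; it changes the learning dynamics so the encoder is less prone to collapsing to a trivial constant representation.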