A Free Lunch from the Noise: Provable and Practical Exploration for
Representation Learning
- URL: http://arxiv.org/abs/2111.11485v1
- Date: Mon, 22 Nov 2021 19:24:57 GMT
- Title: A Free Lunch from the Noise: Provable and Practical Exploration for
Representation Learning
- Authors: Tongzheng Ren, Tianjun Zhang, Csaba Szepesvári, Bo Dai
- Abstract summary: We show that, under a suitable noise assumption, the linear spectral feature of the corresponding Markov transition operator can be obtained in closed form for free.
We propose Spectral Dynamics Embedding (SPEDE), which breaks the trade-off and enables optimistic exploration for representation learning by exploiting the structure of the noise.
- Score: 55.048010996144036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Representation learning lies at the heart of the empirical success of deep
learning in dealing with the curse of dimensionality. However, the power of
representation learning has not yet been fully exploited in reinforcement
learning (RL), due to i) the trade-off between expressiveness and
tractability, and ii) the coupling between exploration and representation
learning. In this paper, we first show that, under a suitable noise assumption
on the stochastic control model, the linear spectral feature of its
corresponding Markov transition operator can be obtained in closed form for
free. Based on this observation, we propose Spectral Dynamics Embedding
(SPEDE), which breaks the trade-off and enables optimistic exploration for
representation learning by exploiting the structure of the noise. We provide a
rigorous theoretical analysis of SPEDE and demonstrate its superior practical
performance over existing state-of-the-art empirical algorithms on several
benchmarks.
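To make the closed-form claim concrete, here is a minimal sketch in our own notation (an illustration under a Gaussian-noise assumption; the symbols f, sigma, omega, b, phi, mu below are ours, not taken verbatim from the paper). Suppose the dynamics are s' = f(s, a) + eps with eps ~ N(0, sigma^2 I). Then the transition density is a Gaussian kernel between s' and f(s, a), and the standard random Fourier feature expansion of that kernel factorizes the transition operator linearly:

\[
P(s' \mid s, a) = \mathcal{N}\bigl(s';\, f(s,a),\, \sigma^2 I\bigr)
\;\propto\; \exp\!\left(-\frac{\|s' - f(s,a)\|^2}{2\sigma^2}\right),
\]
\[
\exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)
= \mathbb{E}_{\omega \sim \mathcal{N}(0,\, \sigma^{-2} I),\; b \sim \mathrm{Unif}[0, 2\pi]}
\bigl[\, 2\cos(\omega^{\top} x + b)\, \cos(\omega^{\top} y + b) \,\bigr].
\]

Taking x = f(s, a) and y = s' gives P(s' | s, a) = <phi(s, a), mu(s')>, where phi(s, a) stacks cos(omega^T f(s, a) + b) over sampled (omega, b) pairs and mu(s') collects the matching features of s' together with the Gaussian normalizing constant. Under this assumption the spectral feature comes in closed form from the noise distribution, which is the sense in which the representation is obtained "for free".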
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Adversarial Imitation Learning from Visual Observations using Latent Information [9.240917262195046]
We focus on the problem of imitation learning from visual observations, where the learning agent has access to videos of experts as its sole learning source.
We introduce an algorithm called Latent Adversarial from Observations, which combines off-policy adversarial imitation techniques with a learned latent representation of the agent's state from sequences of observations.
In experiments on high-dimensional continuous robotic tasks, we show that our model-free approach in latent space matches state-of-the-art performance.
arXiv Detail & Related papers (2023-09-29T16:20:36Z)
- Spectral Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning [6.5151694672131875]
Self-supervised methods have received tremendous attention thanks to their seeming ability to learn representations that respect the semantics of the data without any apparent supervision in the form of labels.
A growing body of literature is already being published in an attempt to build a coherent and theoretically grounded understanding of the workings of a zoo of losses used in modern self-supervised representation learning methods.
arXiv Detail & Related papers (2023-05-31T13:02:06Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Understanding Self-Predictive Learning for Reinforcement Learning [61.62067048348786]
We study the learning dynamics of self-predictive learning for reinforcement learning.
We propose a novel self-predictive algorithm that learns two representations simultaneously.
arXiv Detail & Related papers (2022-12-06T20:43:37Z)
- Task-Free Continual Learning via Online Discrepancy Distance Learning [11.540150938141034]
This paper develops a new theoretical analysis framework which provides generalization bounds based on the discrepancy distance between the visited samples and all of the information made available for training the model.
Inspired by this theoretical model, we propose a new approach enabled by a dynamic component expansion mechanism for a mixture model, namely Online Discrepancy Distance Learning (ODDL).
arXiv Detail & Related papers (2022-10-12T20:44:09Z)
- Spectral Decomposition Representation for Reinforcement Learning [100.0424588013549]
We propose an alternative spectral method, Spectral Decomposition Representation (SPEDER), that extracts a state-action abstraction from the dynamics without inducing spurious dependence on the data collection policy.
A theoretical analysis establishes the sample efficiency of the proposed algorithm in both the online and offline settings.
An experimental investigation demonstrates superior performance over current state-of-the-art algorithms across several benchmarks.
arXiv Detail & Related papers (2022-08-19T19:01:30Z)
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
- Reinforced Imitation Learning by Free Energy Principle [2.9327503320877457]
Reinforcement Learning (RL) requires a large amount of exploration especially in sparse-reward settings.
Imitation Learning (IL) can learn from expert demonstrations without exploration.
We radically unify RL and IL based on the Free Energy Principle (FEP).
arXiv Detail & Related papers (2021-07-25T14:19:29Z)