FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
- URL: http://arxiv.org/abs/2006.10814v2
- Date: Wed, 22 Jul 2020 16:49:17 GMT
- Title: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
- Authors: Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
- Abstract summary: This work focuses on the representation learning question: how can we learn such features?
Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem.
We develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
- Score: 53.710405006523274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space. This work focuses on the representation learning question: how can we learn such features? Under the assumption that the underlying (unknown) dynamics correspond to a low rank transition matrix, we show how the representation learning question is related to a particular non-linear matrix decomposition problem. Structurally, we make precise connections between these low rank MDPs and latent variable models, showing how they significantly generalize prior formulations for representation learning in RL. Algorithmically, we develop FLAMBE, which engages in exploration and representation learning for provably efficient RL in low rank transition models.
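As a concrete illustration of the low rank structure the abstract describes, here is a minimal numpy sketch of a transition kernel T(s'|s,a) = <phi(s,a), mu(s')>. The tabular sizes, random factors, and explicit row normalization are illustrative assumptions, not details from the paper (in the actual model the factorization is constrained so that valid distributions arise by design).

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, d = 50, 4, 5  # toy numbers of states, actions, and feature dimension

# Left factor phi(s, a) and right factor mu(s'), both in R^d.
phi = rng.random((S, A, d))
mu = rng.random((S, d))

# T(s' | s, a) = <phi(s, a), mu(s')>; normalize each row into a distribution.
T = np.einsum("sad,td->sat", phi, mu)
T /= T.sum(axis=-1, keepdims=True)

# Flattened into an (S*A) x S matrix, the transition kernel has rank <= d.
print(np.linalg.matrix_rank(T.reshape(S * A, S)))  # <= d (here 5)
```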
Related papers
- Locating Information in Large Language Models via Random Matrix Theory [0.0]
We analyze the weight matrices of pretrained transformer models BERT and Llama.
Deviations from random-matrix predictions emerge after training, allowing us to locate learned structures within the models.
Our findings reveal that, after fine-tuning, small singular values play a crucial role in the models' capabilities.
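A hedged sketch of the kind of random-matrix diagnostic this summary points to: compare a layer's squared singular values against the Marchenko-Pastur bulk edge. The random matrix below stands in for a real pretrained layer; loading actual BERT or Llama weights is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 768, 3072                      # toy layer shape (e.g. an MLP projection)
W = rng.normal(0.0, 1.0 / np.sqrt(m), size=(n, m))  # stand-in "untrained" layer

eigs = np.linalg.svd(W, compute_uv=False) ** 2      # eigenvalues of W @ W.T
q = n / m
bulk_edge = (1 + np.sqrt(q)) ** 2                   # Marchenko-Pastur upper edge
# ~0 for random weights; trained layers would show outliers above the bulk.
print((eigs > bulk_edge).sum())
```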
arXiv Detail & Related papers (2024-10-23T11:19:08Z)
- Reinforcement Learning in Low-Rank MDPs with Density Features [12.932032416729774]
MDPs with low-rank transitions are highly representative structures that enable tractable learning.
We investigate sample-efficient learning with density features, i.e., the right matrix of the low-rank factorization, which induces powerful models for state-occupancy distributions.
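To make the "right matrix" point concrete, here is a small numpy sketch (names and sizes are illustrative, not the paper's) showing that in a low rank MDP the one-step state occupancy of any policy is a linear combination of the density features mu:

```python
import numpy as np

rng = np.random.default_rng(1)
S, A, d = 50, 4, 5
phi = rng.random((S, A, d))
mu = rng.random((S, d))

# Normalize the left factor so T(s'|s,a) = <phi_n(s,a), mu(s')> sums to 1.
c = np.einsum("sad,d->sa", phi, mu.sum(axis=0))
phi_n = phi / c[..., None]
T = np.einsum("sad,td->sat", phi_n, mu)

pi = rng.dirichlet(np.ones(A), size=S)  # arbitrary policy pi(a | s)
d0 = np.full(S, 1.0 / S)                # initial state distribution

# One-step occupancy: d1(s') = sum_{s,a} d0(s) pi(a|s) T(s'|s,a) = <w, mu(s')>.
w = np.einsum("s,sa,sad->d", d0, pi, phi_n)
d1 = mu @ w
assert np.allclose(d1, np.einsum("s,sa,sat->t", d0, pi, T))
print(d1.sum())  # 1.0: a valid distribution expressed via the d density features
```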
arXiv Detail & Related papers (2023-02-04T22:46:28Z)
- Categorical semantics of compositional reinforcement learning [25.752647944862183]
Reinforcement learning (RL) often requires decomposing a problem into subtasks and composing learned behaviors on these tasks.
We develop a framework for a compositional theory of RL using a categorical point of view.
We show that $\mathsf{MDP}$ admits natural compositional operations, such as certain fiber products and pushouts.
arXiv Detail & Related papers (2022-08-29T15:51:36Z)
- Representation Learning for Online and Offline RL in Low-rank MDPs [36.398511188102205]
We focus on the low-rank Markov Decision Processes (MDPs) where the transition dynamics correspond to a low-rank transition matrix.
For the online setting, operating with the same computational oracles used in FLAMBE, we propose REP-UCB (Upper Confidence Bound driven Representation learning for RL).
For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition.
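For intuition about the UCB side of REP-UCB, the sketch below computes the standard elliptical exploration bonus beta * ||phi(s,a)||_{Sigma^{-1}} on top of learned features. The function name, constants, and random features are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def elliptical_bonus(Phi_data, phi_query, beta=1.0, lam=1.0):
    """Phi_data: (n, d) learned features of visited (s, a) pairs.
    phi_query: (d,) feature of a candidate (s, a) pair."""
    d = Phi_data.shape[1]
    Sigma = lam * np.eye(d) + Phi_data.T @ Phi_data   # regularized covariance
    return beta * np.sqrt(phi_query @ np.linalg.solve(Sigma, phi_query))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(500, 8))         # directions covered by past data
phi_new = rng.normal(size=8)
print(elliptical_bonus(Phi, phi_new))               # small: well-covered direction
print(elliptical_bonus(np.zeros((0, 8)), phi_new))  # large: no data at all
```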
arXiv Detail & Related papers (2021-10-09T22:04:34Z)
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity [50.38337893712897]
We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL by providing an algorithm that provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
arXiv Detail & Related papers (2021-06-15T00:06:59Z)
- Nonparametric Trace Regression in High Dimensions via Sign Series Representation [13.37650464374017]
We develop a framework for nonparametric trace regression models via structured sign series representations of high dimensional functions.
In the context of matrix completion, our framework leads to a substantially richer model based on what we coin as the "sign rank" of a matrix.
arXiv Detail & Related papers (2021-05-04T22:20:00Z)
- Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low rank MDPs.
The key algorithmic contribution is a new minimax representation learning objective.
The result can accommodate general function approximation to scale to complex environments.
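As a loose sketch of what a minimax representation objective can look like, consider selecting, over a finite candidate class, the feature map whose best linear fit minimizes the worst-case error against a class of test functions. All names and the finite classes are illustrative; this is not the paper's actual objective.

```python
import numpy as np

def minimax_select(Phis, F):
    """Phis: list of (n, d) candidate feature matrices on observed data.
    F: (n, k) values of k test functions; returns index of best candidate."""
    scores = []
    for Phi in Phis:
        W, *_ = np.linalg.lstsq(Phi, F, rcond=None)     # best linear fits
        worst = np.max(np.mean((F - Phi @ W) ** 2, axis=0))
        scores.append(worst)                            # adversarial test function
    return int(np.argmin(scores))

rng = np.random.default_rng(0)
n = 200
good = rng.normal(size=(n, 4))
F = good @ rng.normal(size=(4, 6))     # test functions realizable under `good`
bad = rng.normal(size=(n, 4))
print(minimax_select([bad, good], F))  # 1: picks the features best in the worst case
```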
arXiv Detail & Related papers (2021-02-14T00:06:54Z)
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
- Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
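A minimal sketch of a prototype-based contrastive term in the spirit of PCL's ProtoNCE, assuming cluster assignments are already given; the shapes and temperature are illustrative, and the full method also keeps an instance-wise InfoNCE term and per-cluster temperatures.

```python
import numpy as np

def proto_nce(z, prototypes, assign, tau=0.1):
    """z: (n, d) L2-normalized embeddings; prototypes: (k, d) normalized
    cluster centers; assign: (n,) index of each sample's prototype."""
    logits = z @ prototypes.T / tau                  # (n, k) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(z)), assign].mean()  # pull toward own prototype

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16)); z /= np.linalg.norm(z, axis=1, keepdims=True)
C = rng.normal(size=(8, 16)); C /= np.linalg.norm(C, axis=1, keepdims=True)
print(proto_nce(z, C, rng.integers(0, 8, size=32)))
```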
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
- Plannable Approximations to MDP Homomorphisms: Equivariance under Actions [72.30921397899684]
We introduce a contrastive loss function that enforces action equivariance on the learned representations.
We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process.
We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
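A hedged sketch of an action-equivariance contrastive loss of this flavor: a latent transition model should carry phi(s) to phi(s'), while negative samples are pushed at least a margin away. The additive latent transition, shapes, and margin below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def equivariance_loss(phi_s, a_emb, phi_next, phi_neg, margin=1.0):
    """phi_s, phi_next, phi_neg: (n, d) latent states; a_emb: (n, d) action
    effect. Latent transition is additive here: T(z, a) = z + a_emb."""
    pred = phi_s + a_emb
    pos = np.sum((pred - phi_next) ** 2, axis=1)      # pull: d(T(z, a), z')
    neg = np.sum((pred - phi_neg) ** 2, axis=1)
    hinge = np.maximum(0.0, margin - np.sqrt(neg))    # push negatives past margin
    return (pos + hinge).mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 8))
print(equivariance_loss(z, rng.normal(size=(64, 8)) * 0.1,
                        z + 0.1, rng.normal(size=(64, 8))))
```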
arXiv Detail & Related papers (2020-02-27T08:29:10Z)