Making Linear MDPs Practical via Contrastive Representation Learning
- URL: http://arxiv.org/abs/2207.07150v1
- Date: Thu, 14 Jul 2022 18:18:02 GMT
- Title: Making Linear MDPs Practical via Contrastive Representation Learning
- Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale
Schuurmans, Bo Dai
- Abstract summary: It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
- Score: 101.75885788118131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is common to address the curse of dimensionality in Markov decision
processes (MDPs) by exploiting low-rank representations. This motivates much of
the recent theoretical study on linear MDPs. However, most approaches require a
given representation under unrealistic assumptions about the normalization of
the decomposition or introduce unresolved computational challenges in practice.
Instead, we consider an alternative definition of linear MDPs that
automatically ensures normalization while allowing efficient representation
learning via contrastive estimation. The framework also admits
confidence-adjusted index algorithms, enabling an efficient and principled
approach to incorporating optimism or pessimism in the face of uncertainty. To
the best of our knowledge, this provides the first practical representation
learning method for linear MDPs that achieves both strong theoretical
guarantees and empirical performance. Theoretically, we prove that the proposed
algorithm is sample efficient in both the online and offline settings.
Empirically, we demonstrate superior performance over existing state-of-the-art
model-based and model-free algorithms on several benchmarks.
Related papers
- Efficient Fairness-Performance Pareto Front Computation [51.558848491038916]
We show that optimal fair representations possess several useful structural properties.
We then show that these approxing problems can be solved efficiently via concave programming methods.
arXiv Detail & Related papers (2024-09-26T08:46:48Z) - Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement
Learning in Discounted Linear MDPs [16.006893624836554]
We propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE)
VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step.
In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct.
arXiv Detail & Related papers (2023-10-17T18:27:27Z) - Stochastic convex optimization for provably efficient apprenticeship
learning [1.0609815608017066]
We consider large-scale Markov decision processes (MDPs) with an unknown cost function.
We employ convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations.
arXiv Detail & Related papers (2021-12-31T19:47:57Z) - Efficient Performance Bounds for Primal-Dual Reinforcement Learning from
Demonstrations [1.0609815608017066]
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.
Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive.
We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
arXiv Detail & Related papers (2021-12-28T05:47:24Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - Sample Efficient Reinforcement Learning In Continuous State Spaces: A
Perspective Beyond Linearity [50.38337893712897]
We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
arXiv Detail & Related papers (2021-06-15T00:06:59Z) - Efficient Iterative Amortized Inference for Learning Symmetric and
Disentangled Multi-Object Representations [8.163697683448811]
We introduce EfficientMORL, an efficient framework for the unsupervised learning of object-centric representations.
We show that optimization challenges caused by requiring both symmetry and disentanglement can be addressed by high-cost iterative amortized inference.
We demonstrate strong object decomposition and disentanglement on the standard multi-object benchmark while achieving nearly an order of magnitude faster training and test time inference.
arXiv Detail & Related papers (2021-06-07T14:02:49Z) - How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
This separation result underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well or better as compared to prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.