Making Linear MDPs Practical via Contrastive Representation Learning
- URL: http://arxiv.org/abs/2207.07150v1
- Date: Thu, 14 Jul 2022 18:18:02 GMT
- Title: Making Linear MDPs Practical via Contrastive Representation Learning
- Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale
Schuurmans, Bo Dai
- Abstract summary: It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations.
We consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning.
We demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.
- Score: 101.75885788118131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is common to address the curse of dimensionality in Markov decision
processes (MDPs) by exploiting low-rank representations. This motivates much of
the recent theoretical study on linear MDPs. However, most approaches require a
given representation under unrealistic assumptions about the normalization of
the decomposition or introduce unresolved computational challenges in practice.
Instead, we consider an alternative definition of linear MDPs that
automatically ensures normalization while allowing efficient representation
learning via contrastive estimation. The framework also admits
confidence-adjusted index algorithms, enabling an efficient and principled
approach to incorporating optimism or pessimism in the face of uncertainty. To
the best of our knowledge, this provides the first practical representation
learning method for linear MDPs that achieves both strong theoretical
guarantees and empirical performance. Theoretically, we prove that the proposed
algorithm is sample efficient in both the online and offline settings.
Empirically, we demonstrate superior performance over existing state-of-the-art
model-based and model-free algorithms on several benchmarks.
Related papers
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models [56.92178753201331]
We tackle average-reward infinite-horizon POMDPs with an unknown transition model.
We present a novel and simple estimator that overcomes this barrier.
arXiv Detail & Related papers (2025-01-30T22:29:41Z) - Online MDP with Transition Prototypes: A Robust Adaptive Approach [8.556972018137147]
We consider an online robust Markov Decision Process (MDP) where we have the information of finitely many prototypes of the underlying transition kernel.
We propose an algorithm that efficiently identifies the true underlying transition kernel while guaranteeing the performance of the corresponding robust policy.
arXiv Detail & Related papers (2024-12-18T17:19:55Z) - Efficient Fairness-Performance Pareto Front Computation [51.558848491038916]
We show that optimal fair representations possess several useful structural properties.
We then show that these approxing problems can be solved efficiently via concave programming methods.
arXiv Detail & Related papers (2024-09-26T08:46:48Z) - Stochastic convex optimization for provably efficient apprenticeship
learning [1.0609815608017066]
We consider large-scale Markov decision processes (MDPs) with an unknown cost function.
We employ convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations.
arXiv Detail & Related papers (2021-12-31T19:47:57Z) - Efficient Performance Bounds for Primal-Dual Reinforcement Learning from
Demonstrations [1.0609815608017066]
We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations.
Existing inverse reinforcement learning methods come with strong theoretical guarantees, but are computationally expensive.
We introduce a novel bilinear saddle-point framework using Lagrangian duality to bridge the gap between theory and practice.
arXiv Detail & Related papers (2021-12-28T05:47:24Z) - How Fine-Tuning Allows for Effective Meta-Learning [50.17896588738377]
We present a theoretical framework for analyzing representations derived from a MAML-like algorithm.
We provide risk bounds on the best predictor found by fine-tuning via gradient descent, demonstrating that the algorithm can provably leverage the shared structure.
This separation result underscores the benefit of fine-tuning-based methods, such as MAML, over methods with "frozen representation" objectives in few-shot learning.
arXiv Detail & Related papers (2021-05-05T17:56:00Z) - COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well or better as compared to prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.