Efficient Model-Free Exploration in Low-Rank MDPs
- URL: http://arxiv.org/abs/2307.03997v2
- Date: Thu, 29 Feb 2024 15:40:41 GMT
- Title: Efficient Model-Free Exploration in Low-Rank MDPs
- Authors: Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin
- Abstract summary: Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
- Score: 76.87340323826945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge in reinforcement learning is to develop practical,
sample-efficient algorithms for exploration in high-dimensional domains where
generalization and function approximation are required. Low-Rank Markov Decision
Processes -- where transition probabilities admit a low-rank factorization
based on an unknown feature embedding -- offer a simple, yet expressive
framework for RL with function approximation, but existing algorithms are
either (1) computationally intractable, or (2) reliant upon restrictive
statistical assumptions such as latent variable structure, access to
model-based function approximation, or reachability. In this work, we propose
the first provably sample-efficient algorithm for exploration in Low-Rank MDPs
that is both computationally efficient and model-free, allowing for general
function approximation and requiring no additional structural assumptions. Our
algorithm, VoX, uses the notion of a barycentric spanner for the feature
embedding as an efficiently computable basis for exploration, performing
efficient barycentric spanner computation by interleaving representation
learning and policy optimization. Our analysis -- which is appealingly simple
and modular -- carefully combines several techniques, including a new approach
to error-tolerant barycentric spanner computation and an improved analysis of a
certain minimax representation learning objective found in prior work.
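The barycentric-spanner idea underlying VoX can be illustrated with a determinant-swapping procedure in the style of Awerbuch and Kleinberg: select d feature vectors whose determinant cannot be improved by any single swap, so that every feature vector is a combination of the selected ones with bounded coefficients. The sketch below assumes exact, known features; it is not the paper's error-tolerant computation interleaved with representation learning:

```python
import numpy as np

def barycentric_spanner(X, C=1.0, max_iters=100):
    """Approximate C-barycentric spanner of the rows of X (shape n x d).

    Returns indices of d rows such that every row of X is a combination
    of the selected rows with coefficients in [-C, C]. Determinant-swapping
    scheme (Awerbuch & Kleinberg); assumes the rows of X span R^d.
    """
    n, d = X.shape
    idx = list(range(d))          # start from any d rows
    B = X[idx].copy()
    for _ in range(max_iters):
        swapped = False
        for i in range(d):
            base = abs(np.linalg.det(B))
            for j in range(n):
                Bi = B.copy()
                Bi[i] = X[j]
                # swap in row j if it grows the determinant by factor > C
                if abs(np.linalg.det(Bi)) > C * base + 1e-12:
                    B, idx[i] = Bi, j
                    base = abs(np.linalg.det(B))
                    swapped = True
        if not swapped:           # no improving swap: spanner found
            break
    return idx
```

Each row of X can then be expressed in the returned basis by solving a linear system, with all coefficients bounded by C.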
Related papers
- Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms realizing function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) formulates the objective as the logistic loss of discriminating real data from artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
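The logistic-loss formulation of NCE mentioned above can be sketched as binary classification between data samples and noise samples; the function names and interface below are illustrative, not taken from the paper:

```python
import numpy as np

def nce_loss(log_model, x_data, x_noise, log_noise_density):
    """Binary logistic NCE objective: classify real samples against noise.

    `log_model(x)` is the *unnormalized* log-density of the model and
    `log_noise_density(x)` the log-density of the noise distribution.
    """
    # logit = log p_model(x) - log p_noise(x); data labeled 1, noise 0
    logit_data = log_model(x_data) - log_noise_density(x_data)
    logit_noise = log_model(x_noise) - log_noise_density(x_noise)
    # -log sigma(logit) for data, -log(1 - sigma(logit)) for noise,
    # written stably via logaddexp
    return (np.mean(np.logaddexp(0.0, -logit_data))
            + np.mean(np.logaddexp(0.0, logit_noise)))
```

When model and noise densities coincide, the logits vanish and the loss equals 2 log 2, its value at chance.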
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
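Multi-step inverse kinematics trains a classifier to predict the first action from a pair of observations several steps apart. A minimal sketch of such a classification loss, assuming a linear head on concatenated encoder features (this interface is hypothetical, not MusIK's actual architecture):

```python
import numpy as np

def inverse_kinematics_loss(W, phi_h, phi_hk, actions):
    """Multi-class logistic loss for predicting the first action a_h
    from encoded observations (phi(x_h), phi(x_{h+k})).

    W: (n_actions, 2d) linear head on the concatenated features.
    """
    Z = np.concatenate([phi_h, phi_hk], axis=1)      # (batch, 2d)
    logits = Z @ W.T                                 # (batch, n_actions)
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of the actions actually taken
    return -np.mean(log_probs[np.arange(len(actions)), actions])
```

With an uninformative (zero) head the loss equals log of the number of actions, the entropy of a uniform guess.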
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
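The basic $Q$-learning protocol with linear function approximation that this paper builds on can be sketched as a single semi-gradient update; the parameter names below are illustrative, and this is the vanilla update, not the paper's stabilized exploration variant:

```python
import numpy as np

def q_update(w, phi_s_a, r, phi_next, w_target=None, gamma=0.99, lr=0.1):
    """One semi-gradient Q-learning step with linear features.

    phi_s_a:  feature vector phi(s, a) of the visited state-action pair.
    phi_next: features phi(s', a') for every action a' (shape A x d).
    """
    w_t = w if w_target is None else w_target
    # bootstrapped target: r + gamma * max_a' Q(s', a')
    target = r + gamma * np.max(phi_next @ w_t)
    td_error = target - phi_s_a @ w
    return w + lr * td_error * phi_s_a
```

The optional `w_target` slot reflects the common stabilization trick of computing the bootstrap target from a frozen copy of the weights.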
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds $80\%$.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- Escaping Poor Local Minima in Large Scale Robust Estimation [41.304283715031204]
We introduce two novel approaches for robust parameter estimation.
The first algorithm uses an adaptive kernel scaling strategy that enjoys a strong ability to escape poor minima.
The second algorithm combines a generalized Majorization Minimization framework with the half-quadratic lifting formulation to obtain a simple yet efficient solver.
arXiv Detail & Related papers (2021-02-22T11:58:29Z)
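The Majorization-Minimization idea behind the second algorithm above can be illustrated with iteratively reweighted least squares, where each iteration solves a weighted least-squares majorizer of a robust loss. The generic Huber-weight sketch below is illustrative, not the paper's half-quadratic solver:

```python
import numpy as np

def irls(A, b, n_iters=50, c=1.345):
    """Iteratively reweighted least squares with Huber weights.

    Minimizes sum_i rho(r_i) for the Huber loss rho by repeatedly
    solving the weighted least-squares problem that majorizes it
    at the current iterate.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # ordinary LS warm start
    for _ in range(n_iters):
        r = A @ x - b
        # Huber weights: 1 inside the threshold, c/|r| outside
        w = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x
```

On a line fit with a single gross outlier, the outlier's weight shrinks like c/|r| and the estimate stays close to the clean solution, whereas ordinary least squares is pulled far off.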
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.