Efficient Model-Free Exploration in Low-Rank MDPs
- URL: http://arxiv.org/abs/2307.03997v2
- Date: Thu, 29 Feb 2024 15:40:41 GMT
- Title: Efficient Model-Free Exploration in Low-Rank MDPs
- Authors: Zakaria Mhammedi, Adam Block, Dylan J. Foster, Alexander Rakhlin
- Abstract summary: Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
- Score: 76.87340323826945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge in reinforcement learning is to develop practical,
sample-efficient algorithms for exploration in high-dimensional domains where
generalization and function approximation are required. Low-Rank Markov Decision
Processes -- where transition probabilities admit a low-rank factorization
based on an unknown feature embedding -- offer a simple, yet expressive
framework for RL with function approximation, but existing algorithms are
either (1) computationally intractable, or (2) reliant upon restrictive
statistical assumptions such as latent variable structure, access to
model-based function approximation, or reachability. In this work, we propose
the first provably sample-efficient algorithm for exploration in Low-Rank MDPs
that is both computationally efficient and model-free, allowing for general
function approximation and requiring no additional structural assumptions. Our
algorithm, VoX, uses the notion of a barycentric spanner for the feature
embedding as an efficiently computable basis for exploration, performing
efficient barycentric spanner computation by interleaving representation
learning and policy optimization. Our analysis -- which is appealingly simple
and modular -- carefully combines several techniques, including a new approach
to error-tolerant barycentric spanner computation and an improved analysis of a
certain minimax representation learning objective found in prior work.
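The barycentric-spanner idea underlying VoX can be illustrated with a determinant-swapping procedure in the style of Awerbuch and Kleinberg: select d feature vectors whose determinant cannot be improved by any single swap, so that every feature vector is a combination of the selected ones with bounded coefficients. The sketch below assumes exact, known features; it is not the paper's error-tolerant computation interleaved with representation learning:

```python
import numpy as np

def barycentric_spanner(X, C=1.0, max_iters=100):
    """Approximate C-barycentric spanner of the rows of X (shape n x d).

    Returns indices of d rows such that every row of X is a combination
    of the selected rows with coefficients in [-C, C]. Determinant-swapping
    scheme (Awerbuch & Kleinberg); assumes the rows of X span R^d.
    """
    n, d = X.shape
    idx = list(range(d))          # start from any d rows
    B = X[idx].copy()
    for _ in range(max_iters):
        swapped = False
        for i in range(d):
            base = abs(np.linalg.det(B))
            for j in range(n):
                Bi = B.copy()
                Bi[i] = X[j]
                # swap in row j if it grows the determinant by factor > C
                if abs(np.linalg.det(Bi)) > C * base + 1e-12:
                    B, idx[i] = Bi, j
                    base = abs(np.linalg.det(B))
                    swapped = True
        if not swapped:           # no improving swap: spanner found
            break
    return idx
```

Each row of X can then be expressed in the returned basis by solving a linear system, with all coefficients bounded by C.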
Related papers
- Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning [6.969949986864736]
Distributionally robust offline reinforcement learning (RL) seeks robust policy training against environment perturbation by modeling dynamics uncertainty.
We propose minimax optimal and computationally efficient algorithms realizing function approximation.
Our results uncover that function approximation in robust offline RL is essentially distinct from and probably harder than that in standard offline RL.
arXiv Detail & Related papers (2024-03-14T17:55:10Z)
- Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs).
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU).
We then show how to adapt this algorithm to also work in the broader class of $\gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
- Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation (NCE) formulates the objective as the logistic loss of discriminating real data from artificial noise.
In this paper, we study a direct approach for optimizing the negative log-likelihood of unnormalized models.
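The logistic-loss formulation of NCE mentioned above can be sketched as binary classification between data samples and noise samples; the function names and interface below are illustrative, not taken from the paper:

```python
import numpy as np

def nce_loss(log_model, x_data, x_noise, log_noise_density):
    """Binary logistic NCE objective: classify real samples against noise.

    `log_model(x)` is the *unnormalized* log-density of the model and
    `log_noise_density(x)` the log-density of the noise distribution.
    """
    # logit = log p_model(x) - log p_noise(x); data labeled 1, noise 0
    logit_data = log_model(x_data) - log_noise_density(x_data)
    logit_noise = log_model(x_noise) - log_noise_density(x_noise)
    # -log sigma(logit) for data, -log(1 - sigma(logit)) for noise,
    # written stably via logaddexp
    return (np.mean(np.logaddexp(0.0, -logit_data))
            + np.mean(np.logaddexp(0.0, logit_noise)))
```

When model and noise densities coincide, the logits vanish and the loss equals 2 log 2, its value at chance.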
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
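Multi-step inverse kinematics trains a classifier to predict the first action from a pair of observations several steps apart. A minimal sketch of such a classification loss, assuming a linear head on concatenated encoder features (this interface is hypothetical, not MusIK's actual architecture):

```python
import numpy as np

def inverse_kinematics_loss(W, phi_h, phi_hk, actions):
    """Multi-class logistic loss for predicting the first action a_h
    from encoded observations (phi(x_h), phi(x_{h+k})).

    W: (n_actions, 2d) linear head on the concatenated features.
    """
    Z = np.concatenate([phi_h, phi_hk], axis=1)      # (batch, 2d)
    logits = Z @ W.T                                 # (batch, n_actions)
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of the actions actually taken
    return -np.mean(log_probs[np.arange(len(actions)), actions])
```

With an uninformative (zero) head the loss equals log of the number of actions, the entropy of a uniform guess.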
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
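The basic $Q$-learning protocol with linear function approximation that this paper builds on can be sketched as a single semi-gradient update; the parameter names below are illustrative, and this is the vanilla update, not the paper's stabilized exploration variant:

```python
import numpy as np

def q_update(w, phi_s_a, r, phi_next, w_target=None, gamma=0.99, lr=0.1):
    """One semi-gradient Q-learning step with linear features.

    phi_s_a:  feature vector phi(s, a) of the visited state-action pair.
    phi_next: features phi(s', a') for every action a' (shape A x d).
    """
    w_t = w if w_target is None else w_target
    # bootstrapped target: r + gamma * max_a' Q(s', a')
    target = r + gamma * np.max(phi_next @ w_t)
    td_error = target - phi_s_a @ w
    return w + lr * td_error * phi_s_a
```

The optional `w_target` slot reflects the common stabilization trick of computing the bootstrap target from a frozen copy of the weights.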
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent [8.714458129632158]
Kolmogorov model (KM) is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables.
We propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method.
It is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds $80\%$.
arXiv Detail & Related papers (2021-07-11T10:33:02Z)
- Escaping Poor Local Minima in Large Scale Robust Estimation [41.304283715031204]
We introduce two novel approaches for robust parameter estimation.
The first algorithm uses an adaptive kernel scaling strategy that enjoys a strong ability to escape poor minima.
The second algorithm combines a generalized Majorization Minimization framework with the half-quadratic lifting formulation to obtain a simple yet efficient solver.
arXiv Detail & Related papers (2021-02-22T11:58:29Z)
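The Majorization-Minimization idea behind the second algorithm above can be illustrated with iteratively reweighted least squares, where each iteration solves a weighted least-squares majorizer of a robust loss. The generic Huber-weight sketch below is illustrative, not the paper's half-quadratic solver:

```python
import numpy as np

def irls(A, b, n_iters=50, c=1.345):
    """Iteratively reweighted least squares with Huber weights.

    Minimizes sum_i rho(r_i) for the Huber loss rho by repeatedly
    solving the weighted least-squares problem that majorizes it
    at the current iterate.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]   # ordinary LS warm start
    for _ in range(n_iters):
        r = A @ x - b
        # Huber weights: 1 inside the threshold, c/|r| outside
        w = np.where(np.abs(r) <= c, 1.0, c / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        x = np.linalg.lstsq(sw[:, None] * A, sw * b, rcond=None)[0]
    return x
```

On a line fit with a single gross outlier, the outlier's weight shrinks like c/|r| and the estimate stays close to the clean solution, whereas ordinary least squares is pulled far off.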
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.