A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.15634v1
- Date: Fri, 30 Sep 2022 17:59:16 GMT
- Title: A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning
- Authors: Zixiang Chen, Chris Junchi Li, Angela Yuan, Quanquan Gu, Michael I.
Jordan
- Abstract summary: We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
- Score: 132.45959478064736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing need for handling large state and action spaces, general
function approximation has become a key technique in reinforcement learning
(RL). In this paper, we propose a general framework that unifies model-based
and model-free RL, and an Admissible Bellman Characterization (ABC) class that
subsumes nearly all Markov Decision Process (MDP) models in the literature for
tractable RL. We propose a novel estimation function with decomposable
structural properties for optimization-based exploration and the functional
eluder dimension as a complexity measure of the ABC class. Under our framework,
a new sample-efficient algorithm, OPtimization-based ExploRation with
Approximation (OPERA), is proposed, achieving regret bounds that match or
improve over the best-known results for a variety of MDP models. In particular,
for MDPs with low Witness rank, under a slightly stronger assumption, OPERA
improves the state-of-the-art sample complexity results by a factor of $dH$.
Our framework provides a generic interface to design and analyze new RL models
and algorithms.
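The abstract describes OPERA only at a high level. Below is a minimal, hypothetical sketch of a generic optimization-based exploration loop in that spirit: maintain a confidence set of hypotheses defined by constraints on a cumulative estimation function, pick the most optimistic feasible hypothesis, and act greedily under it. The names (hypothesis_class, estimation_fn, env, beta, f.value, f.policy) and the interfaces are illustrative assumptions, not the paper's pseudocode or its actual estimation function.

```python
# Hypothetical sketch of optimization-based exploration with general function
# approximation. All interfaces below are assumptions for illustration only.

def optimization_based_exploration(hypothesis_class, estimation_fn, env,
                                   num_episodes, beta):
    """Generic optimistic exploration with an estimation-function constraint.

    hypothesis_class: iterable of candidate hypotheses f, each exposing
        f.value(state) -> scalar value estimate at the initial state, and
        f.policy(state) -> greedy action under f.
    estimation_fn: callable (f, trajectory) -> float, a surrogate loss whose
        cumulative value measures how inconsistent f is with observed data.
    beta: confidence radius controlling the size of the feasible set.
    """
    history = []  # trajectories collected so far
    for _ in range(num_episodes):
        # Confidence set: hypotheses whose cumulative estimation error is small.
        feasible = [
            f for f in hypothesis_class
            if sum(estimation_fn(f, traj) for traj in history) <= beta
        ]
        # Optimism: among feasible hypotheses, pick the one with largest value.
        state = env.reset()
        f_opt = max(feasible, key=lambda f: f.value(state))
        # Roll out the greedy policy of the optimistic hypothesis.
        trajectory, done = [], False
        while not done:
            action = f_opt.policy(state)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward, next_state))
            state = next_state
        history.append(trajectory)
    return history
```

In this kind of analysis, the confidence radius beta is typically set from a concentration argument, and the regret is controlled by a complexity measure of the hypothesis class, such as the functional eluder dimension proposed in the paper.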
Related papers
- Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Model-free Representation Learning and Exploration in Low-rank MDPs [64.72023662543363]
We present the first model-free representation learning algorithms for low-rank MDPs.
The key algorithmic contribution is a new minimax representation learning objective.
The result can accommodate general function approximation to scale to complex environments.
arXiv Detail & Related papers (2021-02-14T00:06:54Z)