GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond
- URL: http://arxiv.org/abs/2211.01962v4
- Date: Fri, 30 Jun 2023 13:05:42 GMT
- Title: GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond
- Authors: Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran
Yang, Tong Zhang
- Abstract summary: We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
- Score: 101.5329678997916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study sample efficient reinforcement learning (RL) under the general
framework of interactive decision making, which includes Markov decision
process (MDP), partially observable Markov decision process (POMDP), and
predictive state representation (PSR) as special cases. Toward finding the
minimum assumption that empowers sample efficient learning, we propose a novel
complexity measure, generalized eluder coefficient (GEC), which characterizes
the fundamental tradeoff between exploration and exploitation in online
interactive decision making. Specifically, GEC captures the hardness of
exploration by comparing the error of predicting the performance of the updated
policy with the in-sample training error evaluated on the historical data. We
show that RL problems with low GEC form a remarkably rich class, which subsumes
low Bellman eluder dimension problems, bilinear class, low witness rank
problems, PO-bilinear class, and generalized regular PSR, where generalized
regular PSR, a new tractable PSR class identified by us, includes nearly all
known tractable POMDPs and PSRs. Furthermore, in terms of algorithm design, we
propose a generic posterior sampling algorithm, which can be implemented in
both model-free and model-based fashion, under both fully observable and
partially observable settings. The proposed algorithm modifies the standard
posterior sampling algorithm in two aspects: (i) we use an optimistic prior
distribution that biases towards hypotheses with higher values, and (ii) the
log-likelihood function is set to the empirical loss evaluated on the
historical data, where the choice of loss function supports both model-free and
model-based learning. We prove that the proposed algorithm is sample efficient
by establishing a sublinear regret upper bound in terms of GEC. In summary, we
provide a new and unified understanding of both fully observable and partially
observable RL.
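To make the exploration-exploitation tradeoff concrete, a low GEC can be read schematically as follows (the symbols below, a hypothesis f^t, its greedy policy \pi^t, a discrepancy loss \ell, and past data \xi_s, are notation assumed here for illustration rather than quoted from the paper): cumulative out-of-sample prediction error is controlled by in-sample training error on historical data,
\[
\sum_{t=1}^{T} \big( V(f^t) - V^{\pi^t} \big) \;\lesssim\; \sqrt{ d_{\mathrm{GEC}} \sum_{t=1}^{T} \sum_{s=1}^{t-1} \mathbb{E}_{\pi^s}\!\big[ \ell(f^t, \xi_s) \big] } \;+\; \sqrt{d_{\mathrm{GEC}}\, T} \;+\; \epsilon T ,
\]
and the generalized eluder coefficient is, roughly, the smallest $d_{\mathrm{GEC}}$ for which such an inequality holds; the loss $\ell$ can be a Bellman-error-type loss (model-free) or a likelihood-based loss (model-based).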
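The modified posterior sampling template can also be sketched in a few lines. The following Python snippet is a minimal illustration over a finite hypothesis class; the function names, the Gibbs-style posterior form, and the temperature parameters gamma and eta are assumptions made for exposition, not the paper's exact specification.

import numpy as np

def optimistic_posterior_sampling(hypotheses, value_fn, loss_fn, collect_data,
                                  T, gamma=1.0, eta=1.0):
    """Sketch of posterior sampling with an optimistic, value-biased prior.

    hypotheses        : finite list of candidate hypotheses f
    value_fn(f)       : optimistic value estimate V(f); the prior biases toward high values
    loss_fn(f, batch) : empirical loss of f on a batch of past data
                        (model-free: Bellman-type error; model-based: negative log-likelihood)
    collect_data(f)   : roll out the greedy policy of f and return a data batch
    """
    history = []
    for t in range(T):
        # Gibbs-style posterior score: optimism term minus cumulative empirical loss.
        scores = np.array([
            gamma * value_fn(f) - eta * sum(loss_fn(f, batch) for batch in history)
            for f in hypotheses
        ])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # Sample a hypothesis from the optimistic posterior and execute its greedy policy.
        f_t = hypotheses[np.random.choice(len(hypotheses), p=probs)]
        history.append(collect_data(f_t))
    return history

The two departures from standard posterior sampling show up as the gamma * value_fn(f) optimism term (a prior biased toward high-value hypotheses) and the pluggable loss_fn, which switches between model-free and model-based instantiations.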
Related papers
- Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization [15.898378661128334]
Reinforcement Learning (RL) algorithms are known to suffer from the curse of dimensionality.
We propose overcoming the curse of dimensionality by approximately factorizing the original Markov decision processes (MDPs) into smaller, independently evolving MDPs.
We provide improved sample complexity guarantees for both proposed algorithms.
arXiv Detail & Related papers (2024-11-12T07:08:00Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, a last-iterate guarantee of a near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm namely OPtimization-based ExploRation with Approximation (OPERA) is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making [48.87943416098096]
This paper introduces a simple and efficient learning algorithm for general sequential decision making.
We prove that OMLE learns near-optimal policies for an enormously rich class of sequential decision making problems.
arXiv Detail & Related papers (2022-09-29T17:56:25Z)
- PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in all relevant parameters of the system.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)