Bilinear Classes: A Structural Framework for Provable Generalization in
RL
- URL: http://arxiv.org/abs/2103.10897v1
- Date: Fri, 19 Mar 2021 16:34:20 GMT
- Title: Bilinear Classes: A Structural Framework for Provable Generalization in
RL
- Authors: Simon S. Du, Sham M. Kakade, Jason D. Lee, Shachar Lovett, Gaurav
Mahajan, Wen Sun and Ruosong Wang
- Abstract summary: Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a sample complexity is achievable.
Our main result provides an RL algorithm which has sample complexity for Bilinear Classes.
- Score: 119.42509700822484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work introduces Bilinear Classes, a new structural framework, which
permit generalization in reinforcement learning in a wide variety of settings
through the use of function approximation. The framework incorporates nearly
all existing models in which a polynomial sample complexity is achievable, and,
notably, also includes new models, such as the Linear $Q^*/V^*$ model in which
both the optimal $Q$-function and the optimal $V$-function are linear in some
known feature space. Our main result provides an RL algorithm which has
polynomial sample complexity for Bilinear Classes; notably, this sample
complexity is stated in terms of a reduction to the generalization error of an
underlying supervised learning sub-problem. These bounds nearly match the best
known sample complexity bounds for existing models. Furthermore, this framework
also extends to the infinite dimensional (RKHS) setting: for the the Linear
$Q^*/V^*$ model, linear MDPs, and linear mixture MDPs, we provide sample
complexities that have no explicit dependence on the explicit feature dimension
(which could be infinite), but instead depends only on information theoretic
quantities.
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
arXiv Detail & Related papers (2024-02-05T03:25:04Z) - On Representation Complexity of Model-based and Model-free Reinforcement
Learning [11.843778337443824]
We study the representation complexity of model-based and model-free reinforcement learning (RL) in the context of circuit complexity.
We prove theoretically that there exists a broad class of MDPs such that their underlying transition and reward functions can be represented by constant depth circuits with size.
By drawing attention to the approximation errors and building connections to complexity theory, our theory provides unique insights into why model-based algorithms usually enjoy better sample complexity than model-free algorithms from a novel representation complexity perspective.
arXiv Detail & Related papers (2023-10-03T00:01:58Z) - Sample Complexity of Kernel-Based Q-Learning [11.32718794195643]
We propose a non-parametric Q-learning algorithm which finds an $epsilon$-optimal policy in an arbitrarily large scale discounted MDP.
To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.
arXiv Detail & Related papers (2023-02-01T19:46:25Z) - A General Framework for Sample-Efficient Function Approximation in
Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm namely OPtimization-based ExploRation with Approximation (OPERA) is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near optimal policy in sample complexity scalingly.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling [28.371541697552928]
We present the first result for non-linear function approximation which holds for general action spaces under a linear embeddability condition.
We show worst case sample complexity guarantees that scale with a rank parameter of the RL problem.
arXiv Detail & Related papers (2022-03-15T20:50:26Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly
Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales practicallyly with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.