Sample Complexity of Kernel-Based Q-Learning
- URL: http://arxiv.org/abs/2302.00727v1
- Date: Wed, 1 Feb 2023 19:46:25 GMT
- Title: Sample Complexity of Kernel-Based Q-Learning
- Authors: Sing-Yuan Yeh, Fu-Chieh Chang, Chang-Wei Yueh, Pei-Yuan Wu, Alberto
Bernacchia, Sattar Vakili
- Abstract summary: We propose a nonparametric Q-learning algorithm which finds an $\epsilon$-optimal policy in an arbitrarily large-scale discounted MDP.
To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.
- Score: 11.32718794195643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern reinforcement learning (RL) often faces an enormous state-action
space. Existing analytical results are typically for settings with a small
number of state-actions, or simple models such as linearly modeled Q-functions.
To derive statistically efficient RL policies handling large state-action
spaces, with more general Q-functions, some recent works have considered
nonlinear function approximation using kernel ridge regression. In this work,
we derive sample complexities for kernel-based Q-learning when a generative
model exists. We propose a nonparametric Q-learning algorithm which finds an
$\epsilon$-optimal policy in an arbitrarily large-scale discounted MDP. The
sample complexity of the proposed algorithm is order optimal with respect to
$\epsilon$ and the complexity of the kernel (in terms of its information gain).
To the best of our knowledge, this is the first result showing a finite sample
complexity under such a general model.
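As a concrete, deliberately hedged illustration of the setting described above, the sketch below runs approximate Q-iteration in which each Bellman backup is fit by kernel ridge regression on samples drawn from a generative-model oracle. The RBF kernel, the regularization weight, the discretized action set, and the `sample(s, a)` oracle are assumptions of this sketch, not the authors' algorithm or analysis.

```python
# Minimal sketch: approximate Q-iteration with kernel ridge regression under a
# generative model. Kernel, regularization, and sampler are illustrative
# assumptions; this is not the paper's exact algorithm.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def kernel_q_iteration(sample, states, actions, gamma=0.9, n_iters=30, n_next=10):
    """sample(s, a) -> (reward, next_state): an assumed generative-model oracle.
    states: array of representative states; actions: array of candidate actions."""
    # State-action design matrix over all representative pairs.
    sa = np.array([np.concatenate([s, a]) for s in states for a in actions])
    q_model = None  # current kernel ridge estimate of Q

    def q_values(s):
        # Evaluate the current Q estimate at state s for every candidate action.
        if q_model is None:
            return np.zeros(len(actions))
        x = np.array([np.concatenate([s, a]) for a in actions])
        return q_model.predict(x)

    for _ in range(n_iters):
        targets = []
        for s in states:
            for a in actions:
                # Monte Carlo Bellman backup using the generative model.
                backup = 0.0
                for _ in range(n_next):
                    r, s_next = sample(s, a)
                    backup += r + gamma * np.max(q_values(s_next))
                targets.append(backup / n_next)
        # Fit Q(s, a) to the backed-up targets by kernel ridge regression.
        q_model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
        q_model.fit(sa, np.array(targets))

    # Greedy policy with respect to the final Q estimate.
    return lambda s: actions[int(np.argmax(q_values(s)))]
```

The paper's contribution is the sample-complexity analysis (order optimal in $\epsilon$ and in the kernel's information gain); the sketch only illustrates the generative-model access pattern and the role of kernel ridge regression as the Q-function approximator.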
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
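For readers unfamiliar with the model class, a hedged restatement of the usual Gaussian single-index setting is shown below; the planted direction $w^\star$, link $f$, and exponent $k^\star$ follow standard conventions, and their precise definitions are in the cited paper.

```latex
% Illustrative restatement of the assumed Gaussian single-index setting
% (standard conventions, not verbatim from the paper):
\[
  x \sim \mathcal{N}(0, I_d), \qquad
  y = f(\langle w^\star, x \rangle) + \text{noise}, \qquad
  n = \Omega\!\bigl(d^{k^\star/2}\bigr)
  \ \text{samples for SQ / low-degree polynomial algorithms.}
\]
```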
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - On Representation Complexity of Model-based and Model-free Reinforcement
Learning [11.843778337443824]
We study the representation complexity of model-based and model-free reinforcement learning (RL) in the context of circuit complexity.
We prove theoretically that there exists a broad class of MDPs such that their underlying transition and reward functions can be represented by constant-depth circuits of polynomial size.
By drawing attention to the approximation errors and building connections to complexity theory, our theory provides unique insights into why model-based algorithms usually enjoy better sample complexity than model-free algorithms from a novel representation complexity perspective.
arXiv Detail & Related papers (2023-10-03T00:01:58Z) - Sample-efficient Model-based Reinforcement Learning for Quantum Control [0.2999888908665658]
We propose a model-based reinforcement learning (RL) approach for noisy time-dependent gate optimization.
We show an order of magnitude advantage in the sample complexity of our method over standard model-free RL.
Our algorithm is well suited for controlling partially characterised one- and two-qubit systems.
arXiv Detail & Related papers (2023-04-19T15:05:19Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly
Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Bilinear Classes: A Structural Framework for Provable Generalization in
RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm with polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the difficulty of the learning problem.
Our regret bounds do not depend on the size of the state space.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L^1$ and $L^2$ regularizations suppress the increase of model complexity (a rough sketch of the piecewise-linear approximation idea follows this list).
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
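As a rough, hedged illustration of the piecewise-linear approximation idea behind LANN in the last entry above: the sigmoid target, the equal-width breakpoints, and the sup-norm tolerance are assumptions of this sketch, not the paper's construction.

```python
# Hedged sketch: piecewise-linear approximation of a curve activation (sigmoid),
# loosely in the spirit of LANN. Breakpoints and the error measure are
# illustrative assumptions, not the paper's construction.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pwl_segments_needed(f, lo=-6.0, hi=6.0, tol=1e-2, max_segments=64):
    """Smallest number of equal-width linear segments whose interpolation of f
    on [lo, hi] stays within tol (sup-norm measured on a fine grid)."""
    grid = np.linspace(lo, hi, 2001)
    target = f(grid)
    for k in range(1, max_segments + 1):
        knots = np.linspace(lo, hi, k + 1)
        approx = np.interp(grid, knots, f(knots))  # piecewise-linear interpolant
        if np.max(np.abs(approx - target)) <= tol:
            return k
    return max_segments

if __name__ == "__main__":
    # Tighter tolerances need more segments: a crude proxy for how much linear
    # structure is required to mimic the curve activation.
    for tol in (1e-1, 1e-2, 1e-3):
        print(tol, pwl_segments_needed(sigmoid, tol=tol))
```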
This list is automatically generated from the titles and abstracts of the papers in this site.