On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces
- URL: http://arxiv.org/abs/2011.04622v2
- Date: Tue, 29 Dec 2020 17:24:48 GMT
- Title: On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces
- Authors: Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
- Abstract summary: We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove a regret bound that scales with $\delta_{\mathcal{F}}$, a measure of the intrinsic complexity of the function class $\mathcal{F}$.
Our regret bounds are independent of the number of states.
- Score: 208.67848059021915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The classical theory of reinforcement learning (RL) has focused on tabular
and linear representations of value functions. Further progress hinges on
combining RL with modern function approximators such as kernel functions and
deep neural networks, and indeed there have been many empirical successes that
have exploited such combinations in large-scale applications. There are
profound challenges, however, in developing a theory to support this
enterprise, most notably the need to take into consideration the
exploration-exploitation tradeoff at the core of RL in conjunction with the
computational and statistical tradeoffs that arise in modern
function-approximation-based learning systems. We approach these challenges by
studying an optimistic modification of the least-squares value iteration
algorithm, in the context of the action-value function
represented by a kernel function or an overparameterized neural network. We
establish both polynomial runtime complexity and polynomial sample complexity
for this algorithm, without additional assumptions on the data-generating
model. In particular, we prove that the algorithm incurs an
$\tilde{\mathcal{O}}(\delta_{\mathcal{F}} H^2 \sqrt{T})$ regret, where
$\delta_{\mathcal{F}}$ characterizes the intrinsic complexity of the function
class $\mathcal{F}$, $H$ is the length of each episode, and $T$ is the total
number of episodes. Our regret bounds are independent of the number of states,
a result which exhibits clearly the benefit of function approximation in RL.
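To make the kernel case concrete, below is a minimal Python sketch (not the authors' code; the kernel choice, `lam`, and `beta` are illustrative stand-ins for the paper's carefully specified quantities) of the core regression step: least-squares value iteration fits the action-value function by kernel ridge regression, and optimism enters as a bonus proportional to the predictive uncertainty.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth=1.0):
    """Gaussian RBF kernel matrix between row-stacked points A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bandwidth ** 2))

def optimistic_q(X, y, X_query, lam=1.0, beta=1.0, H=10.0):
    """One backward-recursion step of optimistic least-squares value
    iteration with a kernel function class: fit Q by kernel ridge
    regression on past data, then add an exploration bonus
    proportional to the predictive uncertainty.

    X: (n, d) past state-action features; y: (n,) regression targets
    (reward plus estimated next-step value); X_query: (m, d) points
    at which the optimistic Q estimate is needed.
    """
    n = len(X)
    G = np.linalg.solve(rbf_kernel(X, X) + lam * np.eye(n), np.eye(n))
    k_q = rbf_kernel(X_query, X)                 # (m, n) cross-kernel
    mean = k_q @ (G @ y)                         # kernel ridge fit
    # Predictive variance of the regressor at each query point
    # (k(x, x) = 1 for the RBF kernel).
    var = 1.0 - np.einsum("ij,jk,ik->i", k_q, G, k_q)
    bonus = beta * np.sqrt(np.maximum(var, 0.0))
    return np.minimum(mean + bonus, H)           # optimism, clipped at H
```

In the full algorithm this step is applied backward over the $H$ steps of each episode, with the targets $y$ built from the optimistic values computed for the next step.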
Related papers
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or is comparable to the performance of state-of-the-art unsupervised RL algorithms.
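A minimal sketch of the uncertainty-as-intrinsic-reward pattern, using ensemble disagreement as a stand-in for GFA-RFE's confidence-set-based uncertainty (an assumption for illustration, not the paper's construction):

```python
import numpy as np

def intrinsic_reward(ensemble_q):
    """Uncertainty as intrinsic reward: score each state-action pair by
    the disagreement of an ensemble of value estimates, so the agent is
    paid more for visiting poorly-understood regions.  (Ensemble spread
    is a practical proxy here; GFA-RFE's bonus comes from confidence
    sets over a general function class.)"""
    return np.std(ensemble_q, axis=0)

# 5 ensemble members' estimates at 3 state-action pairs: the middle
# pair is well understood, so it earns almost no exploration reward.
q = np.array([[0.9, 0.10, 0.5],
              [1.1, 0.10, 0.4],
              [0.8, 0.11, 0.9],
              [1.0, 0.10, 0.2],
              [1.2, 0.09, 0.6]])
print(intrinsic_reward(q))
```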
arXiv Detail & Related papers (2024-06-24T01:37:18Z)
- Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
We study the computational complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure.
We prove that a large subset of $f_*$ can be efficiently learned by gradient training of a two-layer neural network.
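The setting can be illustrated with a tiny simulation (assumptions: a tanh activation, plain full-batch gradient descent, and arbitrary illustrative sizes; the paper's regime is more delicate): a two-layer network fits a target that is a sum of ridge functions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, width, lr = 8, 512, 64, 0.05

# Additive target: a sum of ridge functions f_*(x) = sum_j g(<u_j, x>).
U = rng.standard_normal((3, d)) / np.sqrt(d)
def f_star(X):
    return np.tanh(X @ U.T).sum(axis=1)

X = rng.standard_normal((n, d))
y = f_star(X)

# Two-layer tanh network trained by full-batch gradient descent on the
# square loss; both layers are trained.
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)
for _ in range(2000):
    H = np.tanh(X @ W.T)                          # (n, width) hidden units
    err = H @ a - y                               # residuals
    grad_a = H.T @ err / n
    grad_W = ((1 - H**2) * (err[:, None] * a)).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
print("train MSE:", float(np.mean((np.tanh(X @ W.T) @ a - y) ** 2)))
```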
arXiv Detail & Related papers (2024-06-17T17:59:17Z)
- A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation [66.26739783789387]
We propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for reinforcement learning.
MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large, and a near-optimal policy switching cost.
Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.
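Deployment efficiency typically rests on switching policies rarely. The sketch below shows the generic determinant-based device for this (a standard trick in the low-switching literature, used here as a hedged stand-in for MQL-UCB's actual switching rule): re-plan only when the information matrix has grown by a constant factor.

```python
import numpy as np

def should_switch(Sigma_at_switch, Sigma_now, factor=2.0):
    """Determinant trick for rare switching: trigger a policy update only
    when the information matrix has grown by `factor` since the last
    switch, which caps the number of switches at O(d log K)."""
    _, logdet_old = np.linalg.slogdet(Sigma_at_switch)
    _, logdet_new = np.linalg.slogdet(Sigma_now)
    return logdet_new - logdet_old >= np.log(factor)

# Toy run: accumulate rank-one feature updates over K = 1000 episodes.
rng = np.random.default_rng(1)
d = 4
Sigma = np.eye(d)
Sigma_at_switch = Sigma.copy()
switches = 0
for _ in range(1000):
    phi = rng.standard_normal(d)
    Sigma += np.outer(phi, phi)
    if should_switch(Sigma_at_switch, Sigma):
        Sigma_at_switch = Sigma.copy()
        switches += 1
print("policy switches:", switches)  # grows like d * log(K), not K
```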
arXiv Detail & Related papers (2023-11-26T08:31:57Z)
- On Representation Complexity of Model-based and Model-free Reinforcement Learning [11.843778337443824]
We study the representation complexity of model-based and model-free reinforcement learning (RL) in the context of circuit complexity.
We prove theoretically that there exists a broad class of MDPs such that their underlying transition and reward functions can be represented by constant-depth circuits of polynomial size.
By drawing attention to the approximation errors and building connections to complexity theory, our theory provides unique insights into why model-based algorithms usually enjoy better sample complexity than model-free algorithms from a novel representation complexity perspective.
arXiv Detail & Related papers (2023-10-03T00:01:58Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Optimal approximation using complex-valued neural networks [0.0]
Complex-valued neural networks (CVNNs) have recently shown promising empirical success.
We analyze the expressivity of CVNNs by studying their approximation properties.
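For readers new to CVNNs, the object under study composes complex-valued affine maps with a complex activation; the sketch below uses modReLU, one common choice (whether this paper's constructions use modReLU is an assumption):

```python
import numpy as np

def modrelu(z, bias=-0.5):
    """modReLU, a standard complex-valued activation: threshold the
    modulus, keep the phase.  With a negative bias, small-magnitude
    inputs are zeroed out."""
    mag = np.abs(z)
    return np.maximum(mag + bias, 0.0) * z / np.maximum(mag, 1e-12)

rng = np.random.default_rng(0)
W = (rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))) / np.sqrt(3)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
print(modrelu(W @ x))  # one complex affine layer followed by modReLU
```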
arXiv Detail & Related papers (2023-03-29T15:56:43Z)
- Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z)
- Sample Complexity of Kernel-Based Q-Learning [11.32718794195643]
We propose a non-parametric Q-learning algorithm which finds an $\epsilon$-optimal policy in an arbitrarily large scale discounted MDP.
To the best of our knowledge, this is the first result showing a finite sample complexity under such a general model.
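A rough sketch of the non-parametric idea, using fitted Q-iteration with a kernel smoother as the regressor (illustrative only; the paper's algorithm and analysis differ):

```python
import numpy as np

def kernel_smooth(xs, ys, x, bw=0.3):
    """Nadaraya-Watson smoother: locally weighted average of ys near x."""
    w = np.exp(-((xs - x) ** 2) / (2 * bw ** 2))
    return float(w @ ys / (w.sum() + 1e-12))

def fitted_q_iteration(D, n_actions=2, gamma=0.9, iters=30):
    """Fitted Q-iteration with a kernel regressor, one smoother per
    discrete action.  D: (n, 4) array of (s, a, r, s') transitions with
    scalar states.  Returns a Q(state, action) function."""
    s, a, r, sn = D[:, 0], D[:, 1].astype(int), D[:, 2], D[:, 3]
    q_vals = np.zeros(len(D))                    # current Q(s_i, a_i)
    for _ in range(iters):
        # Bellman targets: r + gamma * max_a' Q(s', a'), via smoothing.
        next_q = np.array([
            max(kernel_smooth(s[a == act], q_vals[a == act], x)
                for act in range(n_actions))
            for x in sn])
        q_vals = r + gamma * next_q
    return lambda state, act: kernel_smooth(s[a == act], q_vals[a == act], state)

# Toy task: action 1 pays off for positive states, action 0 for negative.
rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, 200)
a = rng.integers(0, 2, 200)
r = -np.abs(s - (2 * a - 1) * 0.5)
sn = np.clip(s + 0.1 * (2 * a - 1), -1, 1)
Q = fitted_q_iteration(np.column_stack([s, a, r, sn]))
print(Q(0.4, 1), Q(0.4, 0))   # action 1 should look better at s = 0.4
```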
arXiv Detail & Related papers (2023-02-01T19:46:25Z)
- Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)