Related papers: An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

URL: http://arxiv.org/abs/2104.07794v2
Date: Mon, 19 Apr 2021 00:59:11 GMT
Title: An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation
Authors: Jihao Long, Jiequn Han, Weinan E
Abstract summary: This paper considers the situation where the function approximation is made using the kernel method or the two-layer neural network model. We establish an $tildeO(H3|mathcal A|frac14n-frac14)$ bound for the optimal policy with $Hn$ samples. Even though this result still requires a finite-sized action space, the error bound is independent of the dimensionality of the state space.
Score: 9.088303226909277
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives rise to error bounds that involve either the number of states or the number of features. This paper considers the situation where the function approximation is made either using the kernel method or the two-layer neural network model, in the context of a fitted Q-iteration algorithm with explicit regularization. We establish an $\tilde{O}(H^3|\mathcal {A}|^{\frac14}n^{-\frac14})$ bound for the optimal policy with $Hn$ samples, where $H$ is the length of each episode and $|\mathcal {A}|$ is the size of action space. Our analysis hinges on analyzing the $L^2$ error of the approximated Q-function using $n$ data points. Even though this result still requires a finite-sized action space, the error bound is independent of the dimensionality of the state space.

Related papers

Entangled Mean Estimation in High-Dimensions [36.97113089188035]
We study the task of high-dimensional entangled mean estimation in the subset-of-signals model. We show that the optimal error (up to polylogarithmic factors) is $f(alpha,N) + sqrtD/(alpha N)$, where the term $f(alpha,N)$ is the error of the one-dimensional problem and the second term is the sub-Gaussian error rate.
arXiv Detail & Related papers (2025-01-09T18:31:35Z)
Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
We show that as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error increases. A key technical challenge we address is the lack of a one-step contraction property in the $W_2,ellinfty$ metric to measure convergence.
arXiv Detail & Related papers (2024-08-20T01:24:54Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ under isotropic Gaussian data. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n asymp T asymp C(q) cdot d
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks. In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
Weighted least-squares approximation with determinantal point processes and generalized volume sampling [33.33724208084121]
We consider the problem of approximating a function from $L2$ by an element of a given $m$-dimensional space $V_m$. We show that the approximation is almost surely bounded by the best approximation error measured in the $H$-norm.
arXiv Detail & Related papers (2023-12-21T17:34:18Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations. Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z)
Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL) We focus on the value based algorithm with the $epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces. Our analysis reformulates the temporal difference error in an $L2(mathrmdmu)$-integrable space over a certain averaged measure $mu$, and transforms it to a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
Localization in 1D non-parametric latent space models from pairwise affinities [6.982738885923206]
We consider the problem of estimating latent positions in a one-dimensional torus from pairwise affinities. We introduce an estimation procedure that provably localizes all the latent positions with a maximum error of the order of $sqrtlog(n)/n$, with high-probability.
arXiv Detail & Related papers (2021-08-06T13:05:30Z)
Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model [3.749193647980305]
This paper considers a Markov decision process (MDP) that admits a set of state-action features. We show that a model-based approach (resp.$$Q-learning) provably learns an $varepsilon$-optimal policy with high probability.
arXiv Detail & Related papers (2021-05-28T17:49:39Z)
Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations. In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $mathcalO(log N)2D(loglog N)2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.