The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces
- URL: http://arxiv.org/abs/2306.02833v1
- Date: Mon, 5 Jun 2023 12:29:13 GMT
- Title: The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces
- Authors: Hongrui Chen, Jihao Long, Lei Wu
- Abstract summary: We analyze the learnability of reproducing kernel Hilbert spaces (RKHS) under the $L^\infty$ norm.
For dot-product kernels on the sphere, we identify conditions under which $L^\infty$ learning can be achieved with polynomially many samples.
- Score: 3.2931415075553576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we analyze the learnability of reproducing kernel Hilbert
spaces (RKHS) under the $L^\infty$ norm, which is critical for understanding
the performance of kernel methods and random feature models in safety- and
security-critical applications. Specifically, we relate the $L^\infty$
learnability of an RKHS to the spectrum decay of the associated kernel and
establish both lower and upper bounds on the sample complexity. In
particular, for dot-product kernels on the sphere, we identify conditions under
which $L^\infty$ learning can be achieved with polynomially many samples. Let $d$ denote
the input dimension and assume the kernel spectrum roughly decays as
$\lambda_k\sim k^{-1-\beta}$ with $\beta>0$. We prove that if $\beta$ is
independent of the input dimension $d$, then functions in the RKHS can be
learned efficiently under the $L^\infty$ norm, i.e., the sample complexity
depends polynomially on $d$. In contrast, if $\beta=1/\mathrm{poly}(d)$, then
the $L^\infty$ learning requires exponentially many samples.
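To make the abstract's quantities concrete, here is a minimal numerical sketch (an illustrative toy experiment, not the paper's construction): it fits kernel ridge regression with a dot-product kernel on the sphere and estimates the $L^\infty$ error of the learned function by a Monte-Carlo maximum over a random test grid. The kernel choice, target function, and ridge parameter below are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere_samples(n, d):
    """Uniform samples on the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def dot_product_kernel(X, Y):
    """A smooth dot-product kernel k(x, y) = exp(<x, y>) (illustrative choice)."""
    return np.exp(X @ Y.T)

d, n_train, n_test, lam = 10, 400, 2000, 1e-8

# Target in the RKHS: a finite kernel expansion f(x) = sum_j c_j k(z_j, x).
Z = sphere_samples(20, d)
c = rng.standard_normal(20)
f = lambda X: dot_product_kernel(X, Z) @ c

X_train = sphere_samples(n_train, d)
y_train = f(X_train)

# Kernel ridge regression fit.
K = dot_product_kernel(X_train, X_train)
alpha = np.linalg.solve(K + lam * np.eye(n_train), y_train)

# Monte-Carlo estimate of the L^infinity (sup-norm) error on a dense random grid.
X_test = sphere_samples(n_test, d)
sup_err = np.max(np.abs(dot_product_kernel(X_test, X_train) @ alpha - f(X_test)))
print(f"estimated sup-norm error with n={n_train}: {sup_err:.3e}")
```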
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*\left(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle\right)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d$ …
arXiv Detail & Related papers (2024-06-03T17:56:58Z) - Toward $L_\infty$-recovery of Nonlinear Functions: A Polynomial Sample
Complexity Bound for Gaussian Random Fields [35.76184529520015]
Many machine learning applications require learning a function with a small worst-case error over the entire input domain, that is, the $L_\infty$-error.
Most existing theoretical works only guarantee recovery in average errors such as the $L_2$-error.
This paper makes some initial steps beyond the impossibility results by leveraging the randomness in the ground-truth functions.
arXiv Detail & Related papers (2023-04-29T18:33:39Z) - Neural Networks can Learn Representations with Gradient Descent [68.95262816363288]
In specific regimes, neural networks trained by gradient descent behave like kernel methods.
In practice, it is known that neural networks strongly outperform their associated kernels.
arXiv Detail & Related papers (2022-06-30T09:24:02Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features (a minimal sketch appears after this list).
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm.
We show that in the regime $n \rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z) - Kernel Two-Sample Tests for Manifold Data [22.09364630430699]
We present a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting.
We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold.
Our results indicate that the kernel two-sample test has no curse-of-dimensionality when the data lie on or near a low-dimensional manifold (a minimal sketch of such a test appears after this list).
arXiv Detail & Related papers (2021-05-07T17:56:45Z) - Minimum complexity interpolation in random features models [16.823029377470366]
Kernel methods are heavily affected by the curse of dimensionality.
We show that learning with $\mathcal{F}_p$ norms is tractable via an infinite-dimensional convex problem.
We introduce a proof technique based on uniform concentration in the dual.
arXiv Detail & Related papers (2021-03-30T00:00:02Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y) = -\log\langle\varphi(x), \varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$ (a minimal low-rank Sinkhorn sketch appears after this list).
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation [30.137884459159107]
We consider the question of learning $Q$-function in a sample efficient manner for reinforcement learning with continuous state and action spaces.
We develop a simple, iterative learning algorithm that finds an $\epsilon$-optimal $Q$-function with sample complexity of $\widetilde{O}\big(\frac{1}{\epsilon^{\max(d_1, d_2)+2}}\big)$ when the optimal $Q$-function has low rank $r$ and the discount factor $\gamma$ is below a certain threshold.
arXiv Detail & Related papers (2020-06-11T00:55:35Z) - Learning Polynomials of Few Relevant Dimensions [12.122488725065388]
Polynomial regression is a basic primitive in learning and statistics.
We give an algorithm that learns the polynomial within accuracy $\epsilon$ with sample complexity that is roughly $N = O_{r,d}(n \log^2(1/\epsilon) (\log n)^d)$ and runtime $O_{r,d}(N n^2)$.
arXiv Detail & Related papers (2020-04-28T18:00:18Z)
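Referenced from the "Random matrices in service of ML footprint" entry above, the following is a minimal sketch of ternary random features: a feature map whose random weights take values in $\{-1, 0, +1\}$, so the projection needs only additions/sign flips and each weight fits in a couple of bits. The sparsity level, nonlinearity, and scaling are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(1)

def ternary_weights(m, d, sparsity=0.5):
    """Random matrix with i.i.d. entries in {-1, 0, +1}; 'sparsity' is the
    probability of a zero entry (an assumed parameterization)."""
    signs = rng.choice([-1.0, 1.0], size=(m, d))
    mask = rng.random((m, d)) >= sparsity
    return signs * mask

def ternary_random_features(X, W):
    """Feature map phi(x) = sign(W x) / sqrt(m); W x with ternary W needs only
    additions and subtractions, and sign() gives 1-bit features."""
    return np.sign(X @ W.T) / np.sqrt(W.shape[0])

# Approximate a kernel matrix by K(x, y) ~ <phi(x), phi(y)>.
d, m, n = 20, 512, 100
X = rng.standard_normal((n, d))
W = ternary_weights(m, d)
Phi = ternary_random_features(X, W)
K_approx = Phi @ Phi.T
print(K_approx.shape, K_approx[0, :3])
```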
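Referenced from the "Kernel Two-Sample Tests for Manifold Data" entry above, the sketch below computes the standard unbiased MMD$^2$ statistic with a Gaussian kernel and calibrates it by a permutation test. The kernel, bandwidth, and calibration are common choices assumed here, not necessarily the exact test analyzed in that paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_kernel(X, Y, bandwidth):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth):
    """Unbiased estimate of the squared Maximum Mean Discrepancy."""
    n, m = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2 * Kxy.mean())

def permutation_test(X, Y, bandwidth=1.0, n_perm=200):
    """p-value for H0: both samples come from the same distribution."""
    stat = mmd2_unbiased(X, Y, bandwidth)
    Z = np.vstack([X, Y])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))
        Xp, Yp = Z[perm[:len(X)]], Z[perm[len(X):]]
        count += mmd2_unbiased(Xp, Yp, bandwidth) >= stat
    return (count + 1) / (n_perm + 1)

X = rng.standard_normal((100, 5))
Y = rng.standard_normal((100, 5)) + 0.5   # mean shift, should be detected
print("p-value:", permutation_test(X, Y))
```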
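Referenced from the "Linear Time Sinkhorn Divergences using Positive Features" entry above, this sketch runs Sinkhorn iterations with the kernel matrix replaced by a product of nonnegative feature matrices, so each matrix-vector product costs $O(nr)$ instead of $O(n^2)$. The particular positive random-feature map (approximating a Gaussian kernel) is an assumed stand-in for the paper's choice of $\varphi$.

```python
import numpy as np

rng = np.random.default_rng(3)

def positive_features(X, W):
    """Positive random features phi(x) = exp(W x - ||x||^2) / sqrt(r), so that
    Phi_x @ Phi_y.T approximates the Gaussian kernel exp(-||x - y||^2 / 2)."""
    r = W.shape[0]
    return np.exp(X @ W.T - np.sum(X**2, axis=1, keepdims=True)) / np.sqrt(r)

def lowrank_sinkhorn(X, Y, a, b, r=128, n_iter=200):
    """Sinkhorn scalings u, v with K ~ Phi_x Phi_y^T; each iteration is O(n r).
    The implied entropic regularization / bandwidth is fixed to 1 here."""
    W = rng.standard_normal((r, X.shape[1]))
    Phi_x, Phi_y = positive_features(X, W), positive_features(Y, W)
    u, v = np.ones(len(a)), np.ones(len(b))
    for _ in range(n_iter):
        u = a / (Phi_x @ (Phi_y.T @ v))   # K v computed without forming K
        v = b / (Phi_y @ (Phi_x.T @ u))   # K^T u computed without forming K
    # The n x n transport plan diag(u) K diag(v) is never materialized.
    return u, v

n, d = 500, 3
X, Y = rng.standard_normal((n, d)), rng.standard_normal((n, d)) + 0.25
a = b = np.ones(n) / n
u, v = lowrank_sinkhorn(X, Y, a, b)
print(u[:3], v[:3])
```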