Minimum complexity interpolation in random features models
- URL: http://arxiv.org/abs/2103.15996v1
- Date: Tue, 30 Mar 2021 00:00:02 GMT
- Title: Minimum complexity interpolation in random features models
- Authors: Michael Celentano, Theodor Misiakiewicz, Andrea Montanari
- Abstract summary: Kernel methods are heavily affected by the curse of dimensionality.
We show that, for $p>1$, learning with $\mathcal{F}_p$ norms is tractable via random features approximations of the underlying infinite-dimensional convex problem.
We introduce a proof technique based on uniform concentration in the dual.
- Score: 16.823029377470366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their many appealing properties, kernel methods are heavily affected
by the curse of dimensionality. For instance, in the case of inner product
kernels in $\mathbb{R}^d$, the Reproducing Kernel Hilbert Space (RKHS) norm is
often very large for functions that depend strongly on a small subset of
directions (ridge functions). Correspondingly, such functions are difficult to
learn using kernel methods.
This observation has motivated the study of generalizations of kernel
methods, whereby the RKHS norm -- which is equivalent to a weighted $\ell_2$
norm -- is replaced by a weighted functional $\ell_p$ norm, which we refer to
as $\mathcal{F}_p$ norm. Unfortunately, tractability of these approaches is
unclear. The kernel trick is not available and minimizing these norms requires
solving an infinite-dimensional convex problem.
We study random features approximations to these norms and show that, for
$p>1$, the number of random features required to approximate the original
learning problem is upper bounded by a polynomial in the sample size. Hence,
learning with $\mathcal{F}_p$ norms is tractable in these cases. We introduce a
proof technique based on uniform concentration in the dual, which can be of
broader interest in the study of overparametrized models.
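To make the construction concrete, here is a minimal numerical sketch of the finite-dimensional surrogate the abstract alludes to: sample $N$ random features and minimize the $\ell_p$ norm of the coefficients subject to interpolating the data. The ReLU feature map, the toy ridge-function target, and the SLSQP solver are illustrative assumptions, not the paper's construction or implementation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy data: n samples in d dimensions; the target is a ridge function, which is
# exactly the kind of function that is expensive in RKHS norm.
n, d, N, p = 40, 10, 200, 1.5          # N random features, F_p exponent p > 1
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.maximum(X[:, 0], 0.0)           # depends on a single direction

# Random ReLU features (an assumed choice of activation and feature distribution).
W = rng.standard_normal((N, d))
Phi = np.maximum(X @ W.T, 0.0) / N     # n x N design, written as an empirical average

# Minimum-complexity interpolation surrogate: minimize ||a||_p subject to Phi a = y.
lp_norm = lambda a: np.sum(np.abs(a) ** p) ** (1.0 / p)

a0 = np.linalg.lstsq(Phi, y, rcond=None)[0]            # a feasible starting point
constraints = [{"type": "eq", "fun": lambda a: Phi @ a - y}]
res = minimize(lp_norm, a0, method="SLSQP", constraints=constraints,
               options={"maxiter": 500, "ftol": 1e-9})

a_hat = res.x
print("max interpolation error:", np.max(np.abs(Phi @ a_hat - y)))
print("l_p norm of coefficients:", lp_norm(a_hat))
```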
Related papers
- Gaussian kernel expansion with basis functions uniformly bounded in $\mathcal{L}_{\infty}$ [0.6138671548064355]
Kernel expansions are a topic of considerable interest in machine learning.
Recent work in the literature has derived some of these results by assuming uniformly bounded basis functions in $\mathcal{L}_\infty$.
Our main result is the construction on $\mathbb{R}^2$ of a Gaussian kernel expansion with weights in $\ell_p$ for any $p>1$.
arXiv Detail & Related papers (2024-10-02T10:10:30Z)
- Universality of kernel random matrices and kernel regression in the quadratic regime [18.51014786894174]
In this work, we extend the study of kernel regression to the quadratic regime.
We establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix.
We characterize the precise training and generalization errors for KRR in the quadratic regime when $n/d^2$ converges to a nonzero constant.
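As a purely numerical illustration of the comparison in this summary (not the paper's exact quadratic surrogate, which involves additional correction terms), one can draw data with $n \propto d^2$ and measure the operator-norm gap between an inner-product kernel matrix and its entrywise degree-two Taylor expansion:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 40
n = d * d // 2                        # quadratic regime: n proportional to d^2
X = rng.standard_normal((n, d))

G = (X @ X.T) / d                     # normalized inner products <x_i, x_j> / d
K = np.exp(G)                         # inner-product kernel h(t) = e^t (illustrative)
K_quad = 1.0 + G + 0.5 * G**2         # entrywise degree-two Taylor surrogate of e^t

gap = np.linalg.norm(K - K_quad, ord=2)       # operator (spectral) norm of the gap
print("relative operator-norm gap:", gap / np.linalg.norm(K, ord=2))
```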
arXiv Detail & Related papers (2024-08-02T07:29:49Z)
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
- The $L^\infty$ Learnability of Reproducing Kernel Hilbert Spaces [3.2931415075553576]
We analyze the learnability of reproducing kernel Hilbert spaces (RKHS) under the $L^\infty$ norm.
For dot-product kernels on the sphere, we identify conditions under which $L^\infty$ learning can be achieved with Hilbert samples.
arXiv Detail & Related papers (2023-06-05T12:29:13Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Feature Cross Search via Submodular Optimization [58.15569071608769]
We study feature cross search as a fundamental primitive in feature engineering.
We show that there exists a simple greedy $(1-1/e)$-approximation algorithm for this problem.
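For reference, the $(1-1/e)$ guarantee mentioned here is the classical one for greedily maximizing a monotone submodular set function under a cardinality constraint. The sketch below shows that generic greedy loop on a toy coverage objective; it is not the paper's specific feature-cross objective.

```python
from typing import Callable, Iterable, FrozenSet

def greedy_max(f: Callable[[FrozenSet], float], ground: Iterable, k: int) -> FrozenSet:
    """Greedy (1 - 1/e)-approximation for maximizing a monotone submodular
    set function f under the cardinality constraint |S| <= k."""
    S: FrozenSet = frozenset()
    candidates = set(ground)
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in candidates}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:          # no remaining element improves the objective
            break
        S = S | {best}
        candidates.remove(best)
    return S

# Example: weighted coverage, a classic monotone submodular function.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
coverage = lambda S: len(set().union(*(sets[e] for e in S))) if S else 0
print(greedy_max(coverage, sets.keys(), k=2))   # e.g. frozenset({'a', 'c'})
```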
arXiv Detail & Related papers (2021-07-05T16:58:31Z)
- High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$ where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$.
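A minimal sketch of why this form gives linear-time Sinkhorn iterations: with $K_{ij}=\langle\varphi(x_i),\varphi(y_j)\rangle$ the Gibbs kernel factorizes into two $n\times r$ matrices, so each matrix-vector product in the Sinkhorn loop costs $O(nr)$ instead of $O(n^2)$. The exponential random-feature map below is an illustrative positive map, not the paper's specific construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, r = 2000, 5, 64                 # n points per side, r positive features

X, Y = rng.standard_normal((n, d)), rng.standard_normal((n, d))

# Positive feature map phi, so that K_ij = <phi(x_i), phi(y_j)> > 0 is the Gibbs
# kernel exp(-c(x_i, y_j)) for the implicit cost c(x, y) = -log<phi(x), phi(y)>.
W = rng.standard_normal((r, d)) / np.sqrt(d)
phi = lambda Z: np.exp(Z @ W.T) / np.sqrt(r)
A, B = phi(X), phi(Y)                 # K = A @ B.T is never formed explicitly

a = np.full(n, 1.0 / n)               # uniform source and target marginals
b = np.full(n, 1.0 / n)

# Sinkhorn iterations: every K v and K^T u costs O(n r) instead of O(n^2).
u, v = np.ones(n), np.ones(n)
for _ in range(200):
    u = a / (A @ (B.T @ v))           # u <- a ./ (K v)
    v = b / (B @ (A.T @ u))           # v <- b ./ (K^T u)

# The transport plan is P = diag(u) K diag(v); check its row marginals against a.
row_marginals = u * (A @ (B.T @ v))
print("max row-marginal error:", np.max(np.abs(row_marginals - a)))
```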
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
- RFN: A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Spaces [14.924672048447334]
Large-scale finite-sum problems can be solved using efficient variants of the Newton method, where the Hessian is approximated via sub-samples of data.
In this paper, we observe that for this class of problems, one can naturally use kernel approximation to speed up the Newton method.
We provide a novel second-order algorithm that enjoys local superlinear convergence and global linear convergence.
arXiv Detail & Related papers (2020-02-12T01:14:44Z)
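To illustrate the general idea of combining random features with second-order optimization (a generic sketch under assumed choices, not the authors' RFN algorithm), one can replace the RKHS problem by a random Fourier features model and run exact Newton steps on the resulting $D$-dimensional empirical risk:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, D, lam = 500, 10, 200, 1e-2     # D random Fourier features, ridge parameter lam

X = rng.standard_normal((n, d))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)   # toy binary labels in {0, 1}

# Random Fourier features approximating a Gaussian (RBF) kernel with unit bandwidth.
W = rng.standard_normal((D, d))
b = rng.uniform(0, 2 * np.pi, D)
Z = np.sqrt(2.0 / D) * np.cos(X @ W.T + b)        # n x D feature matrix

# Regularized logistic loss on the random-features model f(x) = <theta, z(x)>.
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
theta = np.zeros(D)
for _ in range(20):                               # exact Newton iterations
    p = sigmoid(Z @ theta)
    grad = Z.T @ (p - y) / n + lam * theta
    S = p * (1 - p)                               # diagonal of the logistic Hessian
    H = (Z * S[:, None]).T @ Z / n + lam * np.eye(D)
    theta -= np.linalg.solve(H, grad)             # D x D solve instead of n x n

acc = np.mean((sigmoid(Z @ theta) > 0.5) == (y > 0.5))
print("training accuracy:", acc)
```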