How isotropic kernels perform on simple invariants
- URL: http://arxiv.org/abs/2006.09754v5
- Date: Mon, 14 Dec 2020 20:20:51 GMT
- Title: How isotropic kernels perform on simple invariants
- Authors: Jonas Paccolat, Stefano Spigler and Matthieu Wyart
- Abstract summary: We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned.
We show that for large bandwidth, $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi\in (0,2)$ is the exponent characterizing the singularity of the kernel at the origin.
- Score: 0.5729426778193397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate how the training curve of isotropic kernel methods depends on
the symmetry of the task to be learned, in several settings. (i) We consider a
regression task, where the target function is a Gaussian random field that
depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We
compute the expected test error $\epsilon$ that follows $\epsilon\sim
p^{-\beta}$ where $p$ is the size of the training set. We find that $\beta\sim
1/d$ independently of $d_\parallel$, supporting previous findings that the
presence of invariants does not resolve the curse of dimensionality for kernel
regression. (ii) Next we consider support-vector binary classification and
introduce the stripe model where the data label depends on a single coordinate
$y(\underline{x}) = y(x_1)$, corresponding to parallel decision boundaries
separating labels of different signs, and consider that there is no margin at
these interfaces. We argue and confirm numerically that for large bandwidth,
$\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi\in (0,2)$ is the exponent
characterizing the singularity of the kernel at the origin. This estimation
improves classical bounds obtainable from Rademacher complexity. In this
setting there is no curse of dimensionality since $\beta\rightarrow 1 / 3$ as
$d\rightarrow\infty$. (iii) We confirm these findings for the spherical model
for which $y(\underline{x}) = y(|\underline{x}|)$. (iv) In the stripe model, we
show that if the data are compressed along their invariants by some factor
$\lambda$ (an operation believed to take place in deep networks), the test
error is reduced by a factor $\lambda^{-\frac{2(d-1)}{3d-3+\xi}}$.
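To make the stripe-model scaling concrete, the following is a minimal, hypothetical sketch (not the authors' code): data are drawn uniformly in $[-1,1]^d$, the label depends only on $x_1$ so that the decision boundaries are parallel with no margin, and a support-vector classifier with a Laplace kernel (singularity exponent $\xi = 1$) is trained at several training-set sizes $p$ in the large-bandwidth regime. The dimension, bandwidth, label function, and sample sizes below are illustrative assumptions, not values from the paper.
```python
# Hypothetical sketch of a stripe-model learning-curve experiment (illustrative only).
import numpy as np
from sklearn.svm import SVC

d = 5          # input dimension (assumed for illustration)
sigma = 10.0   # large kernel bandwidth, to probe the large-bandwidth regime

def laplace_kernel(A, B):
    # Isotropic Laplace kernel K(x, x') = exp(-||x - x'|| / sigma), for which xi = 1.
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return np.exp(-dists / sigma)

def stripe_labels(X):
    # Label depends on the first coordinate only: y(x) = y(x_1), interfaces with no margin.
    return np.sign(np.sin(np.pi * X[:, 0]))

rng = np.random.default_rng(0)
X_test = rng.uniform(-1, 1, size=(2000, d))
y_test = stripe_labels(X_test)

for p in [128, 256, 512, 1024]:
    X_train = rng.uniform(-1, 1, size=(p, d))
    clf = SVC(kernel=laplace_kernel, C=1e6)   # large C mimics a hard-margin SVM
    clf.fit(X_train, stripe_labels(X_train))
    eps = np.mean(clf.predict(X_test) != y_test)
    print(f"p = {p:5d}   test error = {eps:.3f}")
```
A log-log fit of the printed test errors against $p$ should give a slope close to $-\beta$; for the assumed $d = 5$ and $\xi = 1$, the predicted exponent is $\beta = \frac{d-1+\xi}{3d-3+\xi} = 5/13 \approx 0.38$.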
Related papers
- A computational transition for detecting correlated stochastic block models by low-degree polynomials [13.396246336911842]
Detection of correlation in a pair of random graphs is a fundamental statistical and computational problem that has been extensively studied in recent years.
We consider a pair of correlated block models $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$ that are subsampled from a common parent block model $\mathcal{S}(n,\tfrac{\lambda}{n};k,\epsilon;s)$.
We focus on tests based on \emph{low-degree polynomials} of the entries of the adjacency matrices.
arXiv Detail & Related papers (2024-09-02T06:14:05Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture
Models [12.746888269949407]
We consider a high-dimensional mean estimation problem over a binary hidden Markov model.
We establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|\theta_*\|, \delta, d, n$.
arXiv Detail & Related papers (2022-06-06T09:34:04Z) - Robust Testing in High-Dimensional Sparse Models [0.0]
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models.
We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n = \Omega\left(\min(s\log d,\, 1/\gamma^4)\right)$ samples.
arXiv Detail & Related papers (2022-05-16T07:47:22Z) - Structure Learning in Graphical Models from Indirect Observations [17.521712510832558]
This paper considers learning of the graphical structure of a $p$-dimensional random vector $X \in \mathbb{R}^p$ using both parametric and non-parametric methods.
Under mild conditions, we show that our graph-structure estimator can obtain the correct structure.
arXiv Detail & Related papers (2022-05-06T19:24:44Z) - Random matrices in service of ML footprint: ternary random features with
no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique, called Ternary Random Features (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
arXiv Detail & Related papers (2021-10-05T09:33:49Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, we show that $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear
Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov decision processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension.
For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Geometric compression of invariant manifolds in neural nets [2.461575510055098]
We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions.
We show that for a one-hidden-layer FC network trained with gradient descent, the first-layer weights evolve to become nearly insensitive to the $d_\perp = d - d_\parallel$ uninformative directions.
Next we show that compression shapes the Neural Tangent Kernel (NTK) evolution in time, so that its top eigenvectors become more informative and display a larger projection on the labels.
arXiv Detail & Related papers (2020-07-22T14:43:49Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.