Kernel Alignment Risk Estimator: Risk Prediction from Training Data
- URL: http://arxiv.org/abs/2006.09796v1
- Date: Wed, 17 Jun 2020 12:00:05 GMT
- Title: Kernel Alignment Risk Estimator: Risk Prediction from Training Data
- Authors: Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel
- Abstract summary: We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel $K$ with ridge $\lambda>0$ and i.i.d. observations.
We introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE).
- Score: 10.739602293023058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the risk (i.e. generalization error) of Kernel Ridge Regression
(KRR) for a kernel $K$ with ridge $\lambda>0$ and i.i.d. observations. For
this, we introduce two objects: the Signal Capture Threshold (SCT) and the
Kernel Alignment Risk Estimator (KARE). The SCT $\vartheta_{K,\lambda}$ is a
function of the data distribution: it can be used to identify the components of
the data that the KRR predictor captures, and to approximate the (expected) KRR
risk. This then leads to a KRR risk approximation by the KARE $\rho_{K,
\lambda}$, an explicit function of the training data, agnostic of the true data
distribution. We phrase the regression problem in a functional setting. The key
results then follow from a finite-size analysis of the Stieltjes transform of
general Wishart random matrices. Under a natural universality assumption (that
the KRR moments depend asymptotically on the first two moments of the
observations) we capture the mean and variance of the KRR predictor. We
numerically investigate our findings on the Higgs and MNIST datasets for
various classical kernels: the KARE gives an excellent approximation of the
risk, thus supporting our universality assumption. Using the KARE, one can
compare choices of kernels and hyperparameters directly from the training set.
The KARE thus provides a promising data-dependent procedure to select kernels
that generalize well.
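Because the KARE is an explicit function of the training data alone, it can be evaluated directly from the Gram matrix and the labels, with no access to the underlying data distribution. Below is a minimal numpy sketch, assuming the ratio form $\rho_{K,\lambda}(y) = \big(\tfrac{1}{n} y^\top (\tfrac{1}{n}K + \lambda I)^{-2} y\big) / \big(\tfrac{1}{n}\operatorname{Tr}[(\tfrac{1}{n}K + \lambda I)^{-1}]\big)^2$, where $K$ here denotes the $n \times n$ Gram matrix; the RBF kernel helper, bandwidth, and toy data are illustrative choices, not taken from the paper.

```python
import numpy as np


def rbf_kernel(X1, X2, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def kare(K, y, lam):
    """KARE-style score from the Gram matrix K, labels y, and ridge lam
    (assumed ratio form; see the lead-in above)."""
    n = K.shape[0]
    A_inv = np.linalg.inv(K / n + lam * np.eye(n))  # (K/n + lam I)^{-1}
    numerator = (y @ A_inv @ A_inv @ y) / n         # (1/n) y^T (.)^{-2} y
    denominator = (np.trace(A_inv) / n) ** 2        # ((1/n) tr (.)^{-1})^2
    return numerator / denominator


# Toy example: score several ridges using the training set alone.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
K = rbf_kernel(X, X, bandwidth=2.0)
for lam in (1e-3, 1e-2, 1e-1):
    print(f"lambda={lam:g}  KARE={kare(K, y, lam):.4f}")
```

Since the score depends only on $(K, y, \lambda)$, the same loop can range over different kernels as well, mirroring the training-set-only model selection described in the abstract.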
Related papers
- Highly Adaptive Ridge [84.38107748875144]
We propose a regression method that achieves an $n^{-2/3}$ dimension-free $L_2$ convergence rate in the class of right-continuous functions with square-integrable sectional derivatives.
HAR is exactly kernel ridge regression with a specific data-adaptive kernel based on a saturated zero-order tensor-product spline basis expansion.
We demonstrate empirical performance better than state-of-the-art algorithms, in particular for small datasets.
arXiv Detail & Related papers (2024-10-03T17:06:06Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations.
We further extend our analysis to the case where the test point has non-trivial correlations with the training set, a setting often encountered in time series forecasting.
We validate our theory across a variety of high-dimensional data.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Universality of kernel random matrices and kernel regression in the quadratic regime [18.51014786894174]
In this work, we extend the study of kernel regression to the quadratic regime.
We establish an operator norm approximation bound for the difference between the original kernel random matrix and a quadratic kernel random matrix.
We characterize the precise training and generalization errors for KRR in the quadratic regime when $n/d^2$ converges to a nonzero constant.
arXiv Detail & Related papers (2024-08-02T07:29:49Z) - On the Size and Approximation Error of Distilled Sets [57.61696480305911]
We take a theoretical view on kernel ridge regression-based methods of dataset distillation, such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the random Fourier feature (RFF) space coincides with the solution of the original data.
A KRR solution can be generated from this distilled set of instances, giving an approximation to the KRR solution optimized on the full input data.
arXiv Detail & Related papers (2023-05-23T14:37:43Z) - Kernel Ridge Regression Inference [7.066496204344619]
We provide uniform inference and confidence bands for kernel ridge regression.
We construct sharp, uniform confidence sets for KRR, which shrink at nearly the minimax rate, for general regressors.
We use our procedure to construct a novel test for match effects in school assignment.
arXiv Detail & Related papers (2023-02-13T18:26:36Z) - Overparameterized random feature regression with nearly orthogonal data [21.97381518762387]
We study the non-asymptotic behavior of random feature ridge regression (RFRR) given by a two-layer neural network.
Our results hold for a wide variety of activation functions and input data sets that exhibit nearly deterministic properties.
arXiv Detail & Related papers (2022-11-11T09:16:25Z) - StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Outlier detection in non-elliptical data by kernel MRCD [10.69910379275607]
The Kernel Minimum Regularized Covariance Determinant (KMRCD) estimator is proposed.
It implicitly computes the MRCD estimates in a kernel-induced feature space.
A fast algorithm is constructed that starts from kernel-based initial estimates and exploits the kernel trick to speed up the subsequent computations.
arXiv Detail & Related papers (2020-08-05T11:09:08Z) - Optimal Rates of Distributed Regression with Imperfect Kernels [0.0]
We study distributed kernel regression via the divide-and-conquer approach.
We show that kernel ridge regression can achieve rates faster than $N^{-1}$ in the noise-free setting.
arXiv Detail & Related papers (2020-06-30T13:00:16Z) - Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification [54.22421582955454]
We provide the first result on optimal minimax guarantees for the excess risk of adversarially robust classification.
Results are stated in terms of the Adversarial Signal-to-Noise Ratio (AdvSNR), which generalizes a similar notion for standard linear classification to the adversarial setting.
arXiv Detail & Related papers (2020-06-29T21:06:52Z) - Improved guarantees and a multiple-descent curve for Column Subset Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter; a minimal Nyström sketch is given after this list.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)
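The last entry above concerns column subset selection and the Nyström method, which approximates a Gram matrix from a sampled subset of its columns. The sketch below shows the standard rank-$m$ Nyström approximation $\hat K = C W^{+} C^\top$ with uniformly sampled landmarks; the RBF kernel, uniform sampling scheme, and toy data are generic illustrations and not the improved selection procedures analyzed in that paper.

```python
import numpy as np


def rbf_kernel(X1, X2, gamma=1.0):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x - x'||^2)."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-gamma * sq_dists)


def nystrom_approximation(X, m, gamma=1.0, seed=0):
    """Rank-m Nystrom approximation K_hat = C W^+ C^T of the RBF Gram matrix,
    using m uniformly sampled landmark columns."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=m, replace=False)  # landmark indices
    C = rbf_kernel(X, X[idx], gamma)            # n x m block of K
    W = C[idx, :]                               # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T


# Toy check: the approximation error typically shrinks as more columns are sampled.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))
K = rbf_kernel(X, X, gamma=0.5)
for m in (10, 50, 200):
    K_hat = nystrom_approximation(X, m, gamma=0.5, seed=m)
    rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
    print(f"m={m:3d}  relative Frobenius error={rel_err:.3f}")
```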