Provably Strict Generalisation Benefit for Invariance in Kernel Methods
- URL: http://arxiv.org/abs/2106.02346v1
- Date: Fri, 4 Jun 2021 08:55:28 GMT
- Title: Provably Strict Generalisation Benefit for Invariance in Kernel Methods
- Authors: Bryn Elesedy
- Abstract summary: We build on the function space perspective of Elesedy and Zaidi arXiv:2102.10333 to derive a strictly non-zero generalisation benefit.
We find that generalisation is governed by a notion of effective dimension that arises from the interplay between the kernel and the group.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is a commonly held belief that enforcing invariance improves
generalisation. Although this approach enjoys widespread popularity, it is only
very recently that a rigorous theoretical demonstration of this benefit has
been established. In this work we build on the function space perspective of
Elesedy and Zaidi arXiv:2102.10333 to derive a strictly non-zero generalisation
benefit of incorporating invariance in kernel ridge regression when the target
is invariant to the action of a compact group. We study invariance enforced by
feature averaging and find that generalisation is governed by a notion of
effective dimension that arises from the interplay between the kernel and the
group. In building towards this result, we find that the action of the group
induces an orthogonal decomposition of both the reproducing kernel Hilbert
space and its kernel, which may be of interest in its own right.
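As a rough illustration of the setting in the abstract, the sketch below implements invariance via feature averaging in kernel ridge regression over a small finite group. Everything here is a hypothetical example rather than the paper's code: the compact group is taken to be the cyclic group of coordinate shifts acting on R^d, the base kernel is a Gaussian RBF, the target is shift-invariant, and the names and hyperparameters (rbf, averaged_kernel, lam, gamma) are illustrative choices. The averaged kernel K_bar(x, x') = E_{g,g'} k(gx, g'x') is the kernel of the feature-averaged map.

```python
# Minimal, hypothetical sketch (not the paper's code) of feature averaging for
# kernel ridge regression. Assumptions: the group is the finite cyclic group of
# coordinate shifts acting on R^d, the base kernel is a Gaussian RBF, and the
# target is shift-invariant. The averaged kernel is
#   K_bar(x, x') = (1/|G|^2) * sum_{g, g'} k(g x, g' x').
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, lam, gamma = 6, 40, 500, 1e-2, 0.5


def rbf(X, Z):
    """Gaussian RBF kernel matrix k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Z**2, 1)[None, :] - 2 * X @ Z.T
    return np.exp(-gamma * sq)


def orbit(X):
    """All cyclic shifts of each row of X: shape (|G|, n, d)."""
    return np.stack([np.roll(X, s, axis=1) for s in range(d)])


def averaged_kernel(X, Z):
    """Group-averaged kernel K_bar(x, z) = mean over g, g' of k(g x, g' z)."""
    K = np.zeros((X.shape[0], Z.shape[0]))
    for gx in orbit(X):
        for gz in orbit(Z):
            K += rbf(gx, gz)
    return K / (d * d)


def krr_fit_predict(K_train, y, K_test_train):
    """Standard kernel ridge regression: alpha = (K + lam I)^{-1} y."""
    alpha = np.linalg.solve(K_train + lam * np.eye(len(y)), y)
    return K_test_train @ alpha


def target(X):
    """A shift-invariant target: depends on x only through symmetric sums."""
    return np.sin(X.sum(1)) + 0.5 * (X**2).sum(1)


X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr, y_te = target(X_tr) + 0.1 * rng.normal(size=n_train), target(X_te)

# Plain KRR vs. KRR with the group-averaged (invariant) kernel.
pred_plain = krr_fit_predict(rbf(X_tr, X_tr), y_tr, rbf(X_te, X_tr))
pred_inv = krr_fit_predict(averaged_kernel(X_tr, X_tr), y_tr,
                           averaged_kernel(X_te, X_tr))

print("test MSE, plain KRR    :", np.mean((pred_plain - y_te) ** 2))
print("test MSE, invariant KRR:", np.mean((pred_inv - y_te) ** 2))
```

On an invariant target like this, the averaged kernel typically attains lower test error, in the spirit of the strictly non-zero benefit the paper derives; the size of the gap depends on the kernel, the group, and the sample size.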
Related papers
- High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization [83.06112052443233]
This paper studies kernel ridge regression in high dimensions under covariate shift.
Via a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy reduces the variance.
For the bias, we analyze the regularization at an arbitrary or well-chosen scale, showing that the bias can behave very differently under different regularization scales.
arXiv Detail & Related papers (2024-06-05T12:03:27Z) - Generalization in Kernel Regression Under Realistic Assumptions [41.345620270267446]
We provide rigorous bounds for common kernels and for any amount of regularization or noise, any input dimension, and any number of samples.
Our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression.
As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
arXiv Detail & Related papers (2023-12-26T10:55:20Z) - Wiener Chaos in Kernel Regression: Towards Untangling Aleatoric and Epistemic Uncertainty [0.0]
We generalize the setting and consider kernel ridge regression with additive i.i.d. non-Gaussian measurement noise.
We show that our approach allows us to distinguish the uncertainty that stems from the noise in the data samples from the total uncertainty encoded in the GP posterior distribution.
arXiv Detail & Related papers (2023-12-12T16:02:35Z) - Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Meta-Learning Hypothesis Spaces for Sequential Decision-making [79.73213540203389]
We propose to meta-learn a kernel from offline data (Meta-KeL).
Under mild conditions, we guarantee that our estimated RKHS yields valid confidence sets.
We also empirically evaluate the effectiveness of our approach on a Bayesian optimization task.
arXiv Detail & Related papers (2022-02-01T17:46:51Z) - Revisiting Memory Efficient Kernel Approximation: An Indefinite Learning Perspective [0.8594140167290097]
Matrix approximations are a key element in large-scale machine learning approaches.
We extend MEKA to be applicable not only for shift-invariant kernels but also for non-stationary kernels.
We present a Lanczos-based estimation of a spectrum shift to develop a stable positive semi-definite MEKA approximation.
arXiv Detail & Related papers (2021-12-18T10:01:34Z) - Interpolation can hurt robust generalization even when there is no noise [76.3492338989419]
We show that avoiding interpolation through ridge regularization can significantly improve generalization even in the absence of noise.
We prove this phenomenon for the robust risk of both linear regression and classification and hence provide the first theoretical result on robust overfitting.
arXiv Detail & Related papers (2021-08-05T23:04:15Z) - An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization [53.592597682854944]
We recast generalization over sub-groups as an online game between a player minimizing risk and an adversary presenting new test distributions.
We show that ERM is provably minimax-optimal for both tasks.
arXiv Detail & Related papers (2021-02-25T19:06:48Z) - A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels [16.143012623830792]
Group equivariant convolutional networks (GCNNs) endow classical convolutional networks with additional symmetry priors.
Recent advances in the theoretical description of GCNNs revealed that such models can generally be understood as performing convolutions with G-steerable kernels.
arXiv Detail & Related papers (2020-10-21T12:42:23Z) - Generalized mean shift with triangular kernel profile [5.381004207943597]
The Mean Shift algorithm is a popular way to find modes of probability density functions that take a specific kernel-based shape.
We show that a novel Mean Shift variant adapted to triangular kernel profiles can be derived and proved to converge after a finite number of iterations (a rough illustrative sketch follows this list).
arXiv Detail & Related papers (2020-01-07T16:46:32Z)
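As flagged in the last entry above, here is a rough, generic sketch of a mean-shift iteration with triangular kernel weights. It is not that paper's algorithm: the weighting w_i = max(0, 1 - ||x - x_i|| / h), the bandwidth h, and the function name mean_shift_triangular are illustrative assumptions; the exact update rule and the finite-iteration convergence proof are in the cited reference.

```python
# Minimal, generic mean-shift sketch with triangular kernel weights
# w_i = max(0, 1 - ||x - x_i|| / h). Illustration only; see the cited paper
# for the actual variant and its convergence analysis.
import numpy as np


def mean_shift_triangular(X, x0, h=1.0, max_iter=200, tol=1e-8):
    """Iteratively move x to the weighted mean of the data points within
    bandwidth h, using triangular kernel weights, starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dist = np.linalg.norm(X - x, axis=1)
        w = np.maximum(0.0, 1.0 - dist / h)
        if w.sum() == 0.0:          # no points within the bandwidth
            break
        x_new = (w[:, None] * X).sum(0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Toy usage: two Gaussian blobs; the iteration drifts towards the nearby mode.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
print(mean_shift_triangular(X, x0=[0.5, 0.5], h=1.5))
```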
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.