High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
- URL: http://arxiv.org/abs/2406.03171v1
- Date: Wed, 5 Jun 2024 12:03:27 GMT
- Title: High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization
- Authors: Yihang Chen, Fanghui Liu, Taiji Suzuki, Volkan Cevher
- Abstract summary: This paper studies kernel ridge regression in high dimensions under covariate shift.
Through a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy decreases the variance.
For the bias, we analyze regularization at arbitrary or well-chosen scales, showing that the bias can behave very differently under different regularization scales.
- Score: 83.06112052443233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies kernel ridge regression in high dimensions under covariate shift and analyzes the role of importance re-weighting. We first derive the asymptotic expansion of high-dimensional kernels under covariate shift. Through a bias-variance decomposition, we theoretically demonstrate that the re-weighting strategy decreases the variance. For the bias, we analyze regularization at arbitrary or well-chosen scales, showing that the bias can behave very differently under different regularization scales. In our analysis, the bias and variance are characterized by the spectral decay of a data-dependent regularized kernel: the original kernel matrix combined with an additional re-weighting matrix, so the re-weighting strategy can be regarded as a form of data-dependent regularization. In addition, our analysis provides an asymptotic expansion of kernel functions/vectors under covariate shift, which is of independent interest.
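To make the re-weighting mechanism concrete, the following is a minimal sketch of importance-weighted kernel ridge regression; it is not the paper's exact estimator or experimental setup. Weighted KRR minimizes sum_i w(x_i) * (f(x_i) - y_i)^2 + lam * ||f||_H^2, and a standard derivation gives dual coefficients alpha = (W K + lam I)^{-1} W y, which equals W^{1/2} (W^{1/2} K W^{1/2} + lam I)^{-1} W^{1/2} y; regularization thus effectively acts on the re-weighted kernel matrix W^{1/2} K W^{1/2}, which is the data-dependent regularization view described in the abstract. The Gaussian kernel, the synthetic Gaussian source/target densities, and all hyperparameters below are illustrative assumptions.

import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2)); bandwidth is an assumption.
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def weighted_krr_fit(X, y, weights, lam=1e-2, bandwidth=1.0):
    # Solve min_f sum_i w_i (f(x_i) - y_i)^2 + lam ||f||_H^2 in dual form:
    # alpha = (W K + lam I)^{-1} W y, i.e. ridge on the re-weighted kernel.
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    W = np.diag(weights)
    return np.linalg.solve(W @ K + lam * np.eye(n), W @ y)

def weighted_krr_predict(X_train, X_test, alpha, bandwidth=1.0):
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# Toy covariate shift: source p = N(0, I), target q = N(1, 0.25 I) (hypothetical).
rng = np.random.default_rng(0)
n, d = 200, 5
X_train = rng.normal(0.0, 1.0, size=(n, d))
X_test = rng.normal(1.0, 0.5, size=(n, d))
f_star = lambda X: np.sin(X.sum(axis=1))
y_train = f_star(X_train) + 0.1 * rng.normal(size=n)

def log_gauss(X, mu, sigma):
    # Log-density of N(mu, sigma^2 I) up to the (2*pi)^{-d/2} constant,
    # which cancels in the importance ratio q/p.
    return -0.5 * (((X - mu) / sigma) ** 2).sum(axis=1) - X.shape[1] * np.log(sigma)

# Importance weights w(x) = q(x) / p(x), self-normalized for stability.
w = np.exp(log_gauss(X_train, 1.0, 0.5) - log_gauss(X_train, 0.0, 1.0))
w /= w.mean()

alpha = weighted_krr_fit(X_train, y_train, w, lam=1e-2)
pred = weighted_krr_predict(X_train, X_test, alpha)
print("target-domain MSE:", np.mean((pred - f_star(X_test)) ** 2))

Setting all weights to 1 recovers standard KRR, so the effect of re-weighting on target-domain error is easy to probe empirically in this toy setting.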
Related papers
- Generalization Error Curves for Analytic Spectral Algorithms under Power-law Decay [13.803850290216257]
We rigorously provide a full characterization of the generalization error curves of the kernel gradient descent method.
Thanks to neural tangent kernel theory, these results greatly improve our understanding of the generalization behavior of training wide neural networks.
arXiv Detail & Related papers (2024-01-03T08:00:50Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Instance-Dependent Generalization Bounds via Optimal Transport [51.71650746285469]
Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks.
We derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space.
We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
arXiv Detail & Related papers (2022-11-02T16:39:42Z)
- ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, namely the Equivariance Regularizer (ER).
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z)
- Towards Understanding Generalization via Decomposing Excess Risk Dynamics [13.4379473119565]
We analyze the generalization dynamics to derive algorithm-dependent bounds, e.g., stability-based bounds.
Inspired by the observation that neural networks show a slow convergence rate when fitting noise, we propose decomposing the excess risk dynamics.
Under the decomposition framework, the new bound accords better with theoretical and empirical evidence than the stability-based and uniform convergence bounds.
arXiv Detail & Related papers (2021-06-11T03:42:45Z)
- How rotational invariance of common kernels prevents generalization in high dimensions [8.508198765617196]
Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings.
Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data.
arXiv Detail & Related papers (2021-04-09T08:27:37Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Kernel regression in high dimensions: Refined analysis beyond double descent [21.15702374555439]
We show that while the bias is (almost) independent of d and monotonically decreases with n, the variance depends on both n and d, and can be unimodal or monotonically decreasing under different regularization schemes.
Our refined analysis goes beyond double descent theory by showing that, depending on the data eigen-profile and the level of regularization, the kernel regression risk curve can be a double-descent-like, bell-shaped, or monotonic function of n.
arXiv Detail & Related papers (2020-10-06T12:59:59Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)