The phase diagram of kernel interpolation in large dimensions
- URL: http://arxiv.org/abs/2404.12597v1
- Date: Fri, 19 Apr 2024 03:04:06 GMT
- Title: The phase diagram of kernel interpolation in large dimensions
- Authors: Haobo Zhang, Weihao Lu, Qian Lin
- Abstract summary: The generalization ability of kernel interpolation in large dimensions might be one of the most interesting problems in the recent renaissance of kernel regression.
We fully characterized the exact order of both the variance and bias of large-dimensional kernel interpolation under various source conditions $s \geq 0$.
We determined the regions in the $(s,\gamma)$-plane where kernel interpolation is minimax optimal, sub-optimal and inconsistent.
- Score: 8.707305374058794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generalization ability of kernel interpolation in large dimensions (i.e., $n \asymp d^{\gamma}$ for some $\gamma>0$) might be one of the most interesting problems in the recent renaissance of kernel regression, since it may help us understand the 'benign overfitting phenomenon' reported in the neural networks literature. Focusing on the inner product kernel on the sphere, we fully characterized the exact order of both the variance and bias of large-dimensional kernel interpolation under various source conditions $s\geq 0$. Consequently, we obtained the $(s,\gamma)$-phase diagram of large-dimensional kernel interpolation, i.e., we determined the regions in $(s,\gamma)$-plane where the kernel interpolation is minimax optimal, sub-optimal and inconsistent.
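As background, kernel interpolation is kernel ridge regression with zero regularization: it solves the kernel linear system exactly so the predictor passes through every training point. A minimal sketch, assuming a hypothetical inner product kernel on the sphere; all function names here are illustrative, not from the paper:

```python
import numpy as np

def inner_product_kernel(X1, X2):
    # Gram matrix of an illustrative inner product kernel k(x, x') = (1 + <x, x'>)^2.
    return (1.0 + X1 @ X2.T) ** 2

def fit_interpolant(X, y):
    # Solve K alpha = y with no regularization (interpolation).
    K = inner_product_kernel(X, X)
    return np.linalg.solve(K, y)

def predict(X_train, alpha, X_test):
    return inner_product_kernel(X_test, X_train) @ alpha

rng = np.random.default_rng(0)
d, n = 5, 20
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # project points onto the unit sphere
y = rng.normal(size=n)

alpha = fit_interpolant(X, y)
residual = np.max(np.abs(predict(X, alpha, X) - y))  # near zero: exact fit
```

The interpolant fits the labels exactly; the paper's question is how its variance and bias then behave when $n \asymp d^{\gamma}$.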
Related papers
- Optimal Rate of Kernel Regression in Large Dimensions [13.641780902673792]
We first build a general tool to characterize the upper bound and the minimax lower bound of kernel regression for large dimensional data.
We utilize the new tool to show that the minimax rate of the excess risk of kernel regression is $n^{-1/2}$ when $n \asymp d^{\gamma}$ for $\gamma = 2, 4, 6, 8, \cdots$.
arXiv Detail & Related papers (2023-09-08T11:29:05Z) - Kernel interpolation generalizes poorly [14.569829985753346]
We show that for any $\varepsilon>0$, the generalization error of kernel interpolation is lower bounded by $\Omega(n^{-\varepsilon})$.
As a direct corollary, overfitted wide neural networks defined on the sphere generalize poorly.
arXiv Detail & Related papers (2023-03-28T08:28:39Z) - Kernel Subspace and Feature Extraction [7.424262881242935]
We study kernel methods in machine learning from the perspective of feature subspace.
We construct a kernel from Hirschfeld--Gebelein--Rényi maximal correlation functions, coined the maximal correlation kernel, and demonstrate its information-theoretic optimality.
arXiv Detail & Related papers (2023-01-04T02:46:11Z) - On The Relative Error of Random Fourier Features for Preserving Kernel Distance [7.383448514243228]
We show that for a significant range of kernels, including the well-known Laplacian kernels, RFF cannot approximate the kernel distance with small relative error using low dimensions.
We make the first step towards data-oblivious dimension-reduction for general shift-invariant kernels, and we obtain a similar $\mathrm{poly}(\epsilon^{-1} \log n)$ dimension bound for Laplacian kernels.
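As background, random Fourier features (RFF) approximate a shift-invariant kernel by an explicit random feature map whose inner products concentrate around the kernel value. A minimal sketch for the Gaussian kernel (not the Laplacian case discussed above; names are illustrative):

```python
import numpy as np

def rff_features(X, D, rng):
    # Random feature map for the Gaussian kernel k(x, y) = exp(-||x - y||^2 / 2):
    # frequencies are drawn from the kernel's spectral density (standard normal),
    # phases uniformly on [0, 2*pi).
    d = X.shape[1]
    W = rng.normal(size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3))
Z = rff_features(X, D=20000, rng=rng)

approx = Z[0] @ Z[1]  # feature inner product
exact = np.exp(-np.sum((X[0] - X[1]) ** 2) / 2.0)  # true kernel value
error = abs(approx - exact)  # shrinks like O(1/sqrt(D))
```

The additive error decays as $O(1/\sqrt{D})$; the paper above concerns the harder question of preserving kernel *distance* with small *relative* error.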
arXiv Detail & Related papers (2022-10-01T10:35:12Z) - Taming Nonconvexity in Kernel Feature Selection---Favorable Properties of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
The gradient-based algorithms available for nonconvex optimization are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z) - Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - How rotational invariance of common kernels prevents generalization in
high dimensions [8.508198765617196]
Kernel ridge regression is well-known to achieve minimax optimal rates in low-dimensional settings.
Recent work establishes consistency for kernel regression under certain assumptions on the ground truth function and the distribution of the input data.
arXiv Detail & Related papers (2021-04-09T08:27:37Z) - Flow-based Kernel Prior with Application to Blind Super-Resolution [143.21527713002354]
Kernel estimation is generally one of the key problems for blind image super-resolution (SR).
This paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
Experiments on synthetic and real-world images demonstrate that the proposed FKP can significantly improve the kernel estimation accuracy.
arXiv Detail & Related papers (2021-03-29T22:37:06Z) - High-Dimensional Gaussian Process Inference with Derivatives [90.8033626920884]
We show that in the low-data regime $N < D$, the Gram matrix can be decomposed in a manner that reduces the cost of inference to $\mathcal{O}(N^2D + (N^2)^3)$.
We demonstrate this potential in a variety of tasks relevant for machine learning, such as optimization and Hamiltonian Monte Carlo with predictive gradients.
arXiv Detail & Related papers (2021-02-15T13:24:41Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Kernel-Based Reinforcement Learning: A Finite-Time Analysis [53.47210316424326]
We introduce Kernel-UCBVI, a model-based optimistic algorithm that leverages the smoothness of the MDP and a non-parametric kernel estimator of the rewards.
We empirically validate our approach in continuous MDPs with sparse rewards.
arXiv Detail & Related papers (2020-04-12T12:23:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.