The Optimality of Kernel Classifiers in Sobolev Space
- URL: http://arxiv.org/abs/2402.01148v1
- Date: Fri, 2 Feb 2024 05:23:34 GMT
- Title: The Optimality of Kernel Classifiers in Sobolev Space
- Authors: Jianfa Lai, Zhifan Li, Dongming Huang, Qian Lin
- Abstract summary: This paper investigates the statistical performances of kernel classifiers.
We also propose a simple method to estimate the smoothness of $2\eta(x)-1$ and apply the method to real datasets.
- Score: 3.3253452228326332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel methods are widely used in machine learning, especially for
classification problems. However, the theoretical analysis of kernel
classification is still limited. This paper investigates the statistical
performances of kernel classifiers. With some mild assumptions on the
conditional probability $\eta(x)=\mathbb{P}(Y=1\mid X=x)$, we derive an upper
bound on the classification excess risk of a kernel classifier using recent
advances in the theory of kernel regression. We also obtain a minimax lower
bound for Sobolev spaces, which shows the optimality of the proposed
classifier. Our theoretical results can be extended to the generalization error
of overparameterized neural network classifiers. To make our theoretical
results more applicable in realistic settings, we also propose a simple method
to estimate the interpolation smoothness of $2\eta(x)-1$ and apply the method
to real datasets.
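The plug-in rule analyzed here estimates the regression function $2\eta(x)-1$ by kernel regression and classifies by its sign. A minimal numpy sketch of that idea (the Gaussian kernel, the toy data, and all parameter values are illustrative choices, not the paper's exact setup):

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def fit_kernel_classifier(X, y, lam=1e-2, bandwidth=1.0):
    """Kernel ridge regression of labels y in {-1, +1}: an estimate of 2*eta(x)-1."""
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, alpha, X_test, bandwidth=1.0):
    """Plug-in classifier: the sign of the kernel regression estimate."""
    f = gaussian_kernel(X_test, X_train, bandwidth) @ alpha
    return np.sign(f)

# Toy data: the Bayes classifier is the sign of the first coordinate,
# corrupted by 10% label noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0])
y[rng.random(200) < 0.1] *= -1

alpha = fit_kernel_classifier(X, y)
X_test = rng.normal(size=(200, 2))
acc = (predict(X, alpha, X_test) == np.sign(X_test[:, 0])).mean()
```

With noisy labels the regularized estimate still recovers the sign of $2\eta(x)-1$ on most test points, which is all the plug-in classifier needs.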
Related papers
- Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods [0.0]
We introduce an efficient method for the estimator, called the Brownian Kernel Neural Network (BKerNN).
We show that BKerNN's expected risk converges to the minimal risk with explicit high-probability rates of $O(\min((d/n)^{1/2}, n^{-1/6}))$ (up to logarithmic factors).
arXiv Detail & Related papers (2024-07-24T13:46:50Z) - Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that a phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
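The interaction between learning rate and early stopping on a quadratic objective can be seen directly in a toy example (the diagonal Hessian and all numbers below are illustrative, not taken from the paper): under a fixed iteration budget, a larger but still stable step size recovers more of the small-eigenvalue directions of the solution.

```python
import numpy as np

# Quadratic objective 0.5 * x^T H x - b^T x with eigenvalues spanning
# several orders of magnitude.
eigs = np.array([1.0, 1e-1, 1e-2, 1e-3])
H = np.diag(eigs)
b = np.ones(4)
x_star = b / eigs  # exact minimizer (component-wise, since H is diagonal)

def gd(lr, steps):
    """Plain gradient descent from zero, stopped early after `steps` iterations."""
    x = np.zeros(4)
    for _ in range(steps):
        x = x - lr * (H @ x - b)
    return x

# Same early-stopping budget, two learning rates (both below 2/max(eigs)).
x_small = gd(lr=0.5, steps=50)
x_large = gd(lr=1.5, steps=50)

# Fraction of the optimum recovered along each eigendirection:
# analytically 1 - (1 - lr * eig)^steps.
recovered_small = x_small / x_star
recovered_large = x_large / x_star
```

Both runs essentially solve the top eigendirection, but along the smallest eigenvalue the larger step size has made roughly three times as much progress, illustrating how the learning rate shapes the spectral content of the early-stopped solution.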
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Nyström Kernel Mean Embeddings [92.10208929236826]
We propose an efficient approximation procedure based on the Nyström method.
It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate.
We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules.
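One way to picture this is a projection-based sketch: approximate each empirical kernel mean embedding in the span of a small landmark subsample, then compare the two distributions through the approximated embeddings. This is an illustrative construction, not the paper's exact estimator, and all kernels and sizes below are arbitrary choices.

```python
import numpy as np

def rbf(X, Z, s=1.0):
    """Gaussian kernel matrix between rows of X and rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * s * s))

def nystrom_embedding_coeffs(X, Z):
    """Coefficients beta such that sum_j beta_j k(z_j, .) approximates the
    empirical mean embedding of X, projected onto span{k(z_j, .)}."""
    K_zz = rbf(Z, Z)
    K_zx_mean = rbf(Z, X).mean(axis=1)
    return np.linalg.lstsq(K_zz, K_zx_mean, rcond=None)[0]

def mmd2_nystrom(X, Y, Z):
    """Squared MMD between the Nystrom-approximated embeddings of X and Y,
    both expressed over the same landmark set Z."""
    d = nystrom_embedding_coeffs(X, Z) - nystrom_embedding_coeffs(Y, Z)
    return float(d @ rbf(Z, Z) @ d)

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(400, 2))
Y = rng.normal(1.0, 1.0, size=(400, 2))          # shifted distribution
Z = X[rng.choice(400, size=40, replace=False)]    # landmark subsample

mmd_diff = mmd2_nystrom(X, Y, Z)       # clearly positive: distributions differ
mmd_same = mmd2_nystrom(X, X[:200], Z) # near zero: same distribution
```

The subsample size (here 40 of 400 points) is exactly the knob the paper's conditions control: large enough to preserve the $n^{-1/2}$ estimation rate, small enough to cut the cost of working with the full kernel matrix.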
arXiv Detail & Related papers (2022-01-31T08:26:06Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - Learning primal-dual sparse kernel machines [10.230121160034674]
Traditionally, kernel methods rely on the representer theorem, which states that the solution to a learning problem is obtained as a linear combination of the data mapped into the reproducing kernel Hilbert space (RKHS).
We propose to search for a solution in RKHS that has a pre-image decomposition in the original data space, where the elements don't necessarily correspond to the elements in the training set.
Our gradient-based method then hinges on optimisation over possibly sparse elements in the input space, and enables us to obtain a kernel-based model with both primal and dual sparsity.
arXiv Detail & Related papers (2021-08-27T09:38:53Z) - Locality defeats the curse of dimensionality in convolutional teacher-student scenarios [69.2027612631023]
We show that locality is key in determining the learning curve exponent $\beta$.
We conclude by proving, using a natural assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
arXiv Detail & Related papers (2021-06-16T08:27:31Z) - Kernelized Classification in Deep Networks [49.47339560731506]
We propose a kernelized classification layer for deep networks.
We advocate a nonlinear classification layer by using the kernel trick on the softmax cross-entropy loss function during training.
We show the usefulness of the proposed nonlinear classification layer on several datasets and tasks.
arXiv Detail & Related papers (2020-12-08T21:43:19Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - RFN: A Random-Feature Based Newton Method for Empirical Risk Minimization in Reproducing Kernel Hilbert Spaces [14.924672048447334]
Large-scale finite-sum problems can be solved using efficient variants of the Newton method, where the Hessian is approximated via sub-samples of data.
In this paper, we observe that for this class of problems, one can naturally use kernel approximation to speed up the Newton method.
We provide a novel second-order algorithm that enjoys local superlinear convergence and global linear convergence.
arXiv Detail & Related papers (2020-02-12T01:14:44Z) - Kernel Selection for Modal Linear Regression: Optimal Kernel and IRLS Algorithm [8.571896191090744]
First, we show that the Biweight kernel is optimal in the sense of minimizing the mean squared error of the resulting MLR parameter.
Secondly, we provide a kernel class for which the iteratively reweighted least-squares (IRLS) algorithm is guaranteed to converge.
arXiv Detail & Related papers (2020-01-30T03:57:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.