RFN: A Random-Feature Based Newton Method for Empirical Risk
Minimization in Reproducing Kernel Hilbert Spaces
- URL: http://arxiv.org/abs/2002.04753v4
- Date: Mon, 6 Jun 2022 13:35:58 GMT
- Title: RFN: A Random-Feature Based Newton Method for Empirical Risk
Minimization in Reproducing Kernel Hilbert Spaces
- Authors: Ting-Jui Chang, Shahin Shahrampour
- Abstract summary: Large-scale finite-sum problems can be solved using efficient variants of the Newton method, where the Hessian is approximated via sub-samples of the data.
In this paper, we observe that for this class of problems, one can naturally use kernel approximation to speed up the Newton method.
We provide a novel second-order algorithm that enjoys local superlinear convergence and global linear convergence.
- Score: 14.924672048447334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In supervised learning using kernel methods, we often encounter a large-scale
finite-sum minimization over a reproducing kernel Hilbert space (RKHS).
Large-scale finite-sum problems can be solved using efficient variants of the
Newton method, where the Hessian is approximated via sub-samples of the data. In
an RKHS, however, the dependence of the penalty function on the kernel makes
standard sub-sampling approaches inapplicable, since the Gram matrix is not
readily available in a low-rank form. In this paper, we observe that for this class of
problems, one can naturally use kernel approximation to speed up the Newton
method. Focusing on randomized features for kernel approximation, we provide a
novel second-order algorithm that enjoys local superlinear convergence and
global linear convergence (with high probability). We derive the theoretical
lower bound for the number of random features required for the approximated
Hessian to be close to the true Hessian in the norm sense. Our numerical
experiments on real-world data verify the efficiency of our method compared to
several benchmarks.
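To make the approach concrete, the following is a minimal sketch of the underlying idea rather than the authors' exact RFN algorithm: approximate the kernel with random Fourier features and take Newton steps on the resulting finite-dimensional regularized ERM problem. The Gaussian kernel, the squared loss, and the choices of feature dimension m, bandwidth sigma, and regularizer lam below are illustrative assumptions.

    import numpy as np

    def random_fourier_features(X, m, sigma, rng):
        """Map X (n x d) to m random Fourier features approximating a Gaussian kernel."""
        n, d = X.shape
        W = rng.normal(scale=1.0 / sigma, size=(d, m))  # frequencies drawn from the kernel's spectral density
        b = rng.uniform(0.0, 2 * np.pi, size=m)         # random phase offsets
        return np.sqrt(2.0 / m) * np.cos(X @ W + b)     # z(x) with z(x).z(y) ~= k(x, y)

    def newton_step(Z, y, w, lam):
        """One Newton step for regularized squared-loss ERM in the random-feature space."""
        n, m = Z.shape
        grad = Z.T @ (Z @ w - y) / n + lam * w          # gradient of 0.5*||Zw - y||^2/n + 0.5*lam*||w||^2
        hess = Z.T @ Z / n + lam * np.eye(m)            # m x m Hessian, cheap to form and invert when m << n
        return w - np.linalg.solve(hess, grad)

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(500, 5)), rng.normal(size=500)
    Z = random_fourier_features(X, m=100, sigma=1.0, rng=rng)
    w = np.zeros(Z.shape[1])
    for _ in range(5):
        w = newton_step(Z, y, w, lam=1e-2)

For the squared loss a single step already solves the feature-space problem exactly; for losses such as the logistic loss, the same structure applies with a reweighted Hessian.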
Related papers
- Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods [0.3222802562733786]
We consider minimizing finite-sum expectation objective functions via Hessian-averaging based subsampled Newton methods.
These methods allow for inexactness and have fixed per-iteration Hessian approximation costs.
We present novel analysis techniques and propose challenges for their practical implementation.
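A rough, simplified sketch of the Hessian-averaging idea follows; it is not the paper's exact scheme (which also uses adaptive gradient sampling and permits inexact solves), and the callbacks grad_full and hess_sample as well as the uniform averaging weights are illustrative assumptions.

    import numpy as np

    def averaged_subsampled_newton(grad_full, hess_sample, w, n, batch, T, rng):
        """Subsampled Newton iteration that averages the per-iteration Hessian estimates."""
        H_avg = np.zeros((w.size, w.size))
        for t in range(1, T + 1):
            idx = rng.choice(n, size=batch, replace=False)  # fresh subsample for the Hessian
            H_t = hess_sample(w, idx)                       # subsampled Hessian at the current iterate
            H_avg = ((t - 1) * H_avg + H_t) / t             # running uniform average of past Hessians
            w = w - np.linalg.solve(H_avg, grad_full(w))    # Newton-type step with the averaged Hessian
        return w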
arXiv Detail & Related papers (2024-08-14T03:27:48Z)
- On the Approximation of Kernel functions [0.0]
The paper addresses approximations of the kernel itself.
For the Gauss kernel on the unit cube, the paper establishes an upper bound on the associated eigenfunctions.
This improvement supports low-rank approximation methods such as the Nyström method.
arXiv Detail & Related papers (2024-03-11T13:50:07Z)
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
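For reference, here is a minimal sketch of the classical symmetric rank-1 (SR1) update mentioned above; the incremental, per-component bookkeeping of the paper is omitted, and the skip threshold is an illustrative safeguard.

    import numpy as np

    def sr1_update(B, s, y, tol=1e-8):
        """SR1 update of a Hessian approximation B from step s and gradient difference y."""
        r = y - B @ s                                   # residual of the secant condition B_new @ s = y
        denom = r @ s
        if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
            return B                                    # skip the update when the denominator is unsafe
        return B + np.outer(r, r) / denom               # rank-1 correction enforcing the secant condition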
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- The Optimality of Kernel Classifiers in Sobolev Space [3.3253452228326332]
This paper investigates the statistical performances of kernel classifiers.
We also propose a simple method to estimate the smoothness of $2\eta(x)-1$ and apply the method to real datasets.
arXiv Detail & Related papers (2024-02-02T05:23:34Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that the methods under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
- Hessian Averaging in Stochastic Newton Methods Achieves Superlinear Convergence [69.65563161962245]
We consider minimizing a smooth and strongly convex objective function using a Newton method.
We show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage.
arXiv Detail & Related papers (2022-04-20T07:14:21Z)
- Nyström Kernel Mean Embeddings [92.10208929236826]
We propose an efficient approximation procedure based on the Nyström method.
It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate.
We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules.
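A minimal sketch of a Nyström-style approximation to the maximum mean discrepancy mentioned above: both mean embeddings are projected onto the span of m landmark points. The Gaussian kernel, uniform landmark sampling, and the jitter term are illustrative assumptions rather than the paper's prescription.

    import numpy as np

    def gaussian_kernel(A, B, sigma=1.0):
        """Gaussian kernel matrix between the rows of A and B."""
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma ** 2))

    def nystrom_mmd(X, Y, m, rng, sigma=1.0, jitter=1e-8):
        """Approximate MMD(X, Y) via Nystrom mean embeddings built on m shared landmarks."""
        pool = np.vstack([X, Y])
        L = pool[rng.choice(len(pool), size=m, replace=False)]    # landmark points
        K_mm = gaussian_kernel(L, L, sigma) + jitter * np.eye(m)  # landmark Gram matrix
        a_X = np.linalg.solve(K_mm, gaussian_kernel(L, X, sigma).mean(axis=1))  # coefficients for mean embedding of X
        a_Y = np.linalg.solve(K_mm, gaussian_kernel(L, Y, sigma).mean(axis=1))  # coefficients for mean embedding of Y
        d = a_X - a_Y
        return float(np.sqrt(max(d @ K_mm @ d, 0.0)))             # RKHS norm of the embedding difference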
arXiv Detail & Related papers (2022-01-31T08:26:06Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- Towards Unbiased Random Features with Lower Variance For Stationary Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error compared with the existing kernel approximation methods.
With better approximation to the originally selected kernels, improved classification accuracy and regression ability are obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.