Robust, randomized preconditioning for kernel ridge regression
- URL: http://arxiv.org/abs/2304.12465v4
- Date: Wed, 10 Jul 2024 19:46:15 GMT
- Title: Robust, randomized preconditioning for kernel ridge regression
- Authors: Mateo Díaz, Ethan N. Epperly, Zachary Frangella, Joel A. Tropp, Robert J. Webber
- Abstract summary: This paper investigates two randomized preconditioning techniques for solving kernel ridge regression problems.
It introduces two new methods with state-of-the-art performance.
The proposed methods solve a broad range of KRR problems, making them ideal for practical applications.
- Score: 3.521877014965197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates two randomized preconditioning techniques for solving kernel ridge regression (KRR) problems with a medium to large number of data points ($10^4 \leq N \leq 10^7$), and it introduces two new methods with state-of-the-art performance. The first method, RPCholesky preconditioning, accurately solves the full-data KRR problem in $O(N^2)$ arithmetic operations, assuming sufficiently rapid polynomial decay of the kernel matrix eigenvalues. The second method, KRILL preconditioning, offers an accurate solution to a restricted version of the KRR problem involving $k \ll N$ selected data centers at a cost of $O((N + k^2) k \log k)$ operations. The proposed methods solve a broad range of KRR problems, making them ideal for practical applications.
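As a concrete illustration of the first method, here is a minimal sketch of the RPCholesky preconditioning idea (not the authors' implementation): a rank-$k$ randomly pivoted partial Cholesky factor of the kernel matrix is combined with the Woodbury identity to precondition conjugate gradient on $(K + \lambda I)\alpha = y$. The RBF kernel, the rank $k = 100$, and the hand-rolled PCG loop are illustrative choices, not taken from the paper:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def rpcholesky(K, k, rng):
    """Rank-k randomly pivoted partial Cholesky: returns F with K ~ F @ F.T."""
    n = K.shape[0]
    F = np.zeros((n, k))
    d = np.diag(K).copy()                     # diagonal of the residual matrix
    for i in range(k):
        p = rng.choice(n, p=d / d.sum())      # sample pivot ~ residual diagonal
        g = K[:, p] - F[:, :i] @ F[p, :i]     # residual column at the pivot
        F[:, i] = g / np.sqrt(max(g[p], 1e-12))
        d = np.maximum(d - F[:, i] ** 2, 0.0)
    return F

def pcg(matvec, b, precond, tol=1e-8, maxiter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite system."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz, rz_old = r @ z, rz
        p = z + (rz / rz_old) * p
    return x

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
lam, k = 1e-3, 100

K = rbf_kernel(X, X)
F = rpcholesky(K, k, rng)

# Apply the preconditioner (F F^T + lam I)^{-1} via the Woodbury identity.
C = np.linalg.cholesky(lam * np.eye(k) + F.T @ F)
def precond(v):
    t = np.linalg.solve(C.T, np.linalg.solve(C, F.T @ v))
    return (v - F @ t) / lam

alpha = pcg(lambda v: K @ v + lam * v, y, precond)
print("relative residual:", np.linalg.norm(K @ alpha + lam * alpha - y) / np.linalg.norm(y))
```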
Related papers
- Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression [16.836685923503868]
ASkotch is a scalable, accelerated, iterative method for full KRR that provably obtains linear convergence.
ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale KRR regression and classification tasks.
Our work opens up the possibility of as-yet-unimagined applications of full KRR across a number of disciplines.
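The summary does not spell out ASkotch's update rule, so the following is only a generic stand-in, not ASkotch itself: Nesterov-accelerated gradient descent on the full KRR system $(K + \lambda I)\alpha = y$, which exhibits the kind of linear convergence the paper proves. All problem sizes and constants are illustrative:

```python
# Generic illustration (not ASkotch): accelerated gradient descent on the
# strongly convex quadratic f(a) = 0.5 a^T (K + lam I) a - y^T a, whose
# minimizer solves the full KRR system (K + lam I) a = y.
import numpy as np

rng = np.random.default_rng(1)
n = 500
Q = rng.standard_normal((n, n))
K = Q @ Q.T / n                      # stand-in SPD "kernel" matrix
lam = 1e-1
y = rng.standard_normal(n)

A = K + lam * np.eye(n)
L = np.linalg.eigvalsh(A)[-1]        # largest eigenvalue (smoothness constant)
mu = lam                             # strong convexity is at least lam
kappa = L / mu
beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

a = a_prev = np.zeros(n)
for t in range(300):
    v = a + beta * (a - a_prev)      # momentum extrapolation
    grad = A @ v - y
    a_prev, a = a, v - grad / L      # gradient step at the extrapolated point
print("residual norm:", np.linalg.norm(A @ a - y))
```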
arXiv Detail & Related papers (2024-07-14T04:11:10Z)
- Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression [44.13655983242414]
We design the first sample near-optimal and almost linear-time algorithms with optimal error guarantees.
For robust linear regression, we give the first algorithm with sample complexity $n = \tilde{O}(d/\epsilon^2)$ and almost linear runtime that approximates the target regressor within $\ell_2$-error $O(\epsilon)$.
This is the first sample near-optimal and almost linear-time algorithm achieving the optimal error guarantee, answering an open question in the literature.
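As a toy illustration of the Huber contamination model (emphatically not the paper's near-optimal algorithm), the sketch below replaces an $\epsilon$ fraction of Gaussian samples with gross outliers and compares the empirical mean against the coordinate-wise median:

```python
# Toy Huber-contamination experiment: a (1 - eps) fraction of points is drawn
# from N(mu, I) and an eps fraction is adversarial; the empirical mean breaks
# while even a simple coordinate-wise median degrades gracefully.
import numpy as np

rng = np.random.default_rng(2)
n, d, eps = 5000, 20, 0.1
mu = np.ones(d)

sample = rng.standard_normal((n, d)) + mu
n_bad = int(eps * n)
sample[:n_bad] = 100.0                     # gross outliers replace an eps fraction

print("mean   error:", np.linalg.norm(sample.mean(0) - mu))
print("median error:", np.linalg.norm(np.median(sample, 0) - mu))
```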
arXiv Detail & Related papers (2023-12-04T00:31:16Z)
- A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization [107.07093621337084]
Normalized-Cut (N-Cut) is a famous model of spectral clustering.
Traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of the normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation.
We propose a novel N-Cut solver based on the famous coordinate descent method.
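For reference, here is a minimal sketch of the traditional two-stage pipeline described above, which the paper's coordinate-descent solver is designed to replace; the Gaussian affinity graph and toy data are illustrative assumptions:

```python
# Minimal two-stage N-Cut pipeline: 1) spectral embedding from the normalized
# Laplacian, 2) k-means discretization of the embedding.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)
X = np.vstack([rng.standard_normal((50, 2)) + c for c in ([0, 0], [6, 0], [0, 6])])
k = 3

d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
W = np.exp(-d2)                                  # dense Gaussian affinity graph
deg = W.sum(1)
D_inv_sqrt = 1.0 / np.sqrt(deg)
L_sym = np.eye(len(X)) - D_inv_sqrt[:, None] * W * D_inv_sqrt[None, :]

vals, vecs = np.linalg.eigh(L_sym)
U = vecs[:, :k]                                  # k smallest eigenvectors
U /= np.linalg.norm(U, axis=1, keepdims=True)    # row-normalize the embedding
_, labels = kmeans2(U, k, minit="++", seed=0)    # stage 2: k-means discretization
print("cluster sizes:", np.bincount(labels))
```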
arXiv Detail & Related papers (2023-11-26T07:11:58Z)
- A quantum central path algorithm for linear optimization [5.450016817940232]
We propose a novel quantum algorithm for solving linear optimization problems by quantum-mechanical simulation of the central path.
This approach yields an algorithm for solving linear optimization problems involving $m$ constraints and $n$ variables to $\varepsilon$-optimality.
In the standard gate model (i.e., without access to quantum RAM), our algorithm can obtain highly precise solutions to LO problems using at most $\mathcal{O}\left(\sqrt{m+n}\,\textsf{nnz}(A)\,\frac{R_1}{\varepsilon}\right)$ elementary gates.
arXiv Detail & Related papers (2023-11-07T13:26:20Z)
- Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach [4.422315636150272]
We investigate the problem of learning an $\epsilon$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem.
Our method combines both one-point and two-point estimations in a dual-loop variance-reduced algorithm.
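The one-point versus two-point distinction can be made concrete on a toy quadratic (a generic zeroth-order illustration, not the paper's LQR method); both estimators are nearly unbiased, but the two-point estimator has far lower variance:

```python
# Generic zeroth-order gradient estimators on f(x) = 0.5 ||x||^2:
# one-point smoothing needs one function evaluation per sample, two-point
# smoothing needs two but has much lower variance.
import numpy as np

rng = np.random.default_rng(4)
d, delta, trials = 10, 1e-2, 20000
x = np.ones(d)
f = lambda z: 0.5 * (z ** 2).sum()
true_grad = x

one_pt = np.zeros((trials, d))
two_pt = np.zeros((trials, d))
for t in range(trials):
    u = rng.standard_normal(d)
    one_pt[t] = (f(x + delta * u) / delta) * u                             # one eval
    two_pt[t] = ((f(x + delta * u) - f(x - delta * u)) / (2 * delta)) * u  # two evals

for name, g in [("one-point", one_pt), ("two-point", two_pt)]:
    print(name, "bias:", np.linalg.norm(g.mean(0) - true_grad),
          "avg variance:", g.var(0).mean())
```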
arXiv Detail & Related papers (2023-09-19T15:03:18Z)
- Convergence analysis of online algorithms for vector-valued kernel regression [0.42970700836450487]
We consider the problem of approximating the regression function from noisy vector-valued data by an online learning algorithm.
We show that the expected squared error in the RKHS norm can be bounded by $C^2 (m+1)^{-s/(2+s)}$, where $m$ is the current number of processed data points.
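A minimal sketch of an online RKHS regression update of the kind analyzed here, assuming an RBF kernel, $\mathbb{R}^2$-valued targets, and an illustrative decaying step size (the paper's scheme and error analysis are more delicate):

```python
# Online (stochastic-gradient) kernel regression with vector-valued targets:
# each new sample appends a kernel center with a residual-scaled coefficient,
# i.e. f <- f + eta_m * (y_m - f(x_m)) * k(x_m, .).
import numpy as np

rng = np.random.default_rng(5)
gamma, eta = 1.0, 0.5
k = lambda a, b: np.exp(-gamma * ((a - b) ** 2).sum(-1))

centers, coefs = [], []            # f(x) = sum_i k(x_i, x) * c_i with c_i in R^2
def predict(x):
    if not centers:
        return np.zeros(2)
    return k(np.array(centers), x) @ np.array(coefs)

err = []
for m in range(500):
    x = rng.uniform(-2, 2, size=1)
    y = np.array([np.sin(x[0]), np.cos(x[0])]) + 0.05 * rng.standard_normal(2)
    resid = y - predict(x)
    err.append((resid ** 2).sum())
    centers.append(x)                            # online update: new kernel center
    coefs.append(eta / np.sqrt(m + 1) * resid)   # decaying step size
print("mean squared error over last 100 steps:", np.mean(err[-100:]))
```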
arXiv Detail & Related papers (2023-09-14T15:10:47Z)
- Learning "best" kernels from data in Gaussian process regression. With application to aerodynamics [0.4588028371034406]
We introduce algorithms to select/design kernels in Gaussian process regression/kriging surrogate modeling techniques.
A first class of algorithms is kernel flow, which was introduced in the context of classification in machine learning.
A second class of algorithms is called spectral kernel ridge regression, and aims at selecting a "best" kernel such that the norm of the function to be approximated is minimal.
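The kernel-flow criterion can be sketched as a grid search: for each candidate kernel, measure the fraction of the RKHS norm lost when half the data is discarded, averaged over random halves, and keep the kernel that loses the least. The RBF parameterization and the grid below are illustrative assumptions; the papers' full algorithms (and the spectral variant) are richer:

```python
# Hedged sketch of the kernel-flow selection criterion over an RBF bandwidth:
# rho(gamma) = 1 - ||f_half||_H^2 / ||f_full||_H^2, averaged over random halves.
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.standard_normal(120)

def K(X1, X2, gamma):
    d2 = (X1[:, None, 0] - X2[None, :, 0]) ** 2
    return np.exp(-gamma * d2) + 1e-8 * (d2 == 0)   # small jitter on the diagonal

def rho(gamma, reps=20):
    full = y @ np.linalg.solve(K(X, X, gamma), y)   # squared RKHS norm, all data
    vals = []
    for _ in range(reps):
        s = rng.choice(len(X), size=len(X) // 2, replace=False)
        half = y[s] @ np.linalg.solve(K(X[s], X[s], gamma), y[s])
        vals.append(1.0 - half / full)
    return np.mean(vals)

gammas = np.logspace(-2, 2, 9)
print("selected RBF parameter gamma:", min(gammas, key=rho))
```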
arXiv Detail & Related papers (2022-06-03T07:50:54Z)
- Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems.
We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
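The core of any SQP iteration is a linear solve against the KKT system; below is a minimal deterministic Newton-KKT step on a toy equality-constrained problem (the paper's StoSQP additionally randomizes the gradients and sketches the solves). The circle-constraint example is purely illustrative:

```python
# Minimal deterministic SQP/Newton-KKT iteration for min f(x) s.t. c(x) = 0:
# each step solves [[H, A^T], [A, 0]] [dx; dlam] = [-grad_L; -c].
import numpy as np

def f_grad(x):  return x - np.array([2.0, 0.0])                  # f = 0.5||x-(2,0)||^2
def c(x):       return np.array([x[0] ** 2 + x[1] ** 2 - 1.0])   # unit-circle constraint
def c_jac(x):   return np.array([[2 * x[0], 2 * x[1]]])

x, lam = np.array([0.5, 0.5]), np.zeros(1)
for _ in range(20):
    g, A = f_grad(x), c_jac(x)
    H = (1.0 + 2.0 * lam[0]) * np.eye(2)         # Hessian of the Lagrangian
    KKT = np.block([[H, A.T], [A, np.zeros((1, 1))]])
    rhs = np.concatenate([-g - A.T @ lam, -c(x)])
    step = np.linalg.solve(KKT, rhs)
    x, lam = x + step[:2], lam + step[2:]
print("x* ~", x, " (should approach (1, 0) on the unit circle)")
```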
arXiv Detail & Related papers (2022-05-27T00:34:03Z)
- Conditional Uncorrelation and Efficient Non-approximate Subset Selection in Sparse Regression [72.84177488527398]
We consider sparse regression from the view of correlation, and propose the formula of conditional uncorrelation.
By the proposed method, the computational complexity is reduced from $O(\frac{1}{6}k^3+mk^2+mkd)$ to $O(\frac{1}{6}k^3+\frac{1}{2}mk^2)$ for each candidate subset in sparse regression.
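For context, here is the brute-force baseline that such complexity reductions target: greedy forward selection that re-fits least squares for every candidate feature (a generic sketch, not the paper's conditional-uncorrelation method):

```python
# Generic greedy forward subset selection via repeated least-squares refits;
# this per-candidate refit cost is what faster selection formulas cut down.
import numpy as np

rng = np.random.default_rng(7)
m, d, k = 200, 30, 5
X = rng.standard_normal((m, d))
w_true = np.zeros(d); w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.standard_normal(m)

S = []
for _ in range(k):
    best_j, best_rss = None, np.inf
    for j in range(d):
        if j in S:
            continue
        cols = S + [j]
        w, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = ((y - X[:, cols] @ w) ** 2).sum()     # refit for each candidate
        if rss < best_rss:
            best_j, best_rss = j, rss
    S.append(best_j)
print("selected features:", sorted(S))
```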
arXiv Detail & Related papers (2020-09-08T20:32:26Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression, where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
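A toy version of this generative model with heavy-tailed noise shows why robustness matters; neither estimator below is the paper's, and scikit-learn's HuberRegressor is used purely for illustration:

```python
# Toy illustration of Y = <X, w*> + eps with heavy-tailed noise:
# ordinary least squares vs. a robust (Huber) fit.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(8)
n, d = 1000, 10
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
eps = rng.standard_t(df=1.5, size=n)          # heavy-tailed noise
y = X @ w_star + eps

ols = LinearRegression().fit(X, y)
hub = HuberRegressor(max_iter=1000).fit(X, y)
print("OLS   error:", np.linalg.norm(ols.coef_ - w_star))
print("Huber error:", np.linalg.norm(hub.coef_ - w_star))
```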
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
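A generic sketch of one DRO flavor (not necessarily the paper's formulation): with a KL-regularized uncertainty set, the worst-case distribution over sample losses has softmax weights, so each gradient step upweights the hardest examples. The temperature and step size below are illustrative:

```python
# Generic KL-regularized DRO gradient step: weight each example's gradient by
# softmax(loss_i / tau), the closed-form worst-case distribution.
import numpy as np

rng = np.random.default_rng(9)
n, d, tau, lr = 400, 5, 1.0, 0.1
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

theta = np.zeros(d)
for _ in range(200):
    margins = y * (X @ theta)
    losses = np.log1p(np.exp(-margins))          # per-example logistic loss
    w = np.exp((losses - losses.max()) / tau)    # numerically stable softmax
    w /= w.sum()                                 # worst-case example weights
    grad = X.T @ (w * y * (-1.0 / (1.0 + np.exp(margins))))
    theta -= lr * grad
print("DRO-weighted training loss:", losses @ w)
```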
arXiv Detail & Related papers (2020-06-17T20:19:25Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
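A minimal sketch-and-precondition least-squares solver using one of the two embeddings mentioned, a Gaussian projection: QR-factor the sketched matrix $SA$, then run LSQR on the re-conditioned operator $AR^{-1}$. The sketch size $s = 4n$ is a common heuristic, not the paper's prescription:

```python
# Sketch-and-precondition for min ||Ax - b||_2: a Gaussian embedding S makes
# SA a good proxy for A's column geometry, so R from SA = QR preconditions A.
import numpy as np
from scipy.linalg import qr, solve_triangular
from scipy.sparse.linalg import lsqr, LinearOperator

rng = np.random.default_rng(10)
m, n = 10000, 100
A = rng.standard_normal((m, n)) * np.logspace(0, 4, n)   # ill-conditioned columns
b = rng.standard_normal(m)

s = 4 * n                                      # sketch size (common heuristic)
S = rng.standard_normal((s, m)) / np.sqrt(s)   # Gaussian projection
_, R = qr(S @ A, mode="economic")              # SA = QR with R upper triangular

Aprec = LinearOperator(
    (m, n),
    matvec=lambda z: A @ solve_triangular(R, z),
    rmatvec=lambda w: solve_triangular(R, A.T @ w, trans="T"),
    dtype=float,
)
z = lsqr(Aprec, b, atol=1e-10, btol=1e-10)[0]  # LSQR on the preconditioned system
x = solve_triangular(R, z)                     # undo the change of variables
print("residual norm:", np.linalg.norm(A @ x - b))
```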
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.