Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression
- URL: http://arxiv.org/abs/2407.10070v2
- Date: Fri, 21 Feb 2025 22:38:28 GMT
- Title: Have ASkotch: A Neat Solution for Large-scale Kernel Ridge Regression
- Authors: Pratik Rathore, Zachary Frangella, Jiaming Yang, Michał Dereziński, Madeleine Udell,
- Abstract summary: ASkotch is a scalable, accelerated, iterative method for full KRR that provably obtains linear convergence.<n>ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale KRR regression and classification tasks.<n>Our work opens up the possibility of as-yet-unimagined applications of full KRR across a number of disciplines.
- Score: 16.836685923503868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, full KRR solvers are challenging to scale to large datasets: both direct (i.e., Cholesky decomposition) and iterative methods (i.e., PCG) incur prohibitive computational and storage costs. The standard approach to scale KRR to large datasets chooses a set of inducing points and solves an approximate version of the problem, inducing points KRR. However, the resulting solution tends to have worse predictive performance than the full KRR solution. In this work, we introduce a new solver, ASkotch, for full KRR that provides better solutions faster than state-of-the-art solvers for full and inducing points KRR. ASkotch is a scalable, accelerated, iterative method for full KRR that provably obtains linear convergence. Under appropriate conditions, we show that ASkotch obtains condition-number-free linear convergence. This convergence analysis rests on the theory of ridge leverage scores and determinantal point processes. ASkotch outperforms state-of-the-art KRR solvers on a testbed of 23 large-scale KRR regression and classification tasks derived from a wide range of application domains, demonstrating the superiority of full KRR over inducing points KRR. Our work opens up the possibility of as-yet-unimagined applications of full KRR across a number of disciplines.
Related papers
- Efficient Differentiable Approximation of Generalized Low-rank Regularization [64.73416824444328]
Low-rank regularization (LRR) has been widely applied in various machine learning tasks.<n>In this paper, we propose an efficient differentiable approximation of LRR.
arXiv Detail & Related papers (2025-05-21T11:49:17Z) - Logarithmic Regret for Online KL-Regularized Reinforcement Learning [51.113248212150964]
KL-regularization plays a pivotal role in improving efficiency of RL fine-tuning for large language models.
Despite its empirical advantage, the theoretical difference between KL-regularized RL and standard RL remains largely under-explored.
We propose an optimistic-based KL-regularized online contextual bandit algorithm, and provide a novel analysis of its regret.
arXiv Detail & Related papers (2025-02-11T11:11:05Z) - Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Provably Efficient Reinforcement Learning with Multinomial Logit Function Approximation [67.8414514524356]
We study a new class of MDPs that employs multinomial logit (MNL) function approximation to ensure valid probability distributions over the state space.
introducing the non-linear function raises significant challenges in both computational and statistical efficiency.
We propose an algorithm that achieves the same regret with only $mathcalO(1)$ cost.
arXiv Detail & Related papers (2024-05-27T11:31:54Z) - Optimal Rates and Saturation for Noiseless Kernel Ridge Regression [4.585021053685196]
We present a comprehensive study of Kernel ridge regression (KRR) in the noiseless regime.
KRR is a fundamental method for learning functions from finite samples.
We introduce a refined notion of degrees of freedom, which we believe has broader applicability in the analysis of kernel methods.
arXiv Detail & Related papers (2024-02-24T04:57:59Z) - Optimal Rates of Kernel Ridge Regression under Source Condition in Large
Dimensions [15.988264513040903]
We study the large-dimensional behavior of kernel ridge regression (KRR) where the sample size $n asymp dgamma$ for some $gamma > 0$.
Our results illustrate that the curves of rate varying along $gamma$ exhibit the periodic plateau behavior and the multiple descent behavior.
arXiv Detail & Related papers (2024-01-02T16:14:35Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $mathbfx*$ rather than for all $mathbfx*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonIBS optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Approximating a RUM from Distributions on k-Slates [88.32814292632675]
We find a generalization-time algorithm that finds the RUM that best approximates the given distribution on average.
Our theoretical result can also be made practical: we obtain a that is effective and scales to real-world datasets.
arXiv Detail & Related papers (2023-05-22T17:43:34Z) - On the Optimality of Misspecified Kernel Ridge Regression [13.995944403996566]
We show that KRR is minimax optimal for any $sin (0,1)$ when the $mathcalH$ is a Sobolev RKHS.
arXiv Detail & Related papers (2023-05-12T04:12:12Z) - Robust, randomized preconditioning for kernel ridge regression [3.521877014965197]
This paper investigates two randomized preconditioning techniques for solving kernel ridge regression problems.
It introduces two new methods with state-of-the-art performance.
The proposed methods solve a broad range of KRR problems, making them ideal for practical applications.
arXiv Detail & Related papers (2023-04-24T21:48:24Z) - Detection-Recovery Gap for Planted Dense Cycles [72.4451045270967]
We consider a model where a dense cycle with expected bandwidth $n tau$ and edge density $p$ is planted in an ErdHos-R'enyi graph $G(n,q)$.
We characterize the computational thresholds for the associated detection and recovery problems for the class of low-degree algorithms.
arXiv Detail & Related papers (2023-02-13T22:51:07Z) - Overparameterized random feature regression with nearly orthogonal data [21.97381518762387]
We study the non-asymptotic behaviors of the random feature ridge regression (RFRR) given by a two-layer neural network.
Our results hold for a wide variety of activation functions and input data sets that exhibit nearly deterministic properties.
arXiv Detail & Related papers (2022-11-11T09:16:25Z) - Distributed Random Reshuffling over Networks [7.013052033764372]
A distributed resh-upr (D-RR) algorithm is proposed to solve the problem of convex and smooth objective functions.
In particular, for smooth convex objective functions, D-RR achieves D-T convergence rate (where $T counts epoch number) in terms of distance between the global drives.
arXiv Detail & Related papers (2021-12-31T03:59:37Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Oversampling Divide-and-conquer for Response-skewed Kernel Ridge
Regression [20.00435452480056]
We develop a novel response-adaptive partition strategy to overcome the limitation of the divide-and-conquer method.
We show the proposed estimate has a smaller mean squared error (AMSE) than that of the classical dacKRR estimate under mild conditions.
arXiv Detail & Related papers (2021-07-13T04:01:04Z) - Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $ell_4$ norm differs from that of a Gaussian vector with the same $ell$ norm.
We show that in the regime $n rho gg sqrtN$, any spectral method from a large class (and more generally, any low-degree of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z) - Square Root Bundle Adjustment for Large-Scale Reconstruction [56.44094187152862]
We propose a new formulation for the bundle adjustment problem which relies on nullspace marginalization of landmark variables by QR decomposition.
Our approach, which we call square root bundle adjustment, is algebraically equivalent to the commonly used Schur complement trick.
We show in real-world experiments with the BAL datasets that even in single precision the proposed solver achieves on average equally accurate solutions.
arXiv Detail & Related papers (2021-03-02T16:26:20Z) - Generalization error of random features and kernel methods:
hypercontractivity and kernel matrix concentration [19.78800773518545]
We study the use of random features methods in conjunction with ridge regression in the feature space $mathbb RN$.
This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so called lazy training regime.
arXiv Detail & Related papers (2021-01-26T06:46:41Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $ntimes n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-logdotpvarphi(x)varphi(y)$ where $varphi$ is a map from the ground space onto the positive orthant $RRr_+$, with $rll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z) - Approximation Schemes for ReLU Regression [80.33702497406632]
We consider the fundamental problem of ReLU regression.
The goal is to output the best fitting ReLU with respect to square loss given to draws from some unknown distribution.
arXiv Detail & Related papers (2020-05-26T16:26:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.