Computational Efficiency under Covariate Shift in Kernel Ridge Regression
- URL: http://arxiv.org/abs/2505.14083v1
- Date: Tue, 20 May 2025 08:41:24 GMT
- Title: Computational Efficiency under Covariate Shift in Kernel Ridge Regression
- Authors: Andrea Della Vecchia, Arnaud Mavakala Watusadisi, Ernesto De Vito, Lorenzo Rosasco
- Abstract summary: We investigate the use of random projections, where the hypothesis space consists of a random subspace within a given RKHS. We show that, even under covariate shift, significant computational savings can be achieved without compromising learning performance.
- Score: 13.02719615069661
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the covariate shift problem in the context of nonparametric regression within reproducing kernel Hilbert spaces (RKHSs). Covariate shift arises in supervised learning when the input distributions of the training and test data differ, presenting additional challenges for learning. Although kernel methods have optimal statistical properties, their high computational demands in terms of time and, particularly, memory, limit their scalability to large datasets. To address this limitation, the main focus of this paper is to explore the trade-off between computational efficiency and statistical accuracy under covariate shift. We investigate the use of random projections where the hypothesis space consists of a random subspace within a given RKHS. Our results show that, even in the presence of covariate shift, significant computational savings can be achieved without compromising learning performance.
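The random-projection idea described in the abstract can be illustrated with a Nystrom-style construction: restrict the KRR solution to the subspace spanned by the kernel functions of a few randomly chosen landmark points, so the linear system to solve is m x m instead of n x n. The sketch below is a minimal, generic illustration of this technique, not the paper's exact estimator; the landmark count, kernel bandwidth, and regularization values are arbitrary choices for the example.

```python
import numpy as np

def rbf(A, B, gamma=5.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def nystrom_krr(X, y, Z, lam):
    # Restrict the KRR solution to span{k(z_1, .), ..., k(z_m, .)} for the
    # m landmarks in Z, and solve the projected normal equations
    #   (K_nm^T K_nm + lam * n * K_mm) alpha = K_nm^T y,
    # an m x m system instead of the full n x n one.
    n, m = X.shape[0], Z.shape[0]
    K_nm, K_mm = rbf(X, Z), rbf(Z, Z)
    A = K_nm.T @ K_nm + lam * n * K_mm + 1e-10 * np.eye(m)
    alpha = np.linalg.solve(A, K_nm.T @ y)
    return lambda Xt: rbf(Xt, Z) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(500)
Z = X[rng.choice(500, size=25, replace=False)]   # m = 25 << n = 500
predict = nystrom_krr(X, y, Z, lam=1e-4)
```

With m landmarks the memory cost drops from O(n^2) to O(nm) and the solve from O(n^3) to O(nm^2 + m^3), which is the computational trade-off the paper studies under covariate shift.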
Related papers
- MindFlayer SGD: Efficient Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times [49.1574468325115]
We investigate the problem of minimizing the expectation of smooth nonconvex functions in a setting with multiple parallel workers computing stochastic gradients. A key challenge in this context is the presence of arbitrarily heterogeneous and random worker compute times. We introduce MindFlayer SGD, a novel parallel SGD method specifically designed to handle this setting.
arXiv Detail & Related papers (2024-10-05T21:11:32Z) - Minimum Reduced-Order Models via Causal Inference [2.300302733934937]
We study an efficient approach to identifying sparse ROMs using an information-theoretic indicator called causation entropy. We show that a Gaussian approximation of the causation entropy still performs exceptionally well even in the presence of highly non-Gaussian statistics. We also demonstrate good performance of the obtained ROMs in recovering unobserved dynamics via data assimilation with partial observations.
arXiv Detail & Related papers (2024-06-29T01:24:41Z) - Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework [56.82432591933544]
Distributed stochastic gradient descent (SGD) has attracted considerable recent attention due to its potential for scaling computational resources, reducing training time, and helping protect user privacy in machine learning. This paper analyzes the run time and staleness of distributed SGD using stochastic delay differential equations (SDDEs) and an approximation of gradient arrivals. Interestingly, it is shown that increasing the number of activated workers does not necessarily accelerate distributed SGD, due to staleness.
arXiv Detail & Related papers (2024-06-17T02:56:55Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Efficient Large-scale Nonstationary Spatial Covariance Function Estimation Using Convolutional Neural Networks [3.5455896230714194]
We use ConvNets to derive subregions from the nonstationary data.
We employ a selection mechanism to identify subregions that exhibit similar behavior to stationary fields.
We assess the performance of the proposed method with synthetic and real datasets at a large scale.
arXiv Detail & Related papers (2023-06-20T12:17:46Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
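The interaction between learning rate and early stopping on a quadratic objective can be seen directly in a small finite-dimensional sketch (an illustration of the general mechanism, not the paper's Hilbert-space analysis): starting gradient descent from zero and stopping after t steps shrinks each eigen-direction of the Hessian, with eigenvalue s, by the filter 1 - (1 - eta * s)^t, so a larger stable step size eta recovers small-eigenvalue directions that a small eta has barely touched.

```python
import numpy as np

# Gradient descent on f(w) = ||Xw - y||^2 / (2n), started at w = 0 and
# stopped after t steps, acts as a spectral filter on the Hessian
# H = X^T X / n: the component along an eigen-direction with eigenvalue s
# is recovered up to the factor 1 - (1 - eta * s)^t.
rng = np.random.default_rng(1)
n, d = 2000, 5
scales = np.array([3.0, 2.0, 1.0, 0.3, 0.1])   # spread-out spectrum
X = rng.standard_normal((n, d)) * scales
w_true = np.ones(d)
y = X @ w_true

H = X.T @ X / n
g = X.T @ y / n

def gd(eta, t):
    # Plain gradient descent from the origin, stopped early at step t.
    w = np.zeros(d)
    for _ in range(t):
        w -= eta * (H @ w - g)
    return w

t = 50
err_small = np.linalg.norm(gd(0.01, t) - w_true)  # small learning rate
err_large = np.linalg.norm(gd(0.10, t) - w_true)  # larger, still stable
```

At the same stopping time t, the larger learning rate yields a solution whose spectral decomposition includes much more of the low-eigenvalue subspace, hence a smaller distance to `w_true`.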
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods.
We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z) - Scalable Spatiotemporally Varying Coefficient Modelling with Bayesian Kernelized Tensor Regression [17.158289775348063]
Bayesian Kernelized Tensor Regression (BKTR) can be considered a new and scalable approach to modeling processes with a low-rank spatiotemporal structure.
We conduct extensive experiments on both synthetic and real-world data sets, and our results confirm the superior performance and efficiency of BKTR for model estimation and inference.
arXiv Detail & Related papers (2021-08-31T19:22:23Z) - ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions [34.576469570537995]
We introduce ParK, a new large-scale solver for kernel ridge regression.
Our approach combines partitioning with random projections and iterative optimization to reduce space and time complexity.
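The combination of partitioning and random projections can be sketched as follows: split the input space into cells, then solve a small projected (landmark-based) KRR problem inside each cell. This is a generic toy illustration of that combination, not the ParK solver itself; the median split, landmark counts, and hyperparameters are arbitrary choices for the example.

```python
import numpy as np

def rbf(A, B, gamma=5.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    return np.exp(-gamma * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

def fit_cell(Xc, yc, m, lam, rng):
    # Projected KRR inside one cell: m random landmarks, normal equations.
    Z = Xc[rng.choice(len(Xc), size=min(m, len(Xc)), replace=False)]
    K_nm, K_mm = rbf(Xc, Z), rbf(Z, Z)
    A = K_nm.T @ K_nm + lam * len(Xc) * K_mm + 1e-10 * np.eye(len(Z))
    return Z, np.linalg.solve(A, K_nm.T @ yc)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 1))
y = np.sin(4.0 * X[:, 0]) + 0.1 * rng.standard_normal(400)

split = np.median(X[:, 0])            # partition the input space in two
left = X[:, 0] < split
cells = [fit_cell(X[left], y[left], 15, 1e-4, rng),
         fit_cell(X[~left], y[~left], 15, 1e-4, rng)]

def predict(Xt):
    # Route each test point to its cell's local projected model.
    lt = Xt[:, 0] < split
    out = np.empty(len(Xt))
    for mask, (Z, alpha) in zip([lt, ~lt], cells):
        if mask.any():
            out[mask] = rbf(Xt[mask], Z) @ alpha
    return out
```

Each local solve touches only the points in its cell, so both the kernel matrices and the linear systems stay small, which is the space/time reduction the partitioning is after.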
arXiv Detail & Related papers (2021-06-23T08:24:36Z) - Adaptive Local Kernels Formulation of Mutual Information with Application to Active Post-Seismic Building Damage Inference [1.066048003460524]
Post-earthquake regional damage assessment of buildings is an expensive task.
The information-theoretic measure of mutual information is one of the most effective criteria for evaluating the informativeness of samples.
A local kernels strategy was proposed to reduce the computational costs, but the adaptability of the kernels to the observed labels was not considered.
In this article, an adaptive local kernels methodology is developed that allows for the conformability of the kernels to the observed output data.
arXiv Detail & Related papers (2021-05-24T18:34:46Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z) - Fast Estimation of Information Theoretic Learning Descriptors using Explicit Inner Product Spaces [4.5497405861975935]
Kernel methods form a theoretically-grounded, powerful and versatile framework to solve nonlinear problems in signal processing and machine learning.
Recently, we proposed no-trick (NT) kernel adaptive filtering (KAF).
We focus on a family of fast, scalable, and accurate estimators for ITL using explicit inner product space kernels.
arXiv Detail & Related papers (2020-01-01T20:21:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.