Optimal Rates of Distributed Regression with Imperfect Kernels
- URL: http://arxiv.org/abs/2006.16744v1
- Date: Tue, 30 Jun 2020 13:00:16 GMT
- Title: Optimal Rates of Distributed Regression with Imperfect Kernels
- Authors: Hongwei Sun (University of Jinan) and Qiang Wu (Middle Tennessee State
University)
- Abstract summary: We study distributed kernel regression via the divide and conquer approach.
We show that kernel ridge regression can achieve rates faster than $N^{-1}$ in the noise-free setting.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed machine learning systems have been receiving increasing
attention for their efficiency in processing large scale data. Many distributed
frameworks have been proposed for different machine learning tasks. In this
paper, we study distributed kernel regression via the divide and conquer
approach. This approach has been proved asymptotically minimax optimal when the
kernel is perfectly selected, so that the true regression function lies in the
associated reproducing kernel Hilbert space. However, this is usually, if not
always, impractical, because kernels that can only be selected via prior
knowledge or a tuning process are hardly perfect. Instead, it is more common
that the kernel is good enough but imperfect, in the sense that the true
regression function can be well approximated by, but does not lie exactly in,
the kernel space. We show that distributed kernel regression can still achieve
the capacity independent optimal rate in this case. To this end, we first
establish a general framework that allows us to analyze distributed regression
with response weighted base algorithms by bounding the error of such algorithms
on a single data set, provided that the error bounds factor in the impact of the
unexplained variance of the response variable. Then we perform a leave-one-out
analysis of kernel ridge regression and bias corrected kernel ridge regression,
which, in combination with the aforementioned framework, allows us to derive
sharp error bounds and capacity independent optimal rates for the associated
distributed kernel regression algorithms. As a byproduct of this thorough
analysis, we also prove that kernel ridge regression can achieve rates faster
than $N^{-1}$ (where $N$ is the sample size) in the noise-free setting, which,
to the best of our knowledge, is a novel observation in regression learning.
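For intuition, the divide and conquer scheme analyzed in the paper amounts to splitting the sample of size $N$ into $m$ disjoint subsets, solving kernel ridge regression on each subset, and averaging the local estimators. The sketch below is a minimal illustration of that scheme, not the authors' implementation; the Gaussian kernel, the regularization parameter lam, the number of partitions m, and the toy data are all illustrative assumptions.

```python
# Minimal sketch of divide-and-conquer kernel ridge regression (illustrative,
# not the paper's code): split the data into m subsets, solve KRR on each,
# and average the local predictions.
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of X and rows of Z."""
    sq_dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def krr_fit(X, y, lam=1e-2, bandwidth=1.0):
    """Solve (K + n*lam*I) alpha = y on a single data subset."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def krr_predict(model, X_test, bandwidth=1.0):
    """Evaluate a fitted local KRR estimator at the test points."""
    X_train, alpha = model
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

def distributed_krr(X, y, X_test, m=4, lam=1e-2, bandwidth=1.0):
    """Divide-and-conquer KRR: average the m local KRR predictions."""
    parts = np.array_split(np.random.permutation(len(X)), m)
    preds = [krr_predict(krr_fit(X[idx], y[idx], lam, bandwidth), X_test, bandwidth)
             for idx in parts]
    return np.mean(preds, axis=0)

# Toy usage: noisy regression of a sine function on the unit interval.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(400, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(400)
X_test = np.linspace(0, 1, 50)[:, None]
y_hat = distributed_krr(X, y, X_test)
```

Since each local solve only involves roughly $N/m$ points, the dominant cost of the linear system drops from $O(N^3)$ to $O((N/m)^3)$ per machine, which is the efficiency motivation mentioned at the start of the abstract.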
Related papers
- Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning [33.34053480377887]
This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels.
For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs).
arXiv Detail & Related papers (2024-06-03T15:28:12Z) - Generalization in Kernel Regression Under Realistic Assumptions [41.345620270267446]
We provide rigorous bounds for common kernels and for any amount of regularization, noise, any input dimension, and any number of samples.
Our results imply benign overfitting in high input dimensions, nearly tempered overfitting in fixed dimensions, and explicit convergence rates for regularized regression.
As a by-product, we obtain time-dependent bounds for neural networks trained in the kernel regime.
arXiv Detail & Related papers (2023-12-26T10:55:20Z) - Solving Kernel Ridge Regression with Gradient Descent for a Non-Constant Kernel [1.5229257192293204]
KRR is a generalization of linear ridge regression that is non-linear in the data, but linear in the parameters.
We address the effects of changing the kernel during training.
We show theoretically and empirically that using a decreasing bandwidth, we are able to achieve both zero training error in combination with good generalization, and a double descent behavior.
arXiv Detail & Related papers (2023-11-03T07:43:53Z) - Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels [57.46832672991433]
We propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS).
We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noise.
We develop an expectation-propagation expectation-maximization algorithm for efficient posterior inference and function estimation.
arXiv Detail & Related papers (2023-10-09T03:55:09Z) - Improved learning theory for kernel distribution regression with
two-stage sampling [3.154269505086155]
Kernel methods have become a method of choice for tackling the distribution regression problem.
We introduce a novel near-unbiased condition on the Hilbertian embeddings, which enables us to provide new error bounds.
We show that this near-unbiased condition holds for three important classes of kernels, based on optimal transport and mean embedding.
arXiv Detail & Related papers (2023-08-28T06:29:09Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that this phenomenon can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Improved Convergence Rates for Sparse Approximation Methods in
Kernel-Based Learning [48.08663378234329]
Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications.
Existing sparse approximation methods can yield a significant reduction in the computational cost.
We provide novel confidence intervals for the Nyström method and the sparse variational Gaussian processes approximation method.
arXiv Detail & Related papers (2022-02-08T17:22:09Z) - Kernel Continual Learning [117.79080100313722]
Kernel continual learning is a simple but effective variant of continual learning to tackle catastrophic forgetting.
An episodic memory unit stores a subset of samples for each task to learn task-specific classifiers based on kernel ridge regression.
Variational random features are used to learn a data-driven kernel for each task.
arXiv Detail & Related papers (2021-07-12T22:09:30Z) - Taming Nonconvexity in Kernel Feature Selection---Favorable Properties
of the Laplace Kernel [77.73399781313893]
A challenge is to establish the objective function of kernel-based feature selection.
The gradient-based algorithms available for this non-convex optimization problem are only able to guarantee convergence to local minima.
arXiv Detail & Related papers (2021-06-17T11:05:48Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Towards Unbiased Random Features with Lower Variance For Stationary
Indefinite Kernels [26.57122949130266]
Our algorithm achieves lower variance and approximation error compared with the existing kernel approximation methods.
With better approximation to the originally selected kernels, improved classification accuracy and regression ability are obtained.
arXiv Detail & Related papers (2021-04-13T13:56:50Z)