A Bound on the Maximal Marginal Degrees of Freedom
- URL: http://arxiv.org/abs/2402.12885v2
- Date: Mon, 06 Jan 2025 12:33:24 GMT
- Title: A Bound on the Maximal Marginal Degrees of Freedom
- Authors: Paul Dommel
- Abstract summary: This paper addresses low rank approximations and surrogates for kernel ridge regression. We demonstrate that the computational cost of the most popular low rank approach, which is the Nystr\"om method, is almost linear in the sample size. The result builds on a thorough theoretical analysis of the approximation of elementary kernel functions by elements in the range of the associated integral operator.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel ridge regression, in general, is expensive in memory allocation and computation time. This paper addresses low rank approximations and surrogates for kernel ridge regression, which bridge these difficulties. The fundamental contribution of the paper is a lower bound on the minimal rank such that the prediction power of the approximation remains reliable. Based on this bound, we demonstrate that the computational cost of the most popular low rank approach, which is the Nystr\"om method, is almost linear in the sample size. This justifies the method from a theoretical point of view. Moreover, the paper provides a significant extension of the feasible choices of the regularization parameter. The result builds on a thorough theoretical analysis of the approximation of elementary kernel functions by elements in the range of the associated integral operator. We provide estimates of the approximation error and characterize the behavior of the norm of the underlying weight function.
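To make the low rank idea from the abstract concrete, the following is a minimal sketch of Nystr\"om kernel ridge regression. It is not taken from the paper: the Gaussian kernel, uniform landmark subsampling, the rank `m`, the ridge parameter `lam`, and the helper names (`gaussian_kernel`, `nystrom_krr_fit`, `nystrom_krr_predict`) are all illustrative assumptions. The point is only to show why the cost drops from the full n x n kernel system to roughly O(n m^2 + m^3); the paper's contribution is a bound on how small the rank m can be chosen while keeping the prediction power reliable.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def nystrom_krr_fit(X, y, m=100, lam=1e-3, sigma=1.0, seed=0):
    """Fit kernel ridge regression with a rank-m Nystroem surrogate.

    Instead of solving the full n x n system (K + n*lam*I) alpha = y,
    the solution is restricted to the span of m uniformly sampled
    landmark columns, reducing the cost to O(n m^2 + m^3).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=min(m, n), replace=False)
    Z = X[idx]                              # landmark points
    K_nm = gaussian_kernel(X, Z, sigma)     # n x m cross-kernel
    K_mm = gaussian_kernel(Z, Z, sigma)     # m x m landmark kernel
    # Reduced ridge problem: (K_nm^T K_nm + n*lam*K_mm) beta = K_nm^T y,
    # with a small jitter for numerical stability.
    A = K_nm.T @ K_nm + n * lam * K_mm + 1e-10 * np.eye(len(idx))
    beta = np.linalg.solve(A, K_nm.T @ y)
    return Z, beta, sigma

def nystrom_krr_predict(X_new, model):
    """Evaluate the low rank surrogate at new inputs."""
    Z, beta, sigma = model
    return gaussian_kernel(X_new, Z, sigma) @ beta
```

Under this kind of construction, the overall runtime is governed by the number of landmarks m rather than the sample size n, which is exactly why a bound on the required rank translates into an almost linear computational cost.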
Related papers
- Riemannian Federated Learning via Averaging Gradient Stream [8.75592575216789]
This paper develops and analyzes an efficient Riemannian Federated Averaging Gradient Stream (RFedAGS) algorithm.
Numerical simulations conducted on synthetic and real-world data demonstrate the performance of the proposed RFedAGS.
arXiv Detail & Related papers (2024-09-11T12:28:42Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Gaussian Process Regression under Computational and Epistemic Misspecification [4.5656369638728656]
In large data applications, computational costs can be reduced using low-rank or sparse approximations of the kernel.
This paper investigates the effect of such kernel approximations on the element error.
arXiv Detail & Related papers (2023-12-14T18:53:32Z) - Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Tensor-on-Tensor Regression: Riemannian Optimization, Over-parameterization, Statistical-computational Gap, and Their Interplay [9.427635404752936]
We study the tensor-on-tensor regression, where the goal is to connect tensor responses to tensor covariates with a low Tucker rank parameter/matrix.
We propose two methods to cope with the challenge of unknown rank.
We provide the first convergence guarantee for the general tensor-on-tensor regression.
arXiv Detail & Related papers (2022-06-17T13:15:27Z) - Nystr\"om Kernel Mean Embeddings [92.10208929236826]
We propose an efficient approximation procedure based on the Nystr\"om method.
It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate.
We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules.
arXiv Detail & Related papers (2022-01-31T08:26:06Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We highlight a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD and that of ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z) - Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - Support recovery and sup-norm convergence rates for sparse pivotal estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non-smoothed and smoothed, single-task and multitask square-root Lasso-type estimators.
arXiv Detail & Related papers (2020-01-15T16:11:04Z) - Minimax Semiparametric Learning With Approximate Sparsity [3.2116198597240846]
This paper is about the feasibility and means of root-n consistently estimating linear, mean-square continuous functionals of a high dimensional, approximately sparse regression.
We give lower bounds on the convergence rate of estimators of a regression slope and an average derivative.
We also give debiased machine learners that are root-n consistent under either a minimal approximate sparsity condition or rate double robustness.
arXiv Detail & Related papers (2019-12-27T16:13:21Z)