Related papers: Stochastic Gradient Descent for Additive Nonparametric Regression

Stochastic Gradient Descent for Additive Nonparametric Regression

URL: http://arxiv.org/abs/2401.00691v2
Date: Tue, 13 Feb 2024 13:40:53 GMT
Title: Stochastic Gradient Descent for Additive Nonparametric Regression
Authors: Xin Chen and Jason M. Klusowski
Abstract summary: This paper introduces an iterative algorithm for training additive models that enjoys favorable memory storage and computational requirements. We show that the resulting estimator satisfies an inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on the dimensionality of the data and the size of the training sample.
Score: 13.28914458950716
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper introduces an iterative algorithm for training additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient descent, applied to the coefficients of a truncated basis expansion of the component functions. We show that the resulting estimator satisfies an oracle inequality that allows for model mis-specification. In the well-specified setting, by choosing the learning rate carefully across three distinct stages of training, we demonstrate that its risk is minimax optimal in terms of the dependence on the dimensionality of the data and the size of the training sample. We further illustrate the computational benefits by comparing the approach with traditional backfitting on two real-world datasets.

Related papers

The Power of Random Features and the Limits of Distribution-Free Gradient Descent [14.742677437485273]
We study the relationship between gradient-based optimization of parametric models (e.g., neural networks) and optimization of linear combinations of random features.<n>Our main result shows that if a parametric model can be learned using mini-batch gradient descent (bSGD) without making assumptions about the data distribution, then with high probability, the target function can also be approximated.
arXiv Detail & Related papers (2025-05-15T15:39:28Z)
Gradient-free stochastic optimization for additive models [56.42455605591779]
We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Lojasiewicz or the strong convexity condition.<n>We assume that the objective function has an additive structure and satisfies a higher-order smoothness property, characterized by the H"older family of functions.
arXiv Detail & Related papers (2025-03-03T23:39:08Z)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
Branch and Bound to Assess Stability of Regression Coefficients in Uncertain Models [0.6990493129893112]
We introduce our algorithm, along with supporting mathematical results, an example application, and a link to our computer code. It helps researchers summarize high-dimensional data and assess the stability of regression coefficients in uncertain models.
arXiv Detail & Related papers (2024-08-19T01:37:14Z)
Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z)
Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hiddenstatistical models (f-HD) The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selector operator (GMSOLAS) penalty function, wherein the weights are obtained by the unpenalised f-HD maximum-likelihood estimators.
arXiv Detail & Related papers (2022-08-10T19:17:45Z)
Benign overfitting and adaptive nonparametric regression [71.70323672531606]
We construct an estimator which is a continuous function interpolating the data points with high probability. We attain minimax optimal rates under mean squared risk on the scale of H"older classes adaptively to the unknown smoothness.
arXiv Detail & Related papers (2022-06-27T14:50:14Z)
Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent [11.483919798541393]
We study the statistical limits in terms of Sobolev norms of gradient descent for solving inverse problem from randomly sampled noisy observations. Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics Informed Neural Networks (PINN)
arXiv Detail & Related papers (2022-05-15T17:01:53Z)
Adaptive estimation of a function from its Exponential Radon Transform in presence of noise [0.0]
We propose a locally adaptive strategy for estimating a function from its Exponential Radon Transform (ERT) data. We build a non-parametric kernel type estimator and show that for a class of functions comprising a wide Sobolev regularity scale, our proposed strategy follows the minimax optimal rate up to a $logn$ factor.
arXiv Detail & Related papers (2020-11-13T12:54:09Z)
Support estimation in high-dimensional heteroscedastic mean regression [2.28438857884398]
We consider a linear mean regression model with random design and potentially heteroscedastic, heavy-tailed errors. We use a strictly convex, smooth variant of the Huber loss function with tuning parameter depending on the parameters of the problem. For the resulting estimator we show sign-consistency and optimal rates of convergence in the $ell_infty$ norm.
arXiv Detail & Related papers (2020-11-03T09:46:31Z)
Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables. The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples. We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.