On the Convergence of Stochastic Gradient Descent for Linear Inverse
Problems in Banach Spaces
- URL: http://arxiv.org/abs/2302.05197v1
- Date: Fri, 10 Feb 2023 12:00:49 GMT
- Title: On the Convergence of Stochastic Gradient Descent for Linear Inverse
Problems in Banach Spaces
- Authors: Z. Kereta and B. Jin
- Abstract summary: Stochastic gradient descent (SGD) has been established as one of the most successful optimisation methods in machine learning.
We present a novel convergence analysis of SGD for linear inverse problems in general Banach spaces.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we consider stochastic gradient descent (SGD) for solving linear
inverse problems in Banach spaces. SGD and its variants have been established
as one of the most successful optimisation methods in machine learning, imaging
and signal processing, etc. At each iteration SGD uses a single datum, or a
small subset of data, resulting in highly scalable methods that are very
attractive for large-scale inverse problems. Nonetheless, the theoretical
analysis of SGD-based approaches for inverse problems has thus far been largely
limited to Euclidean and Hilbert spaces. In this work we present a novel
convergence analysis of SGD for linear inverse problems in general Banach
spaces: we show the almost sure convergence of the iterates to the minimum norm
solution and establish the regularising property for suitable a priori stopping
criteria. Numerical results are also presented to illustrate features of the
approach.
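The single-datum iteration the abstract describes can be sketched in the Euclidean (Hilbert-space) special case. This is a minimal illustration only: the paper's Banach-space iteration replaces the plain gradient step with duality mappings, which are omitted here, and all function names and parameters are illustrative rather than the authors' implementation.

```python
import numpy as np

def sgd_linear_inverse(A, b, step=0.05, n_iters=5000, seed=0):
    """SGD for the linear inverse problem A x = b, one datum per iteration.

    Euclidean sketch only: the paper's Banach-space iteration uses duality
    mappings in place of this plain gradient step.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(n_iters):
        i = rng.integers(m)            # sample a single row (datum)
        residual = A[i] @ x - b[i]     # scalar residual for that datum
        x -= step * residual * A[i]    # gradient of 0.5 * (a_i . x - b_i)^2
    return x

# Consistent system: the iterates approach the solution.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
x_hat = sgd_linear_inverse(A, A @ x_true)
```

For a consistent system each step contracts the error as long as `step * ||a_i||**2 < 2` for every row; with noisy data, an a priori stopping rule of the kind analysed in the paper supplies the regularisation.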
Related papers
- Inverse Models for Estimating the Initial Condition of Spatio-Temporal
Advection-Diffusion Processes [5.814371485767541]
Inverse problems involve making inference about unknown parameters of a physical process using observational data.
This paper investigates the estimation of the initial condition of a spatio-temporal advection-diffusion process using spatially sparse data streams.
arXiv Detail & Related papers (2023-02-08T15:30:16Z)
- Clipped Stochastic Methods for Variational Inequalities with
Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs.
Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems.
In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
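The clipping operator at the heart of such methods rescales any gradient estimate to a fixed norm budget, so a single heavy-tailed sample cannot destroy an iterate. A minimal sketch (illustrative names, not the paper's code), shown inside an extragradient-style (SEG) loop on a bilinear saddle point:

```python
import numpy as np

def clip(g, lam):
    """Scale g so its Euclidean norm is at most lam; identity otherwise."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_seg(z0, F, step=0.1, lam=1.0, n_iters=200, noise=lambda: 0.0):
    """Extragradient loop with clipped operator evaluations.

    `noise` models heavy-tailed gradient noise (e.g. Cauchy samples);
    clipping bounds every step by step * lam regardless of the tail.
    """
    z = np.asarray(z0, dtype=float).copy()
    for _ in range(n_iters):
        z_half = z - step * clip(F(z) + noise(), lam)   # exploration step
        z = z - step * clip(F(z_half) + noise(), lam)   # update step
    return z

# Bilinear saddle min_x max_y x*y: the monotone operator is F(z) = (y, -x).
F = lambda z: np.array([z[1], -z[0]])
z_final = clipped_seg(np.array([1.0, 1.0]), F)
```

With zero noise this reduces to deterministic extragradient, which converges to the saddle point at the origin; with heavy-tailed noise the clipped steps remain bounded where unclipped ones can blow up.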
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
- Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z)
- On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
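The object studied here can be made concrete: a random-features model whose first layer is drawn once and frozen, with only the output weights fit by constant step-size SGD. A hypothetical sketch (the dimensions and step size are illustrative, not the paper's setting):

```python
import numpy as np

def train_rf_sgd(X, y, n_features=100, step=0.01, epochs=20, seed=0):
    """Fit random-features (RF) regression with single-sample SGD.

    The random first layer W is frozen; only the linear output
    weights theta are trained, as in RF regression.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((n_features, d)) / np.sqrt(d)  # frozen features
    Phi = np.maximum(X @ W.T, 0.0)                         # ReLU feature map, (n, N)
    theta = np.zeros(n_features)
    for _ in range(epochs):
        for i in rng.permutation(n):                       # one sample per update
            err = Phi[i] @ theta - y[i]
            theta -= step * err * Phi[i]
    return W, theta

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
W, theta = train_rf_sgd(X, y)
mse = np.mean((np.maximum(X @ W.T, 0.0) @ theta - y) ** 2)
```

The double descent curve arises when sweeping `n_features` past the interpolation threshold `n_features ≈ n`; this sketch shows only a single point of that sweep.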
arXiv Detail & Related papers (2021-10-13T17:47:39Z)
- The Benefits of Implicit Regularization from SGD in Least Squares
Problems [116.85246178212616]
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice.
We compare the implicit regularization afforded by (unregularized) averaged SGD with the explicit regularization of ridge regression.
arXiv Detail & Related papers (2021-08-10T09:56:47Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
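The comparison in this setting can be sketched directly: the ordinary least-squares solution on one side, and the averaged iterates of unregularized constant-stepsize SGD on the other. A toy sketch under assumed synthetic data (not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Explicit baseline: ordinary least squares.
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Implicit side: unregularized constant-stepsize SGD with iterate averaging.
w = np.zeros(d)
w_avg = np.zeros(d)
step, T = 0.01, 20000
for t in range(T):
    i = rng.integers(n)                    # one sample per step
    w -= step * (X[i] @ w - y[i]) * X[i]   # unregularized squared-loss step
    w_avg += w
w_avg /= T
gap = np.linalg.norm(w_avg - w_ols)
```

In this well-conditioned toy the averaged SGD iterate lands close to OLS; the algorithmic-regularization differences the paper discusses show up in harder (e.g. overparameterized or ill-conditioned) regimes.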
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
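The policy-evaluation instance mentioned here can be made concrete with TD(0) on a small Markov reward process; with tabular (identity) features, the projected fixed point coincides with the true value function. A hypothetical toy, not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.5, 0.5, 0.0],     # 3-state Markov reward process
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9
# Exact fixed point of the Bellman equation V = r + gamma * P V.
V = np.linalg.solve(np.eye(3) - gamma * P, r)

# TD(0) with tabular features: theta[s] estimates the value of state s.
theta = np.zeros(3)
theta_avg = np.zeros(3)
alpha, T = 0.02, 100_000
for t in range(T):
    s = rng.integers(3)                    # i.i.d. state sampling
    s_next = rng.choice(3, p=P[s])         # one observed transition
    td_err = r[s] + gamma * theta[s_next] - theta[s]
    theta[s] += alpha * td_err             # stochastic fixed-point step
    if t >= T // 2:                        # average the tail iterates
        theta_avg += theta
theta_avg /= T - T // 2
```

With non-trivial features, TD(0) instead converges to the projected fixed point in the low-dimensional subspace, which is the setting the oracle inequalities characterize.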
arXiv Detail & Related papers (2020-12-09T20:19:32Z)
- Convergence Analysis of Homotopy-SGD for non-convex optimization [43.71213126039448]
We present a first-order algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD).
Under some assumptions, we conduct a theoretical analysis of the proposed method.
Experimental results show that H-SGD can outperform SGD.
arXiv Detail & Related papers (2020-11-20T09:50:40Z)
- Orthant Based Proximal Stochastic Gradient Method for
$\ell_1$-Regularized Optimization [35.236001071182855]
Sparsity-inducing regularization problems are ubiquitous in machine learning applications.
We present a novel method -- the Orthant Based Proximal Stochastic Gradient Method (OBProx-SG).
arXiv Detail & Related papers (2020-04-07T18:23:39Z)
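The baseline this method builds on, proximal stochastic gradient with the ℓ1 prox (soft-thresholding), can be sketched as follows. OBProx-SG's distinguishing orthant-face prediction steps are deliberately not reproduced here, and all names and parameters are illustrative:

```python
import numpy as np

def soft_threshold(v, tau):
    """Prox of tau * ||.||_1: componentwise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_sgd_l1(X, y, lam=0.1, step=0.01, epochs=50, seed=0):
    """Proximal SGD for mean squared error plus lam * ||w||_1.

    Baseline only: OBProx-SG alternates such steps with orthant-based
    steps that promote exact sparsity (not sketched here).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = (X[i] @ w - y[i]) * X[i]   # stochastic gradient of smooth part
            w = soft_threshold(w - step * g, step * lam)
    return w

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 20))
w_sparse = np.zeros(20)
w_sparse[:3] = [2.0, -1.5, 1.0]
y = X @ w_sparse + 0.1 * rng.standard_normal(100)
w_hat = prox_sgd_l1(X, y)
obj = lambda w: 0.5 * np.mean((X @ w - y) ** 2) + 0.1 * np.abs(w).sum()
```

The soft-thresholding step sets small coordinates exactly to zero at every iterate, which is the sparsity-inducing mechanism that orthant-based refinements then exploit.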
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.