Learning Operators with Stochastic Gradient Descent in General Hilbert
Spaces
- URL: http://arxiv.org/abs/2402.04691v2
- Date: Tue, 13 Feb 2024 08:06:44 GMT
- Title: Learning Operators with Stochastic Gradient Descent in General Hilbert
Spaces
- Authors: Lei Shi and Jia-Qi Yang
- Abstract summary: This study investigates leveraging stochastic gradient descent (SGD) to learn operators between general Hilbert spaces.
We establish upper bounds for convergence rates of the SGD algorithm and conduct a minimax lower bound analysis.
Applying our analysis to operator learning problems based on vector-valued and real-valued reproducing kernel Hilbert spaces yields new convergence results.
- Score: 6.690174942973101
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study investigates leveraging stochastic gradient descent (SGD) to learn
operators between general Hilbert spaces. We propose weak and strong regularity
conditions for the target operator to depict its intrinsic structure and
complexity. Under these conditions, we establish upper bounds for convergence
rates of the SGD algorithm and conduct a minimax lower bound analysis, further
illustrating that our convergence analysis and regularity conditions
quantitatively characterize the tractability of solving operator learning
problems using the SGD algorithm. It is crucial to highlight that our
convergence analysis is still valid for nonlinear operator learning. We show
that the SGD estimator will converge to the best linear approximation of the
nonlinear target operator. Moreover, applying our analysis to operator learning
problems based on vector-valued and real-valued reproducing kernel Hilbert
spaces yields new convergence results, thereby refining the conclusions of
existing literature.
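The abstract's core claim can be made concrete with a toy experiment. The sketch below (an illustrative assumption, not the paper's algorithm verbatim) runs online SGD to learn a linear operator between finite-dimensional surrogates of Hilbert spaces; the dimensions, noise level, and polynomially decaying step-size schedule are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_steps = 5, 3, 20000

T_true = rng.standard_normal((d_out, d_in))   # target linear operator
T_hat = np.zeros((d_out, d_in))               # SGD iterate, started at zero

for k in range(1, n_steps + 1):
    x = rng.standard_normal(d_in)                        # random input sample
    y = T_true @ x + 0.01 * rng.standard_normal(d_out)   # noisy evaluation T(x) + noise
    eta = 1.0 / (10 + k) ** 0.75                         # polynomially decaying step size
    # Stochastic gradient of 0.5 * ||T x - y||^2 with respect to T is (T x - y) x^T
    T_hat -= eta * np.outer(T_hat @ x - y, x)

print(np.linalg.norm(T_hat - T_true))   # estimation error shrinks toward the noise floor
```

With a nonlinear target, the same iteration would instead drift toward the best linear approximation of the target operator, matching the abstract's observation.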
Related papers
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Understanding the Generalization Ability of Deep Learning Algorithms: A
Kernelized Renyi's Entropy Perspective [11.255943520955764]
We propose a novel information theoretical measure: kernelized Renyi's entropy.
We establish generalization error bounds for stochastic gradient descent and stochastic gradient Langevin dynamics (SGD/SGLD) learning algorithms under kernelized Renyi's entropy.
We show that our information-theoretical bounds depend on the statistics of the gradients, and are rigorously tighter than the current state-of-the-art (SOTA) results.
arXiv Detail & Related papers (2023-05-02T01:17:15Z) - On the Convergence of Stochastic Gradient Descent for Linear Inverse
Problems in Banach Spaces [0.0]
Stochastic gradient descent (SGD) has been established as one of the most successful optimisation methods in machine learning.
We present a novel convergence analysis of SGD for linear inverse problems in general Banach spaces.
arXiv Detail & Related papers (2023-02-10T12:00:49Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional
Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario where the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
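The divide-and-conquer idea summarized above can be sketched in a few lines: split the sample into blocks, solve a small kernel ridge problem on each block, then average the block estimators. The snippet below is a schematic illustration on synthetic 1-D data; the Gaussian kernel, its bandwidth, and the ridge parameter are assumptions, not choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def gaussian_kernel(a, b, gamma=50.0):
    # Gaussian (RBF) kernel matrix between two 1-D sample arrays
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

n, n_blocks, lam = 600, 6, 1e-3
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

x_test = np.linspace(0, 1, 50)
preds = np.zeros_like(x_test)
for xb, yb in zip(np.array_split(x, n_blocks), np.array_split(y, n_blocks)):
    # Kernel ridge regression on one block: (K + lam * n_b * I) alpha = y_b
    K = gaussian_kernel(xb, xb)
    alpha = np.linalg.solve(K + lam * len(xb) * np.eye(len(xb)), yb)
    preds += gaussian_kernel(x_test, xb) @ alpha
preds /= n_blocks   # average the per-block estimators

rmse = np.sqrt(np.mean((preds - np.sin(2 * np.pi * x_test)) ** 2))
print(rmse)
```

Each block solves an n/m-by-n/m linear system instead of one n-by-n system, which is the source of the time and memory savings the summary mentions.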
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z) - Convergence Rates for Learning Linear Operators from Noisy Data [6.4423565043274795]
We study the inverse problem of learning a linear operator on a space from its noisy pointwise evaluations on random input data.
We establish posterior contraction rates with respect to a family of Bochner norms as the number of data tends to infinity, along with lower bounds on the estimation error.
These convergence rates highlight and quantify the difficulty of learning unbounded linear operators in comparison with the learning of bounded or compact ones.
arXiv Detail & Related papers (2021-08-27T22:09:53Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
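The last-iterate, constant-step-size behavior described above can be observed on a toy game. The sketch below (my illustration, not the paper's setting) runs simultaneous stochastic gradient descent-ascent (SGDA) on the strongly-convex-strongly-concave objective f(x, y) = 0.5*mu*x^2 + x*y - 0.5*mu*y^2, whose unique saddle point is (0, 0); the noise level and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, eta, sigma = 1.0, 0.05, 0.1
x, y = 5.0, -5.0                    # arbitrary starting point

for _ in range(2000):
    gx = mu * x + y + sigma * rng.standard_normal()   # stochastic grad_x f
    gy = x - mu * y + sigma * rng.standard_normal()   # stochastic grad_y f
    x, y = x - eta * gx, y + eta * gy                 # descent in x, ascent in y

print(abs(x) + abs(y))   # last iterate settles in a noise-dominated neighborhood of 0
```

With a constant step size the iterate does not converge exactly, but contracts linearly to a neighborhood of the solution whose radius is set by the gradient noise, mirroring the linear-convergence-to-a-neighborhood guarantee in the summary.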
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic
Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast generalization bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.