Convergence Analysis of Homotopy-SGD for non-convex optimization
- URL: http://arxiv.org/abs/2011.10298v1
- Date: Fri, 20 Nov 2020 09:50:40 GMT
- Title: Convergence Analysis of Homotopy-SGD for non-convex optimization
- Authors: Matilde Gargiani and Andrea Zanelli and Quoc Tran-Dinh and Moritz
Diehl and Frank Hutter
- Abstract summary: We present a first-order algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD).
Under some mild assumptions, we conduct a theoretical analysis of the proposed algorithm.
Experimental results show that H-SGD can outperform SGD.
- Score: 43.71213126039448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: First-order stochastic methods for solving large-scale non-convex
optimization problems are widely used in many big-data applications, e.g.
training deep neural networks as well as other complex and potentially
non-convex machine learning models. Their inexpensive iterations generally come
together with slow global convergence rate (mostly sublinear), leading to the
necessity of carrying out a very high number of iterations before the iterates
reach a neighborhood of a minimizer. In this work, we present a first-order
stochastic algorithm based on a combination of homotopy methods and SGD, called
Homotopy-Stochastic Gradient Descent (H-SGD), which finds interesting
connections with some proposed heuristics in the literature, e.g. optimization
by Gaussian continuation, training by diffusion, mollifying networks. Under
some mild assumptions on the problem structure, we conduct a theoretical
analysis of the proposed algorithm. Our analysis shows that, with a
specifically designed scheme for the homotopy parameter, H-SGD enjoys a global
linear rate of convergence to a neighborhood of a minimum while maintaining
fast and inexpensive iterations. Experimental evaluations confirm the
theoretical results and show that H-SGD can outperform standard SGD.
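To make the idea concrete, here is a minimal sketch of a homotopy-style SGD loop, assuming a convex-combination surrogate between an easy auxiliary objective and the target loss. The surrogate construction, the linear schedule for the homotopy parameter, the toy objective, and all names (`grad_easy`, `grad_target`, `lam_schedule`, step sizes) are illustrative assumptions, not the paper's exact algorithm or analysis setting.

```python
import numpy as np

def h_sgd(x0, grad_target, grad_easy, lam_schedule, n_inner=200, lr=0.05, rng=None):
    """Illustrative homotopy-SGD loop (not the paper's exact algorithm).

    The surrogate gradient is a convex combination
        g_lam(x) = (1 - lam) * grad_easy(x) + lam * grad_target(x),
    so lam = 0 gives an 'easy' problem and lam = 1 the original one.
    Plain SGD is run on each surrogate while lam is annealed towards 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for lam in lam_schedule:                                # e.g. 0.0 -> 1.0
        for _ in range(n_inner):                            # SGD steps on the current surrogate
            noise = 0.01 * rng.standard_normal(x.shape)     # stand-in for minibatch noise
            g = (1.0 - lam) * grad_easy(x) + lam * grad_target(x) + noise
            x = x - lr * g
    return x

if __name__ == "__main__":
    # Toy non-convex target f(x) = x^2 + 2*sin(3x); easy surrogate g(x) = x^2.
    grad_target = lambda x: 2 * x + 6 * np.cos(3 * x)
    grad_easy = lambda x: 2 * x
    x_star = h_sgd(np.array([2.5]), grad_target, grad_easy,
                   lam_schedule=np.linspace(0.0, 1.0, 11))
    print("approximate minimizer:", x_star)
```

In the paper's analysis, the specifically designed scheme for the homotopy parameter is what yields the global linear rate to a neighborhood of a minimum; the linear schedule above is only a placeholder.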
Related papers
- Stochastic Variance-Reduced Iterative Hard Thresholding in Graph Sparsity Optimization [0.626226809683956]
We introduce two gradient-based methods to solve graph sparsity optimization: GraphRG-IHT and GraphSG-IHT.
We provide a general framework for theoretical analysis, demonstrating that the methods enjoy convergence guarantees within a gradient-based framework.
arXiv Detail & Related papers (2024-07-24T03:26:26Z) - Diffusion Stochastic Optimization for Min-Max Problems [33.73046548872663]
The optimistic gradient method is useful in addressing minimax optimization problems (a minimal sketch of this kind of update appears after this list).
Motivated by the observation that the conventional version suffers from the need for a large batch size, we introduce and analyze a new same-sample formulation of the stochastic optimistic gradient method.
arXiv Detail & Related papers (2024-01-26T01:16:59Z) - Max-affine regression via first-order methods [7.12511675782289]
The max-affine model ubiquitously arises in applications in signal processing and statistics.
We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch stochastic gradient descent (SGD) for max-affine regression.
arXiv Detail & Related papers (2023-08-15T23:46:44Z) - Stochastic Methods in Variational Inequalities: Ergodicity, Bias and
Refinements [19.524063429548278]
Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) are preeminent algorithms for min-max optimization and variational inequality problems.
Our work endeavors to quantify the intrinsic structures of these algorithms.
By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem.
arXiv Detail & Related papers (2023-06-28T18:50:07Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - On the Convergence of mSGD and AdaGrad for Stochastic Optimization [0.696125353550498]
Stochastic gradient descent (SGD) has been intensively developed and extensively applied in machine learning in the past decade.
Modified SGD-type algorithms, such as momentum-based SGD (mSGD) and adaptive gradient optimization (AdaGrad), outperform plain SGD in many applications in terms of convergence rate and accuracy.
We focus on convergence analysis of mSGD and AdaGrad for any smooth (possibly non-convex) loss functions in machine learning.
arXiv Detail & Related papers (2022-01-26T22:02:21Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Distributed stochastic optimization with large delays [59.95552973784946]
One of the most widely used methods for solving large-scale optimization problems is distributed asynchronous stochastic gradient descent (DASGD).
We show that DASGD converges to a global optimal solution under suitable delay assumptions.
arXiv Detail & Related papers (2021-07-06T21:59:49Z) - Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other problems in data science, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
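Several of the min-max entries above (the diffusion optimistic-gradient paper, the SEG/SGDA analysis, and the adaptive-gradient GAN paper) build on stochastic gradient descent-ascent and optimistic-gradient updates. The sketch below contrasts the two update rules on a toy bilinear saddle problem; the objective, step size, and noise model are illustrative assumptions and do not reproduce any of the cited papers' algorithms or settings.

```python
import numpy as np

def sgda_and_optimistic(a=1.0, steps=2000, lr=0.05, noise=0.05, seed=0):
    """Compare stochastic gradient descent-ascent (SGDA) with an optimistic
    variant on the toy saddle problem f(x, y) = a * x * y (saddle at (0, 0)).

    SGDA:        x <- x - lr * g_x,   y <- y + lr * g_y
    Optimistic:  uses the extrapolated gradient 2*g_t - g_{t-1} in each update.
    """
    rng = np.random.default_rng(seed)
    gx = lambda x, y: a * y + noise * rng.standard_normal()   # noisy df/dx
    gy = lambda x, y: a * x + noise * rng.standard_normal()   # noisy df/dy

    # Plain SGDA iterate and optimistic iterate, started from the same point.
    xs, ys = 1.0, 1.0
    xo, yo = 1.0, 1.0
    prev_gx, prev_gy = gx(xo, yo), gy(xo, yo)
    for _ in range(steps):
        # Simultaneous SGDA update (tends to cycle/diverge on bilinear problems).
        xs, ys = xs - lr * gx(xs, ys), ys + lr * gy(xs, ys)
        # Optimistic update: extrapolate with the previous stochastic gradient.
        cgx, cgy = gx(xo, yo), gy(xo, yo)
        xo = xo - lr * (2 * cgx - prev_gx)
        yo = yo + lr * (2 * cgy - prev_gy)
        prev_gx, prev_gy = cgx, cgy
    return (xs, ys), (xo, yo)

if __name__ == "__main__":
    sgda_pt, og_pt = sgda_and_optimistic()
    print("SGDA iterate:      ", sgda_pt)   # typically drifts away from (0, 0)
    print("Optimistic iterate:", og_pt)     # typically stays near (0, 0)
```

On bilinear problems, plain (stochastic) GDA tends to spiral away from the saddle point, while the optimistic correction `2*g_t - g_{t-1}` keeps the iterates in a neighborhood of it, which is the contrast the comments above point out.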
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.