Nonconvex Stochastic Bregman Proximal Gradient Method with Application
to Deep Learning
- URL: http://arxiv.org/abs/2306.14522v2
- Date: Thu, 29 Jun 2023 09:50:10 GMT
- Title: Nonconvex Stochastic Bregman Proximal Gradient Method with Application
to Deep Learning
- Authors: Kuangyu Ding, Jingyang Li and Kim-Chuan Toh
- Abstract summary: We investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part.
MSBPG can potentially be employed as a universal open-source optimizer.
- Score: 6.807786746803371
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The widely used stochastic gradient methods for minimizing nonconvex
composite objective functions require the Lipschitz smoothness of the
differentiable part. However, this requirement does not hold for problem classes
including quadratic inverse problems and training neural networks. To address
this issue, we investigate a family of stochastic Bregman proximal gradient
(SBPG) methods, which only require smooth adaptivity of the differentiable
part. SBPG replaces the upper quadratic approximation used in SGD with the
Bregman proximity measure, resulting in a better approximation model that
captures the non-Lipschitz gradients of the nonconvex objective. We formulate
the vanilla SBPG and establish its convergence properties in the nonconvex
setting without a finite-sum structure. Experimental results on quadratic inverse
problems demonstrate the robustness of SBPG. Moreover, we propose a momentum-based
version of SBPG (MSBPG) and prove it has improved convergence properties. We
apply MSBPG to the training of deep neural networks with a polynomial kernel
function, which ensures the smooth adaptivity of the loss function.
Experimental results on representative benchmarks demonstrate the effectiveness
and robustness of MSBPG in training neural networks. Since the additional
computation cost of MSBPG compared with SGD is negligible in large-scale
optimization, MSBPG can potentially be employed as a universal open-source
optimizer in the future.
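To make the update rule concrete, the following is a minimal sketch of an MSBPG-style step for the unregularized case. It assumes an illustrative polynomial kernel h(x) = ||x||^2/2 + (alpha/4)||x||^4 and standard moving-average momentum; the paper's exact kernel, step-size schedule, and handling of a nonsmooth regularizer may differ.

```python
import numpy as np

def grad_h(x, alpha=1.0):
    """Gradient of the illustrative kernel h(x) = 0.5*||x||^2 + 0.25*alpha*||x||^4."""
    return (1.0 + alpha * np.dot(x, x)) * x

def grad_h_inv(z, alpha=1.0):
    """Invert grad_h: find x with (1 + alpha*||x||^2) * x = z.
    Reduces to the monotone scalar cubic alpha*t^3 + t = ||z|| with t = ||x||."""
    z_norm = np.linalg.norm(z)
    if z_norm == 0.0:
        return np.zeros_like(z)
    t = z_norm  # Newton's method on the cubic; the positive root is unique, so this converges quickly
    for _ in range(30):
        t -= (alpha * t**3 + t - z_norm) / (3.0 * alpha * t**2 + 1.0)
    return (t / z_norm) * z

def msbpg_step(x, stoch_grad, momentum, lr=0.1, beta=0.9, alpha=1.0):
    """One momentum stochastic Bregman proximal gradient (MSBPG-style) step for an
    objective with no nonsmooth regularizer: a mirror step through grad_h."""
    momentum = beta * momentum + (1.0 - beta) * stoch_grad  # moving-average gradient estimate
    z = grad_h(x, alpha) - lr * momentum                    # step in the dual (mirror) space
    return grad_h_inv(z, alpha), momentum                   # map back to the primal space
```

When a nonsmooth regularizer is present, the mirror step above is replaced by a Bregman proximal mapping; the mirror-descent form shown here is the special case without a regularizer.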
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a central server in favor of client-to-client communication.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - SeWA: Selective Weight Average via Probabilistic Masking [51.015724517293236]
We show that only a few points are needed to achieve better and faster convergence.
We transform the discrete selection problem into a continuous subset optimization framework.
We derive SeWA's stability bounds, which are sharper than existing bounds under both convex and nonconvex assumptions.
arXiv Detail & Related papers (2025-02-14T12:35:21Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Smoothing Policy Iteration for Zero-sum Markov Games [9.158672246275348]
We propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately.
Specifically, the adversarial policy serves as the weight function to enable efficient sampling over action spaces.
We also propose a model-based algorithm called Smooth adversarial Actor-critic (SaAC) by extending SPI with function approximation.
arXiv Detail & Related papers (2022-12-03T14:39:06Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - GOALS: Gradient-Only Approximations for Line Searches Towards Robust and
Consistent Training of Deep Neural Networks [0.0]
Mini-batch sub-sampling (MBSS) is favored in deep neural network training to reduce the computational cost.
We propose a gradient-only approximation line search (GOALS) with strong convergence characteristics and a defined optimality criterion.
arXiv Detail & Related papers (2021-05-23T11:21:01Z) - A Retrospective Approximation Approach for Smooth Stochastic
Optimization [0.2867517731896504]
Stochastic gradient (SG) is the de facto iterative technique for solving stochastic optimization (SO) problems with a smooth (nonconvex) objective.
arXiv Detail & Related papers (2021-03-07T16:29:36Z) - Global Convergence of Model Function Based Bregman Proximal Minimization
Algorithms [17.740376367999705]
Lipschitz continuity of the gradient mapping of a continuously differentiable function plays a crucial role in various optimization algorithms.
We propose a globally convergent algorithm, called Model BPG, that relies on the $L$-smad property.
arXiv Detail & Related papers (2020-12-24T08:09:22Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in finite time, up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Bayesian Sparse learning with preconditioned stochastic gradient MCMC
and its applications [5.660384137948734]
We show that the proposed algorithm asymptotically converges to the correct distribution with a controllable bias under mild conditions.
arXiv Detail & Related papers (2020-06-29T20:57:20Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of the stochastic proximal gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity bounds for constant and variable stepsize iteration policies, showing that, for minibatch size $N$, after $\mathcal{O}(\frac{1}{N\epsilon})$ iterations, $\epsilon$-suboptimality is attained in expected quadratic distance to the optimal solution (a minimal sketch of one such minibatch step appears after this list).
arXiv Detail & Related papers (2020-03-30T10:43:56Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
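For the "Stochastic Proximal Gradient Algorithm with Minibatches" entry above, the following is a minimal sketch of one minibatch proximal gradient step. It assumes an l1 regularizer as an illustrative nonsmooth component; the paper itself treats general composite objectives and stepsize policies.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (an illustrative nonsmooth regularizer)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def minibatch_prox_step(x, per_sample_grads, step_size, reg=0.01):
    """One minibatch stochastic proximal gradient step: average N per-sample
    gradients of the smooth part, take a gradient step, then apply the prox of
    the nonsmooth part."""
    g = np.mean(per_sample_grads, axis=0)  # minibatch gradient estimate; larger N lowers variance
    return soft_threshold(x - step_size * g, step_size * reg)
```

Averaging the $N$ per-sample gradients before the proximal step is what reduces the variance of each update and underlies the $N$-dependence in the complexity quoted in that entry.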