Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models
- URL: http://arxiv.org/abs/2003.13332v1
- Date: Mon, 30 Mar 2020 10:43:56 GMT
- Title: Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models
- Authors: Andrei Patrascu, Ciprian Paduraru, Paul Irofti
- Abstract summary: We develop and analyze minibatch variants of gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity for constant and variable stepsize iteration policies obtaining that, for minibatch size $N$, after $mathcalO(frac1Nepsilon)$ $epsilon-$subity is attained in expected quadratic distance to optimal solution.
- Score: 2.384873896423002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic optimization lies at the core of most statistical learning models.
The recent great development of stochastic algorithmic tools focused
significantly onto proximal gradient iterations, in order to find an efficient
approach for nonsmooth (composite) population risk functions. The complexity of
finding optimal predictors by minimizing regularized risk is largely understood
for simple regularizations such as $\ell_1/\ell_2$ norms. However, more complex
properties desired for the predictor necessitates highly difficult regularizers
as used in grouped lasso or graph trend filtering. In this chapter we develop
and analyze minibatch variants of stochastic proximal gradient algorithm for
general composite objective functions with stochastic nonsmooth components. We
provide iteration complexity for constant and variable stepsize policies
obtaining that, for minibatch size $N$, after
$\mathcal{O}(\frac{1}{N\epsilon})$ iterations $\epsilon-$suboptimality is
attained in expected quadratic distance to optimal solution. The numerical
tests on $\ell_2-$regularized SVMs and parametric sparse representation
problems confirm the theoretical behaviour and surpasses minibatch SGD
performance.
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - Random Scaling and Momentum for Non-smooth Non-convex Optimization [38.443430569753026]
Training neural networks requires a loss function that may be highly irregular, and in particular neither convex nor smooth.
Popular training algorithms are based on gradient descent with momentum (SGDM), for which analysis applies only if the loss is either convex or smooth.
arXiv Detail & Related papers (2024-05-16T00:52:03Z) - Nonconvex Stochastic Bregman Proximal Gradient Method for Nonconvex Composite Problems [9.202586157819693]
gradient methods for non composite objective functions typically rely on the Lipschitz smoothness of the differentiable part.
We propose a better approximation model that handles non-Lipschitz gradient in non objectives.
We show it achieves optimal robustness in terms of step selection sensitivity.
arXiv Detail & Related papers (2023-06-26T08:54:46Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-probe Variance Reduced (MSVR) to alleviate the complexity of compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and dependence on strongity.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms
for Stochastic Shortest Path [29.289190242826688]
We introduce a generic template for developing regret algorithms in the Shortest Path (SSP) model.
We develop two new algorithms: the first is model-free and minimax optimal under strictly positive costs.
Both algorithms admit highly sparse updates, making them computationally more efficient than all existing algorithms.
arXiv Detail & Related papers (2021-06-15T19:15:17Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Stochastic Multi-level Composition Optimization Algorithms with
Level-Independent Convergence Rates [12.783783498844022]
We study smooth multi-level composition optimization problems, where the objective function is a nested composition of $T$ functions.
We show that the first algorithm, which is a generalization of citeGhaRuswan20 to the $T$ level case, can achieve a sample complexity of $mathcalO (1/epsilon$6)
This is the first time that such an online algorithm designed for the (un) multi-level setting, obtains the same sample complexity under standard assumptions.
arXiv Detail & Related papers (2020-08-24T15:57:50Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems.
Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.