$\bar{G}_{mst}$:An Unbiased Stratified Statistic and a Fast Gradient
Optimization Algorithm Based on It
- URL: http://arxiv.org/abs/2110.03354v1
- Date: Thu, 7 Oct 2021 11:48:55 GMT
- Title: $\bar{G}_{mst}$:An Unbiased Stratified Statistic and a Fast Gradient
Optimization Algorithm Based on It
- Authors: Aixiang Chen
- Abstract summary: A novel algorithm named MSSG, designed based on $\bar{G}_{mst}$, outperforms other SGD-like algorithms.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The fluctuation of the gradient's expectation and variance caused by
parameter updates between consecutive iterations is neglected or handled
confusingly by current mainstream gradient optimization algorithms. The work in
this paper remedies this issue by introducing a novel unbiased stratified
statistic, $\bar{G}_{mst}$; a sufficient condition for the fast convergence of
$\bar{G}_{mst}$ is also established. A novel algorithm named MSSG, designed
based on $\bar{G}_{mst}$, outperforms other SGD-like algorithms. Theoretical
conclusions and experimental evidence strongly suggest employing MSSG when
training deep models.
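Since the abstract does not spell out how $\bar{G}_{mst}$ is constructed, the following is only a minimal sketch of a generic unbiased stratified mini-batch gradient estimator in the same spirit; the strata, the squared-error loss, and the plain-SGD loop are illustrative assumptions, not the paper's MSSG.

```python
# Minimal sketch (assumption): a generic unbiased stratified mini-batch gradient
# estimator; not the paper's exact G_mst statistic or MSSG algorithm.
import numpy as np

def stratified_gradient(w, X, y, strata, per_stratum=8, rng=None):
    """Unbiased gradient estimate of the mean squared-error loss.

    strata: list of index arrays partitioning the dataset.
    Each stratum contributes its sample-mean gradient, weighted by the
    stratum's share of the data, so the estimator's expectation equals
    the full-batch gradient.
    """
    rng = rng or np.random.default_rng()
    n = len(y)
    g = np.zeros_like(w)
    for idx in strata:
        take = rng.choice(idx, size=min(per_stratum, len(idx)), replace=False)
        Xs, ys = X[take], y[take]
        # gradient of 0.5 * mean((Xs @ w - ys)**2) w.r.t. w
        g_s = Xs.T @ (Xs @ w - ys) / len(take)
        g += (len(idx) / n) * g_s          # weight by stratum proportion
    return g

# Usage: plain SGD driven by the stratified estimator (illustrative strata).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)
strata = np.array_split(rng.permutation(1000), 5)
w = np.zeros(10)
for _ in range(200):
    w -= 0.05 * stratified_gradient(w, X, y, strata, rng=rng)
```

Because each stratum contributes its sample-mean gradient weighted by the stratum's share of the data, the estimator's expectation equals the full-batch gradient, which is the sense in which such a stratified statistic is unbiased.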
Related papers
- Uncertainty quantification for iterative algorithms in linear models with application to early stopping [4.150180443030652]
This paper investigates the iterates $\hat{b}^1,\dots,\hat{b}^T$ obtained from iterative algorithms in high-dimensional linear regression problems.
The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD and their accelerated variants such as Fast Iterative Soft-Thresholding (FISTA).
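For concreteness, a minimal sketch of the FISTA iterates $\hat{b}^1,\dots,\hat{b}^T$ for the lasso, the kind of iterative algorithm this analysis covers; the objective, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista_lasso(X, y, lam, T=200):
    """Return the iterates b^1, ..., b^T of FISTA for the lasso
    0.5/n * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the smooth part
    b = np.zeros(p)
    z = b.copy()
    t = 1.0
    iterates = []
    for _ in range(T):
        grad = X.T @ (X @ z - y) / n           # gradient of the smooth part at z
        b_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = b_new + ((t - 1) / t_new) * (b_new - b)   # Nesterov momentum step
        b, t = b_new, t_new
        iterates.append(b.copy())
    return iterates
```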
arXiv Detail & Related papers (2024-04-27T10:20:41Z)
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve the $\tilde{O}(\dots)$
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Regret Bounds for Expected Improvement Algorithms in Gaussian Process Bandit Optimization [63.8557841188626]
The expected improvement (EI) algorithm is one of the most popular strategies for optimization under uncertainty.
We propose a variant of EI with a standard incumbent defined via the GP predictive mean.
We show that our algorithm converges, and achieves a cumulative regret bound of $\mathcal{O}(\gamma_T\sqrt{T})$.
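A minimal sketch of the EI acquisition with the incumbent taken from the GP predictive mean, as described above (minimization convention); the GP posterior quantities are assumed to be supplied by an external GP model.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, mu_incumbent):
    """EI at candidate points, for minimization.

    mu, sigma : GP posterior mean and standard deviation at the candidates.
    mu_incumbent : smallest GP predictive mean over points evaluated so far --
        the 'standard incumbent' variant uses this instead of the best noisy
        observation.
    """
    sigma = np.maximum(sigma, 1e-12)           # avoid division by zero
    z = (mu_incumbent - mu) / sigma
    return (mu_incumbent - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```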
arXiv Detail & Related papers (2022-03-15T13:17:53Z)
- MSTGD:A Memory Stochastic sTratified Gradient Descent Method with an Exponential Convergence Rate [0.0]
This paper designs a novel Memory Stochastic sTratified Gradient Descent (MSTGD) algorithm with an exponential convergence rate.
Specifically, MSTGD uses two strategies for variance reduction: the first is to perform variance reduction according to the proportion $p$ of historical gradients used.
Unlike most other algorithms that claim an exponential convergence rate, MSTGD's rate is independent of parameters such as the dataset size $N$ and the batch size $n$.
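The summary does not give MSTGD's exact memory scheme; as a rough illustration of variance reduction with stored historical gradients, below is a standard SAGA-style control-variate estimate. How MSTGD applies the proportion $p$ is not reproduced here.

```python
import numpy as np

def saga_style_estimate(grad_fn, w, memory, i):
    """One common way to use stored historical gradients for variance
    reduction (SAGA-style); not MSTGD's exact scheme.

    grad_fn(w, i): per-example gradient at the current parameters.
    memory: array holding the last gradient stored for every example.
    """
    g_new = grad_fn(w, i)
    g_vr = g_new - memory[i] + memory.mean(axis=0)   # control-variate estimate
    memory[i] = g_new                                # refresh the memory entry
    return g_vr
```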
arXiv Detail & Related papers (2022-02-21T01:36:26Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design a Bregman gradient policy optimization (BGPO) framework for reinforcement learning, based on Bregman divergences and momentum techniques.
VR-BGPO reaches the best-known complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory at each iteration.
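To unpack the Bregman part: a Bregman gradient step replaces the Euclidean proximal step of gradient descent with a Bregman-divergence proximal step. A minimal sketch with the negative-entropy mirror map (the exponentiated-gradient update on the simplex) follows; the momentum and variance-reduction parts of (VR-)BGPO are not shown.

```python
import numpy as np

def bregman_step_simplex(x, grad, eta):
    """One Bregman (mirror-descent) step on the probability simplex.

    With the negative-entropy mirror map,
        argmin_u <grad, u> + (1/eta) * KL(u, x)
    has the closed-form exponentiated-gradient update below.
    """
    y = x * np.exp(-eta * grad)
    return y / y.sum()
```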
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point using an optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
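The second-order information is used through Hessian-vector products, which never require forming the full Hessian. A minimal finite-difference sketch of that primitive follows; it is not the paper's momentum-correction estimator itself.

```python
import numpy as np

def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    """Approximate H(w) @ v by central differences of the gradient:
        Hv ~= (grad(w + eps*v) - grad(w - eps*v)) / (2*eps)
    grad_fn(w) must return the gradient of the loss at w."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)
```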
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- On the Last Iterate Convergence of Momentum Methods [32.60494006038902]
We prove for the first time that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate suffers from an error.
We show that in the setting with convex and smooth functions, our new SGDM algorithm automatically converges at a rate of $O(\frac{\log T}{T})$.
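For reference, one SGDM (SGD with momentum) step in its common heavy-ball form, whose final iterate $w_T$ is what the last-iterate bounds above concern; the paper's exact parameterization and step-size schedule may differ.

```python
import numpy as np

def sgdm_step(w, m, grad, lr=0.01, beta=0.9):
    """One SGD-with-momentum update (common heavy-ball form)."""
    m = beta * m + grad          # momentum buffer
    w = w - lr * m               # parameter update
    return w, m
```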
arXiv Detail & Related papers (2021-02-13T21:16:16Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
On two popular deep learning tasks, we demonstrate the empirical advantages of the proposed methods over standard gradient methods.
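As background, a minimal sketch of the standard (non-adaptive) DP-SGD step, per-example clipping plus Gaussian noise, on top of which such adaptive DP algorithms are built; the clip norm and noise multiplier here are illustrative.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr=0.1, clip=1.0, noise_mult=1.0, rng=None):
    """One standard DP-SGD update: clip each per-example gradient to norm
    `clip`, sum, add Gaussian noise calibrated to the clip norm, average."""
    rng = rng or np.random.default_rng()
    n = len(per_example_grads)
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_mean = (sum(clipped)
                  + rng.normal(0.0, noise_mult * clip, size=w.shape)) / n
    return w - lr * noisy_mean
```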
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers [31.309253646602933]
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers, each having a subset of the data.
Distributed SGD may suffer from the effect of stragglers, i.e., slow or unresponsive workers who cause delays.
We propose an algorithm for adaptive distributed SGD that is based on a statistical notion.
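One standard straggler-mitigation pattern for this setting is to aggregate only the gradients of the fastest $k$ of the $n$ workers in each round, as sketched below; the paper's algorithm instead decides adaptively, via a statistical criterion, how long to wait, which this fixed-$k$ sketch does not implement.

```python
import numpy as np

def aggregate_fastest_k(worker_grads, worker_times, k):
    """Average the gradients of the k workers that responded first,
    ignoring the remaining (straggling) workers this round."""
    order = np.argsort(worker_times)[:k]
    return np.mean([worker_grads[i] for i in order], axis=0)
```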
arXiv Detail & Related papers (2020-02-25T16:25:22Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave min-max problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
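To make the optimistic idea concrete, a minimal sketch of the plain optimistic gradient step, which reuses the previous gradient as a prediction of the next one; OAdagrad additionally rescales coordinates by accumulated squared gradients, which is not shown here.

```python
import numpy as np

def optimistic_gradient_step(w, grad, prev_grad, lr=0.01):
    """Optimistic gradient descent update:
        w_{t+1} = w_t - lr * (2 * g_t - g_{t-1})
    i.e. a gradient step plus a correction using the previous gradient."""
    return w - lr * (2.0 * grad - prev_grad)
```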
arXiv Detail & Related papers (2019-12-26T22:10:10Z)