A variable metric mini-batch proximal stochastic recursive gradient
algorithm with diagonal Barzilai-Borwein stepsize
- URL: http://arxiv.org/abs/2010.00817v1
- Date: Fri, 2 Oct 2020 07:18:42 GMT
- Title: A variable metric mini-batch proximal stochastic recursive gradient
algorithm with diagonal Barzilai-Borwein stepsize
- Authors: Tengteng Yu, Xin-Wei Liu, Yu-Hong Dai and Jie Sun
- Abstract summary: We propose a variable metric mini-batch proximal stochastic recursive gradient algorithm, VM-mSRGBB, which updates the metric using a new diagonal BB stepsize.
The linear convergence of VM-mSRGBB is established for strongly convex, non-strongly convex and convex functions.
- Score: 6.358720570480498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variable metric proximal gradient methods with different metric selections
have been widely used in composite optimization. Combining the Barzilai-Borwein
(BB) method with a diagonal selection strategy for the metric, the diagonal BB
stepsize keeps the per-step computational cost as low as that of the scalar BB
stepsize while better capturing the local geometry of the problem. In this paper, we propose a
variable metric mini-batch proximal stochastic recursive gradient algorithm
VM-mSRGBB, which updates the metric using a new diagonal BB stepsize. The
linear convergence of VM-mSRGBB is established for strongly convex,
non-strongly convex and convex functions. Numerical experiments on standard
data sets show that VM-mSRGBB is better than or comparable to some variance
reduced stochastic gradient methods with best-tuned scalar stepsizes or BB
stepsizes. Furthermore, the performance of VM-mSRGBB is superior to some
advanced mini-batch proximal stochastic gradient methods.
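As a concrete illustration of the ingredients named in the abstract, the following is a minimal NumPy sketch of a variable-metric mini-batch proximal loop with a recursive (SARAH-style) gradient estimator and a diagonal BB metric. The safeguarded element-wise BB ratio, the l1 proximal map, and the per-sample gradient oracle `grad_i` are illustrative assumptions; the paper's own diagonal BB stepsize and parameter choices are not reproduced here.

```python
import numpy as np

def diag_bb(x, x_prev, g, g_prev, lo=1e-6, hi=1e6):
    """Safeguarded element-wise BB ratio y_i / s_i used as a diagonal metric.
    One common diagonal BB construction, not the paper's specific rule."""
    s, y = x - x_prev, g - g_prev
    safe_s = np.where(np.abs(s) > 1e-12, s, 1.0)
    d = np.where(np.abs(s) > 1e-12, y / safe_s, 1.0)
    return np.clip(d, lo, hi)

def prox_l1(v, d, lam):
    """Proximal map of lam*||x||_1 in the metric diag(d): coordinate-wise soft-threshold."""
    return np.sign(v) * np.maximum(np.abs(v) - lam / d, 0.0)

def vm_msrg_sketch(grad_i, n, x0, lam=1e-3, epochs=10, m=50, batch=8, seed=0):
    """Hedged sketch of a variable-metric mini-batch proximal recursive gradient loop."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = np.ones_like(x)                                       # initial diagonal metric
    for _ in range(epochs):
        v = np.mean([grad_i(i, x) for i in range(n)], axis=0) # full gradient at snapshot
        x_prev, x = x.copy(), prox_l1(x - v / d, d, lam)
        for _ in range(m):
            idx = rng.choice(n, size=min(batch, n), replace=False)
            g_new = np.mean([grad_i(i, x) for i in idx], axis=0)
            g_old = np.mean([grad_i(i, x_prev) for i in idx], axis=0)
            v = g_new - g_old + v                              # recursive (SARAH-style) estimator
            d = diag_bb(x, x_prev, g_new, g_old)               # refresh the diagonal BB metric
            x_prev, x = x, prox_l1(x - v / d, d, lam)
    return x
```

Here `grad_i(i, x)` would return the gradient of the i-th smooth loss term; swapping `prox_l1` exchanges the non-smooth regularizer.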
Related papers
- Robust Approximate Sampling via Stochastic Gradient Barker Dynamics [0.0]
We introduce the stochastic gradient Barker dynamics (SGBD) algorithm, a robust alternative to Langevin-based sampling algorithms, within the stochastic gradient framework.
We characterize the impact of stochastic gradients on the Barker transition mechanism and develop a bias-corrected version that eliminates the error due to the gradient noise in the proposal.
arXiv Detail & Related papers (2024-05-14T23:47:02Z)
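For the SGBD entry above, a minimal sketch of one Barker-proposal step driven by a stochastic gradient may help fix ideas. The coordinate-wise sign-flipping mechanism is the standard Barker proposal; the paper's bias-corrected variant is not reproduced, and `stoch_grad_log_p` is an assumed user-supplied noisy gradient of the log-target.

```python
import numpy as np

def sgbd_step(x, stoch_grad_log_p, sigma=0.1, rng=None):
    """One Barker-proposal step using a stochastic gradient of log pi(x).

    For each coordinate, draw z ~ N(0, sigma^2) and move by +z or -z, choosing the
    sign +1 with probability sigmoid(z * grad_i); this skews moves toward higher
    density while staying robust to occasional large gradient values.
    """
    rng = rng or np.random.default_rng(0)
    g = stoch_grad_log_p(x)                        # noisy estimate of grad log pi(x)
    z = sigma * rng.standard_normal(x.shape)
    p_plus = 1.0 / (1.0 + np.exp(-z * g))          # Barker probability per coordinate
    b = np.where(rng.random(x.shape) < p_plus, 1.0, -1.0)
    return x + b * z
```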
- Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets [48.1015832267945]
This research presents a method to meet these requirements by minimizing the objective function of the RPM algorithm.
A branch-and-bound (BnB) algorithm is devised which branches solely over the parameters, thereby boosting the convergence rate.
Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, when compared with prevailing state-of-the-art approaches.
arXiv Detail & Related papers (2024-05-14T13:28:57Z)
- AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size [42.84471753630676]
We present a novel adaptation of Stochastic Gradient Descent (SGD) called AdaBatchGrad.
It seamlessly integrates an adaptive step size with an adjustable batch size.
We experimentally show how the introduction of an adaptive step size and an adaptive batch size gradually improves the performance of regular SGD.
arXiv Detail & Related papers (2024-02-07T21:19:05Z)
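For the AdaBatchGrad entry above: the summary names the two ingredients but not their precise rules, so the loop below only illustrates one possible pairing of an AdaGrad-style coordinate-wise step size with a batch size that doubles when a simple variance test fails. Both rules are assumptions for exposition, not the paper's method.

```python
import numpy as np

def adaptive_sgd_sketch(grad_i, n, x0, steps=200, batch0=8, eta=0.5,
                        theta=1.0, eps=1e-8, seed=0):
    """Illustrative combination of an adaptive step size and an adaptive batch size."""
    rng = np.random.default_rng(seed)
    x, batch = x0.copy(), batch0
    accum = np.zeros_like(x0)                      # AdaGrad accumulator
    for _ in range(steps):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        grads = np.stack([grad_i(i, x) for i in idx])
        g = grads.mean(axis=0)
        # grow the batch when the variance of the batch mean dominates the signal
        var_of_mean = grads.var(axis=0).sum() / len(idx)
        if var_of_mean > theta * np.dot(g, g):
            batch = min(2 * batch, n)
        accum += g * g
        x = x - eta * g / (np.sqrt(accum) + eps)   # coordinate-wise adaptive step
    return x
```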
- Beyond the Golden Ratio for Variational Inequality Algorithms [12.470097382737933]
We improve the understanding of the $\textit{golden ratio algorithm}$, which solves monotone variational inequalities (VI) and convex-concave min-max problems.
We introduce a new analysis that allows larger step sizes and completes the bridge between the golden ratio algorithm and existing algorithms in the literature.
arXiv Detail & Related papers (2022-12-28T16:58:48Z)
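As a reference point for the golden ratio entry above, here is a sketch of the basic fixed-step golden ratio algorithm for a monotone variational inequality with a prox-friendly term; the enlarged step sizes analyzed in the paper are not reflected. The bilinear toy problem at the bottom is only a usage example.

```python
import numpy as np

PHI = (1.0 + np.sqrt(5.0)) / 2.0                   # golden ratio

def golden_ratio_vi(F, prox_g, x0, lam, iters=500):
    """Fixed-step golden ratio algorithm for the monotone inclusion 0 in F(x) + dg(x):
    an averaging step followed by a forward step on F and a prox step on g.
    Classical analysis takes lam <= PHI / (2 L), with L the Lipschitz constant of F."""
    x, x_bar = x0.copy(), x0.copy()
    for _ in range(iters):
        x_bar = ((PHI - 1.0) * x + x_bar) / PHI     # convex combination ("averaging")
        x = prox_g(x_bar - lam * F(x), lam)         # forward-prox step
    return x

# Usage example: the saddle point min_x max_y x*y (F is a rotation, prox_g is the identity)
if __name__ == "__main__":
    F = lambda z: np.array([z[1], -z[0]])
    prox_id = lambda v, lam: v
    print(golden_ratio_vi(F, prox_id, np.array([1.0, 1.0]), lam=0.5))
```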
- A Stochastic Variance Reduced Gradient using Barzilai-Borwein Techniques as Second Order Information [0.0]
We consider improving the stochastic variance reduced gradient (SVRG) method by incorporating curvature information of the objective function.
We propose to reduce the variance of stochastic gradients using the computationally efficient Barzilai-Borwein (BB) method by incorporating it into SVRG.
We prove a linear convergence theorem that applies not only to the proposed method but also to other existing variants of SVRG with second-order information.
arXiv Detail & Related papers (2022-08-23T16:38:40Z)
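The entry above incorporates BB curvature information into SVRG; the sketch below shows the closely related classical SVRG-BB pattern, where a scalar BB stepsize is recomputed from successive snapshots, as a hedged stand-in for that construction (the paper's exact estimator and proof are not reproduced).

```python
import numpy as np

def svrg_bb_sketch(grad_i, n, x0, m=None, eta0=0.05, epochs=15, seed=0):
    """SVRG with a Barzilai-Borwein stepsize recomputed at each snapshot:
    eta_k = ||s||^2 / (m * s^T y), s = x_k - x_{k-1}, y = mu_k - mu_{k-1}."""
    rng = np.random.default_rng(seed)
    m = m or 2 * n                                   # inner-loop length
    x_snap, x_snap_prev, mu_prev, eta = x0.copy(), None, None, eta0
    for _ in range(epochs):
        mu = np.mean([grad_i(i, x_snap) for i in range(n)], axis=0)   # full gradient
        if mu_prev is not None:
            s, y = x_snap - x_snap_prev, mu - mu_prev
            denom = m * float(np.dot(s, y))
            if abs(denom) > 1e-12:
                eta = float(np.dot(s, s)) / denom    # BB stepsize for this epoch
        x = x_snap.copy()
        for _ in range(m):
            i = int(rng.integers(n))
            v = grad_i(i, x) - grad_i(i, x_snap) + mu  # variance-reduced gradient
            x = x - eta * v
        x_snap_prev, mu_prev, x_snap = x_snap.copy(), mu, x.copy()
    return x_snap
```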
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
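For the one-sample CGM entry above, the sketch below pairs a Frank-Wolfe step over an l1-ball constraint with a SAG-style averaged gradient that touches a single sample per iteration. The constraint set, the classical 2/(t+2) step, and the averaging table are illustrative assumptions; the paper's composite (non-smooth) setting and its specific estimator are not reproduced.

```python
import numpy as np

def one_sample_cgm_sketch(grad_i, n, dim, radius=1.0, iters=2000, seed=0):
    """Frank-Wolfe over the l1 ball, driven by an averaged one-sample gradient estimate."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    table = np.zeros((n, dim))                       # last-seen per-sample gradients
    avg = np.zeros(dim)
    for t in range(1, iters + 1):
        i = int(rng.integers(n))
        g_new = grad_i(i, x)
        avg += (g_new - table[i]) / n                # maintain the running average
        table[i] = g_new
        # linear minimization oracle for the l1 ball: a signed vertex
        j = int(np.argmax(np.abs(avg)))
        s = np.zeros(dim)
        s[j] = -radius * np.sign(avg[j])
        gamma = 2.0 / (t + 2.0)                      # standard Frank-Wolfe step size
        x = (1.0 - gamma) * x + gamma * s
    return x
```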
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- A Variance Controlled Stochastic Method with Biased Estimation for Faster Non-convex Optimization [0.0]
We propose a new technique, \emph{variance controlled stochastic gradient} (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) method.
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
The method needs $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z)
- Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformation, building on the RPM algorithm.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomials to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z)
- Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, but only up to an error.
Rather than fixing the mini-batch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z)
- Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: the random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
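The random-top-k operator in the entry above is named but not defined in this summary; the sketch below is one plausible hybrid of top-k and random-k sparsification, written as an assumption for illustration rather than the paper's exact operator.

```python
import numpy as np

def random_top_k(g, k_top, k_rand, seed=0):
    """Illustrative hybrid sparsifier: keep the k_top largest-magnitude entries and
    k_rand extra coordinates sampled uniformly from the remainder, rescaled so that
    the sampled tail is unbiased in expectation. Assumed reading, not the paper's operator."""
    rng = np.random.default_rng(seed)
    out = np.zeros_like(g)
    top = np.argsort(np.abs(g))[-k_top:]             # deterministic top-k part
    out[top] = g[top]
    rest = np.setdiff1d(np.arange(g.size), top)
    if k_rand > 0 and rest.size > 0:
        pick = rng.choice(rest, size=min(k_rand, rest.size), replace=False)
        out[pick] = g[pick] * (rest.size / pick.size)  # rescale the random tail
    return out
```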
This list is automatically generated from the titles and abstracts of the papers on this site.