A Stochastic Variance Reduced Gradient using Barzilai-Borwein Techniques
as Second Order Information
- URL: http://arxiv.org/abs/2208.11075v1
- Date: Tue, 23 Aug 2022 16:38:40 GMT
- Title: A Stochastic Variance Reduced Gradient using Barzilai-Borwein Techniques
as Second Order Information
- Authors: Hardik Tankaria and Nobuo Yamashita
- Abstract summary: We consider improving the stochastic variance reduced gradient (SVRG) method by incorporating the curvature information of the objective function.
We propose to reduce the variance of the stochastic gradients using the computationally efficient Barzilai-Borwein (BB) method by incorporating it into SVRG.
We prove a linear convergence theorem that applies not only to the proposed method but also to other existing variants of SVRG with second-order information.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider improving the stochastic variance reduced
gradient (SVRG) method by incorporating the curvature information of the
objective function. We propose to reduce the variance of stochastic gradients
using the computationally efficient Barzilai-Borwein (BB) method by
incorporating it into SVRG. We also incorporate a BB step size as a variant. We
prove a linear convergence theorem that applies not only to the proposed method
but also to other existing variants of SVRG with second-order information. We
conduct numerical experiments on benchmark datasets and show that the proposed
method with a constant step size performs better than existing variance-reduced
methods on some test problems.
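The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea: a standard SVRG outer loop whose step size is recomputed at each snapshot with a Barzilai-Borwein rule built from consecutive snapshots and their full gradients. The helper names grad_fn and full_grad_fn, the inner-loop length m, and the particular BB formula used here are illustrative assumptions; the paper additionally folds BB curvature information into the variance-reduced gradient itself, which this sketch does not reproduce.

    import numpy as np

    def svrg_bb(grad_fn, full_grad_fn, w0, n, m, eta0=0.01, epochs=10, rng=None):
        """Hedged sketch of SVRG with a Barzilai-Borwein (BB) step size.

        grad_fn(w, i)   -- stochastic gradient of the i-th component at w (assumed interface)
        full_grad_fn(w) -- full gradient at w (assumed interface)
        n, m            -- number of components and inner-loop length
        eta0            -- step size for the first epoch, before a BB step is available
        """
        rng = rng or np.random.default_rng(0)
        w_tilde, eta = w0.copy(), eta0
        prev_w_tilde, prev_mu = None, None
        for _ in range(epochs):
            mu = full_grad_fn(w_tilde)              # full gradient at the snapshot
            if prev_w_tilde is not None:
                # BB step from consecutive snapshots:
                #   eta = ||s||^2 / (m * s^T y),  s = w~_k - w~_{k-1},  y = mu_k - mu_{k-1}
                # (s^T y > 0 for strongly convex objectives; guard against tiny denominators)
                s, y = w_tilde - prev_w_tilde, mu - prev_mu
                if abs(s @ y) > 1e-12:
                    eta = (s @ s) / (m * (s @ y))
            prev_w_tilde, prev_mu = w_tilde.copy(), mu.copy()
            w = w_tilde.copy()
            for _ in range(m):                      # SVRG inner loop, variance-reduced gradient
                i = rng.integers(n)
                v = grad_fn(w, i) - grad_fn(w_tilde, i) + mu
                w = w - eta * v
            w_tilde = w                             # Option I: last inner iterate becomes the new snapshot
        return w_tilde

The ratio (s @ s) / (s @ y) is the classical BB approximation of the inverse average curvature along s, which is the sense in which BB supplies cheap second-order information to the step size.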
Related papers
- Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models [50.90868087591973]
We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models.
We test the proposed framework by comparing it with the iterative ensemble smoother and deep ensembling methods for a non-linear diffusion equation.
arXiv Detail & Related papers (2024-08-20T19:06:02Z) - Adaptive importance sampling for Deep Ritz [7.123920027048777]
We introduce an adaptive sampling method for the Deep Ritz method aimed at solving partial differential equations (PDEs).
One network is employed to approximate the solution of PDEs, while the other one is a deep generative model used to generate new collocation points to refine the training set.
Compared to the original Deep Ritz method, the proposed adaptive method improves accuracy, especially for problems characterized by low regularity and high dimensionality.
arXiv Detail & Related papers (2023-10-26T06:35:08Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with
Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic extragradient (SEG) method with constant step size, along with variations of the method that yield favorable convergence.
We prove that when augmented with iteration averaging, SEG provably converges to the Nash equilibrium, and that this rate is provably accelerated by incorporating a scheduled restarting procedure; a minimal sketch of such a restarted-averaging scheme appears after this list.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - A Variance Controlled Stochastic Method with Biased Estimation for
Faster Non-convex Optimization [0.0]
We propose a new technique, variance controlled stochastic gradient (VCSG), to improve the performance of the stochastic variance reduced gradient (SVRG) method.
A parameter $\lambda$ is introduced in VCSG to avoid over-reducing the variance by SVRG.
VCSG requires $\mathcal{O}(\min\{1/\epsilon^{3/2}, n^{1/4}/\epsilon\})$ gradient evaluations, which improves the leading gradient complexity.
arXiv Detail & Related papers (2021-02-19T12:22:56Z) - AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods [7.486132958737807]
We present an adaptive variance reduced method with an implicit approach for adaptivity.
We provide convergence guarantees for finite-sum minimization problems and show that convergence faster than SARAH can be achieved if the local geometry permits.
This algorithm implicitly computes step-size and efficiently estimates local Lipschitz smoothness of functions.
arXiv Detail & Related papers (2021-02-19T01:17:15Z) - On Stochastic Variance Reduced Gradient Method for Semidefinite
Optimization [14.519696724619074]
The SVRG method has been regarded as one of the most effective methods.
However, there is a significant gap between theory and practice when SVRG is adapted to semidefinite programming (SDP).
In this paper, we fill this gap by exploiting a new variant of the original SVRG with Option I adapted to semidefinite optimization.
arXiv Detail & Related papers (2021-01-01T13:55:32Z) - A variable metric mini-batch proximal stochastic recursive gradient
algorithm with diagonal Barzilai-Borwein stepsize [6.358720570480498]
We propose a variable metric mini-batch proximal stochastic recursive gradient algorithm, VM-mSRGBB, which updates the metric using a new diagonal BB stepsize.
The linear convergence of VM-mSRGBB is established for strongly convex, non-strongly convex and convex functions.
arXiv Detail & Related papers (2020-10-02T07:18:42Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
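As referenced in the SEG entry above, the restarted-averaging idea can be pictured with a toy sketch for the bilinear game min_x max_y x^T A y. The sampling scheme, step size, and restart schedule below are illustrative assumptions, not the cited paper's exact method.

    import numpy as np

    def seg_restarted_averaging(A_samples, x0, y0, eta=0.1, inner=200, restarts=5, rng=None):
        """Illustrative sketch: stochastic extragradient (SEG) with iterate
        averaging, restarted from the averaged point on a fixed schedule,
        for the bilinear game min_x max_y x^T A y with A = mean(A_samples)."""
        rng = rng or np.random.default_rng(0)
        x, y = x0.copy(), y0.copy()
        for _ in range(restarts):
            x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
            for _ in range(inner):
                Ai = A_samples[rng.integers(len(A_samples))]
                x_half = x - eta * Ai @ y                    # extrapolation step (one sample)
                y_half = y + eta * Ai.T @ x
                Aj = A_samples[rng.integers(len(A_samples))]
                x = x - eta * Aj @ y_half                    # update evaluated at the extrapolated point
                y = y + eta * Aj.T @ x_half
                x_sum += x
                y_sum += y
            x, y = x_sum / inner, y_sum / inner              # restart from the averaged iterates
        return x, y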
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.