Multi-index Antithetic Stochastic Gradient Algorithm
- URL: http://arxiv.org/abs/2006.06102v2
- Date: Thu, 30 Sep 2021 13:10:11 GMT
- Title: Multi-index Antithetic Stochastic Gradient Algorithm
- Authors: Mateusz B. Majka, Marc Sabate-Vidales, {\L}ukasz Szpruch
- Abstract summary: Gradient algorithms (SGAs) are ubiquitous in computational statistics, machine learning and optimisation.
Recent years have brought an influx of interest in SGAs, and the non-asymptotic analysis of their bias is now well-developed.
We show that MASGA is an optimal estimator from the mean square error-computational cost perspective within the class of Monte Carlo estimators.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic Gradient Algorithms (SGAs) are ubiquitous in computational
statistics, machine learning and optimisation. Recent years have brought an
influx of interest in SGAs, and the non-asymptotic analysis of their bias is by
now well-developed. However, relatively little is known about the optimal
choice of the random approximation (e.g mini-batching) of the gradient in SGAs
as this relies on the analysis of the variance and is problem specific. While
there have been numerous attempts to reduce the variance of SGAs, these
typically exploit a particular structure of the sampled distribution by
requiring a priori knowledge of its density's mode. It is thus unclear how to
adapt such algorithms to non-log-concave settings. In this paper, we construct
a Multi-index Antithetic Stochastic Gradient Algorithm (MASGA) whose
implementation is independent of the structure of the target measure and which
achieves performance on par with Monte Carlo estimators that have access to
unbiased samples from the distribution of interest. In other words, MASGA is an
optimal estimator from the mean square error-computational cost perspective
within the class of Monte Carlo estimators. We prove this fact rigorously for
log-concave settings and verify it numerically for some examples where the
log-concavity assumption is not satisfied.
Related papers
- Random Scaling and Momentum for Non-smooth Non-convex Optimization [38.443430569753026]
Training neural networks requires a loss function that may be highly irregular, and in particular neither convex nor smooth.
Popular training algorithms are based on gradient descent with momentum (SGDM), for which analysis applies only if the loss is either convex or smooth.
arXiv Detail & Related papers (2024-05-16T00:52:03Z) - A general error analysis for randomized low-rank approximation with application to data assimilation [42.57210316104905]
We propose a framework for the analysis of the low-rank approximation error in Frobenius norm for centered and non-standard matrices.
Under minimal assumptions, we derive accurate bounds in expectation and probability.
Our bounds have clear interpretations that enable us to derive properties and motivate practical choices.
arXiv Detail & Related papers (2024-05-08T04:51:56Z) - Tuning Stochastic Gradient Algorithms for Statistical Inference via
Large-Sample Asymptotics [18.93569692490218]
tuning of gradient algorithms often based on trial-and-error rather than generalizable theory.
We show that averaging with a large fixed step size is robust to the choice of tuning parameters.
We lay the foundation for a systematic analysis of other gradient Monte Carlo algorithms.
arXiv Detail & Related papers (2022-07-25T17:58:09Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and deterministic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Stochastic optimization with momentum: convergence, fluctuations, and
traps avoidance [0.0]
In this paper, a general optimization procedure is studied, unifying several variants of the gradient descent such as, among others, the heavy ball method, the Nesterov Accelerated Gradient (S-NAG), and the widely used Adam method.
The avoidance is studied as a noisy discretization of a non-autonomous ordinary differential equation.
arXiv Detail & Related papers (2020-12-07T19:14:49Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for bound non-dimensional optimization.
We demonstrate two popular deep learning methods on the empirical advantages over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity for constant and variable stepsize iteration policies obtaining that, for minibatch size $N$, after $mathcalO(frac1Nepsilon)$ $epsilon-$subity is attained in expected quadratic distance to optimal solution.
arXiv Detail & Related papers (2020-03-30T10:43:56Z) - Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted decision processes, and provide Markov-dependent guarantees on the $ell_infty$error under a generative model.
We establish both and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.