On the Parallel Complexity of Multilevel Monte Carlo in Stochastic
Gradient Descent
- URL: http://arxiv.org/abs/2310.02402v2
- Date: Tue, 10 Oct 2023 10:00:53 GMT
- Title: On the Parallel Complexity of Multilevel Monte Carlo in Stochastic
Gradient Descent
- Authors: Kei Ishikawa
- Abstract summary: In neural stochastic differential equations, the Multilevel Monte Carlo (MLMC) method is known to offer better theoretical complexity than the naive Monte Carlo approach.
We propose a delayed MLMC gradient estimator that drastically reduces the parallel complexity of MLMC by recycling previously computed gradient components.
The proposed estimator provably reduces the average parallel complexity per iteration at the cost of a slightly worse per-iteration convergence rate.
- Score: 0.8158530638728501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the stochastic gradient descent (SGD) for sequential simulations such as
the neural stochastic differential equations, the Multilevel Monte Carlo (MLMC)
method is known to offer better theoretical computational complexity compared
to the naive Monte Carlo approach. However, in practice, MLMC scales poorly on
massively parallel computing platforms such as modern GPUs, because of its
large parallel complexity which is equivalent to that of the naive Monte Carlo
method. To cope with this issue, we propose the delayed MLMC gradient estimator
that drastically reduces the parallel complexity of MLMC by recycling
previously computed gradient components from earlier steps of SGD. The proposed
estimator provably reduces the average parallel complexity per iteration at the
cost of a slightly worse per-iteration convergence rate. In our numerical
experiments, we use an example of deep hedging to demonstrate the superior
parallel complexity of our method compared to the standard MLMC in SGD.
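To make the recycling idea concrete, below is a minimal, self-contained Python sketch that contrasts a standard MLMC gradient estimator with a delayed variant on a toy quadratic objective. The refresh schedule (recomputing the level-l correction every 2^l iterations), the noise model, and all function names are illustrative assumptions for exposition, not the paper's actual algorithm or code.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4  # finest discretization level

def grad_level(theta, level, w):
    """Toy pathwise gradient of a quadratic loss at a given level.
    `w` holds the shared randomness (think: Brownian increments of an SDE
    solved with 2**level steps), so that a level difference
    grad_level(theta, l, w) - grad_level(theta, l - 1, w) uses coupled
    samples and its variance shrinks geometrically in l."""
    bias = 2.0 ** (-level)  # discretization bias vanishes as level grows
    return np.mean(theta - 1.0 + bias + 0.5 * (1.0 + bias) * w)

def mlmc_grad(theta, n0=64):
    """Standard MLMC estimator via the telescoping sum
    g ~ E[g_0] + sum_{l=1}^{L} E[g_l - g_{l-1}].
    Every level is re-sampled at every SGD step, so each step's parallel
    depth is that of the finest level."""
    g = grad_level(theta, 0, rng.standard_normal(n0))
    for l in range(1, L + 1):
        w = rng.standard_normal(max(1, n0 // 2 ** l))  # fewer samples on fine levels
        g += grad_level(theta, l, w) - grad_level(theta, l - 1, w)
    return g

class DelayedMLMCGrad:
    """Delayed variant (sketch): the level-l correction is recomputed only
    every 2**l iterations and recycled in between, so most SGD steps touch
    only the cheap coarse levels. Recycled corrections are stale (computed
    at an older theta); that staleness is the price paid for the lower
    average parallel cost per iteration."""
    def __init__(self, n0=64):
        self.n0 = n0
        self.corrections = {}  # level -> last computed difference estimate

    def __call__(self, theta, t):
        g = grad_level(theta, 0, rng.standard_normal(self.n0))
        for l in range(1, L + 1):
            if t % 2 ** l == 0:  # refresh level l every 2**l steps (t=0 fills all)
                w = rng.standard_normal(max(1, self.n0 // 2 ** l))
                self.corrections[l] = (grad_level(theta, l, w)
                                       - grad_level(theta, l - 1, w))
            g += self.corrections[l]  # recycled on most iterations
        return g

# SGD with both estimators on the toy objective (minimizer near 1.0).
theta_std, theta_del, lr = 0.0, 0.0, 0.1
delayed = DelayedMLMCGrad()
for t in range(200):
    theta_std -= lr * mlmc_grad(theta_std)
    theta_del -= lr * delayed(theta_del, t)
print(f"standard MLMC: theta = {theta_std:.3f}")
print(f"delayed MLMC:  theta = {theta_del:.3f}")
```

The point the sketch isolates is that the coarse levels stay fresh at every step while the expensive fine-level corrections are touched only rarely, lowering the average per-iteration parallel depth at the price of some extra staleness bias.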
Related papers
- Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles [3.4661537979254655]
This work introduces a new method called scalable Bayesian Monte Carlo (SBMC). The algorithm is a parallel implementation of a consistent (asymptotically unbiased) Bayesian deep learning algorithm: sequential Monte Carlo (SMC) or Markov chain Monte Carlo (MCMC).
arXiv Detail & Related papers (2025-05-19T17:55:32Z) - Convergence Acceleration of Markov Chain Monte Carlo-based Gradient
Descent by Deep Unfolding [5.584060970507506]
This study proposes a trainable sampling-based solver for combinatorial optimization problems (COPs) using a deep-learning technique called deep unfolding.
The proposed solver is based on the Ohzeki method that combines Markov-chain Monte-Carlo (MCMC) and gradient descent.
The numerical results for a few COPs demonstrated that the proposed solver significantly accelerated the convergence speed compared with the original Ohzeki method.
arXiv Detail & Related papers (2024-02-21T08:21:48Z) - Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo [30.4930148381328]
Diffusion-based Monte Carlo (DMC) is a method to sample from a general target distribution beyond the isoperimetric condition.
DMC suffers from high gradient complexity, resulting in an exponential dependency on the error tolerance $\epsilon$ of the obtained samples.
We propose RS-DMC, based on a novel recursion-based score estimation method.
Our algorithm is provably much faster than the popular Langevin-based algorithms.
arXiv Detail & Related papers (2024-01-12T02:33:57Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Aiming towards the minimizers: fast convergence of SGD for
overparametrized problems [25.077446336619378]
We propose a regularity regime which endows the stochastic gradient method with the same worst-case complexity as the deterministic gradient method.
All existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence.
We demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
arXiv Detail & Related papers (2023-06-05T05:21:01Z) - Beyond Exponentially Fast Mixing in Average-Reward Reinforcement
Learning via Multi-Level Monte Carlo Actor-Critic [61.968469104271676]
We propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm.
We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards.
arXiv Detail & Related papers (2023-01-28T04:12:56Z) - Optimal Algorithms for Stochastic Complementary Composite Minimization [55.26935605535377]
Inspired by regularization techniques in statistics and machine learning, we study complementary composite minimization.
We provide novel excess risk bounds, both in expectation and with high probability.
Our algorithms are nearly optimal, which we prove via novel lower complexity bounds for this class of problems.
arXiv Detail & Related papers (2022-11-03T12:40:24Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - High-Dimensional Sparse Bayesian Learning without Covariance Matrices [66.60078365202867]
We introduce a new inference scheme that avoids explicit construction of the covariance matrix.
Our approach couples a little-known diagonal estimation result from numerical linear algebra with the conjugate gradient algorithm.
On several simulations, our method scales better than existing approaches in computation time and memory.
arXiv Detail & Related papers (2022-02-25T16:35:26Z) - An adaptive Hessian approximated stochastic gradient MCMC method [12.93317525451798]
We present an adaptive Hessian approximated stochastic gradient MCMC method to incorporate local geometric information while sampling from the posterior.
We adopt a magnitude-based weight pruning method to enforce the sparsity of the network.
arXiv Detail & Related papers (2020-10-03T16:22:15Z) - Convergence of Meta-Learning with Task-Specific Adaptation over Partial
Parameters [152.03852111442114]
Although model-agnostic meta-learning (MAML) is a very successful algorithm in meta-learning practice, it can have high computational complexity.
Our paper shows that such complexity can significantly affect the overall convergence performance of ANIL (Almost No Inner Loop), a MAML variant that adapts only partial parameters in the inner loop.
arXiv Detail & Related papers (2020-06-16T19:57:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.