A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex
Models and Heterogeneous Data
- URL: http://arxiv.org/abs/2303.00179v1
- Date: Wed, 1 Mar 2023 02:13:22 GMT
- Title: A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex
Models and Heterogeneous Data
- Authors: Haizhou Du and Chengdong Ni
- Abstract summary: We propose a unified paradigm called UMP, comprising two momentum-based decentralized SGD algorithms, D-SUM and GT-DSUM, which provide convergence guarantees for general non-convex objectives.
In theory, we rigorously provide the convergence analysis of these two approaches for non-convex objectives.
- Score: 0.261072980439312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emerging distributed applications recently boosted the development of
decentralized machine learning, especially in IoT and edge computing fields. In
real-world scenarios, the common problems of non-convexity and data
heterogeneity result in inefficiency, performance degradation, and development
stagnation. The bulk of studies concentrates on one of these issues without
offering a more general framework with proven optimality. To
this end, we propose a unified paradigm called UMP, which comprises two
algorithms, D-SUM and GT-DSUM, based on the momentum technique with
decentralized stochastic gradient descent (SGD). The former provides a
convergence guarantee for general non-convex objectives. At the same time, the
latter is extended by introducing gradient tracking, which estimates the global
optimization direction to mitigate data heterogeneity (i.e., distribution
drift). We can cover most momentum-based variants based on the classical heavy
ball or Nesterov's acceleration with different parameters in UMP. In theory, we
rigorously provide the convergence analysis of these two approaches for
non-convex objectives and conduct extensive experiments, demonstrating a
significant improvement in model accuracy by up to 57.6% compared to other
methods in practice.
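The momentum-with-gradient-tracking idea behind GT-DSUM can be illustrated with a minimal sketch. The code below is our own toy construction, not the paper's algorithm or code: each of four agents holds a heterogeneous quadratic objective f_i(x) = 0.5 (x - b_i)^2, agents average over a ring topology, and a gradient tracker estimates the global descent direction so that heavy-ball momentum pushes every agent toward the global minimizer rather than its local one.

```python
import numpy as np

# Illustrative sketch (assumed setup, not the paper's exact algorithm):
# decentralized SGD with heavy-ball momentum and gradient tracking on a
# toy quadratic, where agent i minimizes f_i(x) = 0.5 * (x - b_i)^2.
n_agents, n_steps, lr, beta = 4, 200, 0.1, 0.9

# Ring-topology doubly stochastic mixing matrix.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

b = np.array([0.0, 2.0, 4.0, 6.0])  # heterogeneous local optima
x = np.zeros(n_agents)              # local models
m = np.zeros(n_agents)              # momentum buffers
g = x - b                           # trackers, initialized to local gradients
prev_grad = g.copy()

for _ in range(n_steps):
    # Heavy-ball momentum step along the tracked (global) direction.
    m = beta * m + g
    x = W @ (x - lr * m)            # local update followed by neighbor averaging
    new_grad = x - b                # fresh local gradients
    # Gradient tracking: mix trackers, then correct by the local gradient change.
    g = W @ g + new_grad - prev_grad
    prev_grad = new_grad

print(np.round(x, 2))  # all agents end near the global minimizer mean(b) = 3.0
```

Without the tracking correction, heterogeneous b_i values would bias each agent toward its local optimum; the tracker keeps every agent's search direction aligned with the average gradient, which is the distribution-drift mitigation the abstract describes.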
Related papers
- Preconditioned Inexact Stochastic ADMM for Deep Model [35.37705488695026]
This paper develops an algorithm, PISA, which enables scalable parallel computing and supports various second-moment schemes.
Grounded in rigorous theoretical guarantees, the algorithm converges under the sole assumption of Lipschitz continuity of the gradient.
Comprehensive experimental evaluations on fine-tuning diverse foundation models (FMs), including vision models, large language models, reinforcement learning models, generative adversarial networks, and recurrent neural networks, demonstrate its superior numerical performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2025-02-15T12:28:51Z) - A Bias-Correction Decentralized Stochastic Gradient Algorithm with Momentum Acceleration [19.83835152405735]
We propose a momentum-accelerated decentralized stochastic gradient algorithm, termed Exact-Diffusion with Momentum (EDM).
EDM mitigates the bias from data heterogeneity and incorporates momentum techniques commonly used in deep learning.
Our theoretical analysis demonstrates that the EDM algorithm converges sublinearly to a neighborhood of the optimal solution.
arXiv Detail & Related papers (2025-01-31T12:15:58Z) - Generalized EXTRA stochastic gradient Langevin dynamics [11.382163777108385]
Langevin algorithms are popular Markov Chain Monte Carlo methods for Bayesian learning.
Stochastic variants such as stochastic gradient Langevin dynamics (SGLD) allow iterative learning based on randomly sampled mini-batches.
When data is decentralized across a network of agents subject to communication and privacy constraints, standard SGLD algorithms cannot be applied.
arXiv Detail & Related papers (2024-12-02T21:57:30Z) - Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z) - Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
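The Nesterov extrapolation component mentioned above can be sketched generically. The snippet below is a hypothetical illustration of a Nesterov-style extrapolation step (the function name, schedule, and values are ours, not DFedCata's code): the aggregated model is pushed past its latest position along the momentum direction.

```python
import numpy as np

# Hypothetical sketch of a Nesterov-style extrapolation step, as used to
# accelerate an aggregation phase (illustrative, not the paper's code).
def nesterov_extrapolate(x_new, x_prev, k):
    """Extrapolate past x_new along the momentum direction x_new - x_prev."""
    gamma = (k - 1) / (k + 2)  # a standard Nesterov momentum schedule
    return x_new + gamma * (x_new - x_prev)

x_prev = np.array([1.0, 1.0])   # model before the latest round
x_new = np.array([0.5, 0.8])    # model after the latest round
y = nesterov_extrapolate(x_new, x_prev, k=3)
print(y)  # → [0.3  0.72], a point extrapolated beyond x_new
```

The extrapolated point y, rather than x_new itself, would then seed the next round, which is what yields the acceleration.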
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties [3.269165283595478]
In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors.
We address the limitations of existing methods, such as the sub-gradient consensus approach, which fails to distinguish between active and non-active coefficients.
arXiv Detail & Related papers (2024-08-02T15:00:04Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of several key factors on the generalization of D-SGDA.
We also balance the optimization error against the generalization gap to obtain the optimal population risk in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) requires solving nonconvex-concave min-max optimization problems.
Theory has shown the importance of understanding when gradient descent (GD) converges to globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying nonconvex-concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.