A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex
Models and Heterogeneous Data
- URL: http://arxiv.org/abs/2303.00179v1
- Date: Wed, 1 Mar 2023 02:13:22 GMT
- Title: A Unified Momentum-based Paradigm of Decentralized SGD for Non-Convex
Models and Heterogeneous Data
- Authors: Haizhou Du and Chengdong Ni
- Abstract summary: We propose a unified paradigm called UMP, comprising two momentum-based decentralized SGD algorithms, D-SUM and GT-DSUM, which provides a convergence guarantee for general non-convex objectives.
In theory, we provide the convergence analysis of these two approaches for non-convex objectives.
- Score: 0.261072980439312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Emerging distributed applications recently boosted the development of
decentralized machine learning, especially in IoT and edge computing fields. In
real-world scenarios, the common problems of non-convexity and data
heterogeneity result in inefficiency, performance degradation, and development
stagnation. The bulk of studies concentrates on one of the issues mentioned
above without having a more general framework that has been proven optimal. To
this end, we propose a unified paradigm called UMP, which comprises two
algorithms, D-SUM and GT-DSUM, based on the momentum technique with
decentralized stochastic gradient descent (SGD). The former provides a
convergence guarantee for general non-convex objectives. At the same time, the
latter is extended by introducing gradient tracking, which estimates the global
optimization direction to mitigate data heterogeneity (i.e., distribution
drift). We can cover most momentum-based variants based on the classical heavy
ball or Nesterov's acceleration with different parameters in UMP. In theory, we
rigorously provide the convergence analysis of these two approaches for
non-convex objectives and conduct extensive experiments, demonstrating a
significant improvement in model accuracy by up to 57.6% compared to other
methods in practice.
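As a rough illustration of the setting the abstract describes, and not the paper's actual pseudocode, the following is a minimal sketch of momentum-based decentralized SGD with an optional gradient-tracking correction. The topology, losses, and all names (W, grad, beta) are assumptions made for this example; the local quadratics merely stand in for the paper's non-convex objectives.

```python
# Minimal sketch (NOT the paper's pseudocode) of momentum-based decentralized
# SGD with an optional gradient-tracking correction, in the spirit of UMP.
# Assumptions for illustration only: n nodes hold heterogeneous quadratic
# losses (a stand-in for non-convex objectives) and gossip over a fixed
# doubly-stochastic ring mixing matrix W.
import numpy as np

rng = np.random.default_rng(0)
n, d, steps, lr, beta = 8, 10, 300, 0.02, 0.9

# Heterogeneous local data: node i minimizes f_i(x) = ||A_i x - b_i||^2 / (2m).
A = [rng.normal(size=(20, d)) + 0.1 * i for i in range(n)]
b = [rng.normal(size=20) + i for i in range(n)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

# Doubly-stochastic ring: each node averages with its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

use_gradient_tracking = True  # False ~ plain momentum DSGD; True ~ GT variant

x = np.zeros((n, d))                                  # local models
m = np.zeros((n, d))                                  # heavy-ball momentum buffers
g_prev = np.array([grad(i, x[i]) for i in range(n)])  # last local gradients
y = g_prev.copy()                                     # gradient trackers

for _ in range(steps):
    g = np.array([grad(i, x[i]) for i in range(n)])
    if use_gradient_tracking:
        # Gossip the trackers and add the local gradient increment, so each
        # y_i follows the network-wide average gradient despite heterogeneity.
        y = W @ y + g - g_prev
        g_prev = g
        direction = y
    else:
        direction = g
    m = beta * m + direction      # heavy-ball momentum on the chosen direction
    x = W @ (x - lr * m)          # local descent step, then neighbor averaging

print("consensus distance:", np.linalg.norm(x - x.mean(axis=0)))
```

Swapping the heavy-ball buffer for a Nesterov-style lookahead yields the other family of variants the abstract says UMP covers; the gradient-tracking flag corresponds to the D-SUM versus GT-DSUM split.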
Related papers
- Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function (written out in the note after this list), which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - A model-constrained Discontinuous Galerkin Network (DGNet) for Compressible Euler Equations with Out-of-Distribution Generalization [0.0]
We develop a model-constrained discontinuous Galerkin Network (DGNet) approach to solve compressible Euler equations.
To validate the effectiveness, stability, and generalizability of our novel DGNet approach, we present numerical results for 1D and 2D compressible Euler equation problems.
arXiv Detail & Related papers (2024-09-27T01:13:38Z) - Federated Smoothing Proximal Gradient for Quantile Regression with Non-Convex Penalties [3.269165283595478]
Distributed sensors in the internet-of-things (IoT) generate vast amounts of sparse data.
We propose a federated smoothing proximal gradient (FSPG) algorithm that integrates a smoothing mechanism with the proximal gradient framework, thereby improving both precision and computational speed.
arXiv Detail & Related papers (2024-08-10T21:50:19Z) - Decentralized Smoothing ADMM for Quantile Regression with Non-Convex Sparse Penalties [3.269165283595478]
In the rapidly evolving internet-of-things (IoT) ecosystem, effective data analysis techniques are crucial for handling distributed data generated by sensors.
It addresses the limitations of existing methods, such as the sub-gradient consensus approach, which fails to distinguish between active and non-active coefficients.
arXiv Detail & Related papers (2024-08-02T15:00:04Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topologies on the generalization bound of D-SGDA.
We also balance the generalization with the optimization error to obtain the optimal population risk of D-SGDA in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Exact nonlinear state estimation [0.0]
The majority of data assimilation methods in the geosciences are based on Gaussian assumptions.
Non-parametric, particle-based DA algorithms have superior accuracy, but their application to high-dimensional models still poses operational challenges.
This article introduces a new nonlinear estimation theory which attempts to bridge the existing gap in DA methodology.
arXiv Detail & Related papers (2023-10-17T03:44:29Z) - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG)
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Generative Adversarial Networks (GANs) are trained by solving non-convex concave min-max optimization problems.
Theory has highlighted the question of whether gradient descent-ascent (GDA) converges to globally optimal solutions.
We show that in an overparameterized GAN with a $1$-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem (see the note after this list).
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
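A note on the DFedCata entry above: the Moreau envelope it relies on is a standard construction (its role inside DFedCata is described only at the level of the summary). For a function $f$ and a parameter $\lambda > 0$ it is

```latex
f_{\lambda}(x) \;=\; \min_{y}\, \Big( f(y) + \frac{1}{2\lambda}\,\lVert y - x \rVert^{2} \Big)
```

a smoothed surrogate of $f$ whose minimizers agree with those of $f$, which is what makes it a natural tool for absorbing parameter inconsistencies across clients.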
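And for the GAN entry: the min-max problem referred to there is, in its canonical form (the cited paper may analyze a variant),

```latex
\min_{G}\,\max_{D}\; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

where the "non-convex concave" structure in the summary refers to non-convexity in the generator's parameters paired with concavity in the discriminator's, which holds in that paper's linear-discriminator setting.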