Tackling Data Heterogeneity: A New Unified Framework for Decentralized
SGD with Sample-induced Topology
- URL: http://arxiv.org/abs/2207.03730v1
- Date: Fri, 8 Jul 2022 07:50:08 GMT
- Title: Tackling Data Heterogeneity: A New Unified Framework for Decentralized
SGD with Sample-induced Topology
- Authors: Yan Huang, Ying Sun, Zehan Zhu, Changzhi Yan, Jinming Xu
- Abstract summary: We develop a general framework unifying several gradient-based optimization methods for empirical risk minimization problems.
We provide a unified perspective for variance-reduction (VR) and gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA.
The rate results reveal that VR and GT methods can effectively eliminate data heterogeneity within and across devices, respectively, enabling the exact convergence of the algorithm to the optimal solution.
- Score: 6.6682038218782065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a general framework unifying several gradient-based stochastic
optimization methods for empirical risk minimization problems both in
centralized and distributed scenarios. The framework hinges on the introduction
of an augmented graph consisting of nodes modeling the samples and edges
modeling both the inter-device communication and intra-device stochastic
gradient computation. By properly designing the topology of the augmented
graph, we are able to recover as special cases the renowned Local-SGD and DSGD
algorithms, and provide a unified perspective for variance-reduction (VR) and
gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA. We also
provide a unified convergence analysis for smooth and (strongly) convex
objectives relying on a properly structured Lyapunov function, and the obtained
rate can recover the best known results for many existing algorithms. The rate
results further reveal that VR and GT methods can effectively eliminate data
heterogeneity within and across devices, respectively, enabling the exact
convergence of the algorithm to the optimal solution. Numerical experiments
confirm the findings in this paper.
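For concreteness, a minimal NumPy sketch of the plain DSGD special case is given below: each device mixes its iterate with its neighbors through a doubly stochastic matrix and then takes a local stochastic gradient step. The least-squares objectives, ring topology, and step size are illustrative assumptions, not the paper's augmented-graph construction, which additionally introduces sample nodes to recover Local-SGD, VR, and GT variants.

```python
import numpy as np

# Illustrative setup (assumption): n devices, each holding m samples of a
# local least-squares objective f_i(x) = (1/m) * sum_j 0.5 * (a_ij^T x - b_ij)^2.
rng = np.random.default_rng(0)
n, m, d = 8, 20, 5
A = rng.normal(size=(n, m, d))
b = rng.normal(size=(n, m))

# Doubly stochastic mixing matrix for a ring graph (assumption).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))      # one iterate per device
step = 0.05

for t in range(500):
    x = W @ x             # communication: gossip over the device graph
    j = rng.integers(m, size=n)   # computation: each device draws one sample
    grads = np.array([(A[i, j[i]] @ x[i] - b[i, j[i]]) * A[i, j[i]]
                      for i in range(n)])
    x = x - step * grads  # local stochastic gradient step (plain DSGD)

# With heterogeneous data, plain DSGD keeps a steady-state bias; the VR/GT
# corrections discussed above are designed to remove it.
print("consensus spread:", np.linalg.norm(x - x.mean(axis=0)))
```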
Related papers
- Structure Learning with Adaptive Random Neighborhood Informed MCMC [0.0]
We introduce a novel MCMC sampler, PARNI-DAG, for the problem of structure learning under observational data.
Under the assumption of causal sufficiency, the algorithm allows for approximate sampling directly from the posterior distribution on Directed Acyclic Graphs (DAGs).
We empirically demonstrate its mixing efficiency and accuracy in learning DAG structures on a variety of experiments.
arXiv Detail & Related papers (2023-11-01T15:47:18Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different topologies on the generalization of D-SGDA.
We also balance the optimization error with the generalization bound to obtain the optimal population risk in the convex-concave setting.
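For context, a minimal sketch of one decentralized stochastic gradient descent ascent step on a toy saddle-point problem is shown below; the quadratic-bilinear local objectives, mixing matrix, noise model, and step size are assumptions for illustration and not the setting analyzed in that paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4
B = 0.1 * rng.normal(size=(n, d, d))   # local coupling matrices (assumption)

# Doubly stochastic mixing matrix for a ring graph (assumption).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = rng.normal(size=(n, d))   # primal iterate on each node
y = rng.normal(size=(n, d))   # dual iterate on each node
eta = 0.1

for t in range(300):
    # Local saddle objective f_i(x, y) = 0.5||x||^2 + x^T B_i y - 0.5||y||^2.
    gx = x + np.einsum('nij,nj->ni', B, y)        # gradient w.r.t. x
    gy = np.einsum('nji,nj->ni', B, x) - y        # gradient w.r.t. y
    gx += 0.01 * rng.normal(size=gx.shape)        # crude stand-in for sampling noise
    gy += 0.01 * rng.normal(size=gy.shape)
    x = W @ (x - eta * gx)   # descent on the primal variable, then gossip
    y = W @ (y + eta * gy)   # ascent on the dual variable, then gossip

# The common saddle point of the averaged objective is (0, 0).
print("distance to saddle:", np.linalg.norm(x), np.linalg.norm(y))
```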
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting them in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
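To make the aggregation-error point concrete, the toy sketch below models over-the-air aggregation in FedAvg: local updates superpose in the analog channel and the server only observes their noisy sum, so the recovered average deviates from ideal FedAvg. The Gaussian channel with unit fading is a simplifying assumption, not one of the signal processing schemes studied in that paper.

```python
import numpy as np

rng = np.random.default_rng(2)
num_clients, d = 10, 8
noise_std = 0.05                      # receiver noise level (assumption)

# Local updates each client would transmit (e.g., model differences).
local_updates = rng.normal(size=(num_clients, d))

# Ideal (error-free) FedAvg aggregation.
ideal_avg = local_updates.mean(axis=0)

# Over-the-air aggregation: signals superpose in the channel, so the server
# observes only their noisy sum and rescales by the number of clients.
received_sum = local_updates.sum(axis=0) + noise_std * rng.normal(size=d)
aircomp_avg = received_sum / num_clients

aggregation_error = np.linalg.norm(aircomp_avg - ideal_avg)
print("aggregation error introduced by the noisy channel:", aggregation_error)
```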
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - The Novel Adaptive Fractional Order Gradient Decent Algorithms Design
via Robust Control [5.5491171448159715]
The vanilla fractional order gradient descent may converge oscillatorily to a region around the global minimum instead of converging to the exact minimum point, or even diverge, even when the objective function is strongly convex.
To address this problem, a novel adaptive fractional order gradient descent (AFOGD) method and a novel adaptive fractional order accelerated gradient descent (AFOAGD) method are proposed in this paper.
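For intuition about the bias of the vanilla scheme, the sketch below runs fractional-order gradient descent on a one-dimensional quadratic using the closed-form Caputo derivative with a fixed lower terminal; the objective, terminal, order, and step size are illustrative assumptions, and this is not the AFOGD/AFOAGD design proposed in that paper.

```python
from math import gamma

# Toy strongly convex objective f(x) = 0.5 * (x - 3)^2 with minimizer x* = 3.
# For this quadratic, the Caputo fractional derivative of order alpha with
# fixed lower terminal c = 0 has the closed form (valid for x > 0):
#   D_0^alpha f(x) = f'(0) * x**(1-alpha) / gamma(2-alpha)
#                  + f''(0) * x**(2-alpha) / gamma(3-alpha)
alpha = 0.7   # fractional order in (0, 1)  (assumption)
eta = 0.3     # step size                   (assumption)

def caputo_grad(x):
    return -3.0 * x ** (1 - alpha) / gamma(2 - alpha) \
           + x ** (2 - alpha) / gamma(3 - alpha)

x = 6.0
for _ in range(300):
    x -= eta * caputo_grad(x)   # vanilla fractional-order gradient step

# The iterate settles where the fractional derivative vanishes, x = 3*(2-alpha),
# not at the true minimizer x* = 3 -- the bias the adaptive schemes target.
print("fractional-order GD limit:", x, "| true minimizer: 3.0")
```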
arXiv Detail & Related papers (2023-03-08T02:03:30Z) - Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent [7.6860514640178]
We propose a novel zeroth-order optimization algorithm for distributed reinforcement learning.
It allows each agent to estimate its local gradient by cost evaluation independently, without use of any consensus protocol.
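As a rough illustration of gradient estimation from cost evaluations alone (not the paper's asynchronous block coordinate scheme), the sketch below forms a two-point zeroth-order gradient estimate for a single agent; the quadratic cost standing in for an LQR rollout and the smoothing radius are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6

# Local cost the agent can only evaluate, not differentiate (assumption:
# a simple quadratic standing in for an LQR rollout cost).
Q = np.diag(np.linspace(1.0, 3.0, d))
def cost(k):
    return 0.5 * k @ Q @ k

k = rng.normal(size=d)    # local policy parameters
mu, eta = 1e-3, 0.05      # smoothing radius and step size (assumptions)

for t in range(2000):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                       # random direction on the unit sphere
    # Two-point zeroth-order estimate built from two cost evaluations only.
    g = d * (cost(k + mu * u) - cost(k - mu * u)) / (2 * mu) * u
    k -= eta * g                                 # gradient step without any derivatives

print("policy parameter norm after ZO descent (optimum is 0):", np.linalg.norm(k))
```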
arXiv Detail & Related papers (2021-07-26T18:11:07Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - SGD with Coordinate Sampling: Theory and Practice [0.0]
MUSKETEER is an adaptive gradient descent algorithm for large scale problems.
It adaptively samples the most relevant coordinates of the noisy gradients and moves along the direction yielding an important decrease of the objective.
Numerical experiments on both synthetic and real data examples confirm the effectiveness of MUSKETEER in large scale problems.
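As a loose sketch of the coordinate-sampling idea (the exact MUSKETEER weighting rule is not reproduced here), the snippet below keeps a running score per coordinate from past stochastic gradients, samples one coordinate in proportion to those scores, and applies an importance-weighted update to it; the least-squares problem and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples, d = 200, 10
A = rng.normal(size=(n_samples, d)) * np.linspace(0.5, 2.0, d)  # uneven coordinate scales
x_true = rng.normal(size=d)
y = A @ x_true

x = np.zeros(d)
scores = np.ones(d)        # running importance score per coordinate
eta, beta = 0.05, 0.95     # step size and score decay (assumptions)

for t in range(20000):
    i = rng.integers(n_samples)                     # one data point -> noisy gradient
    g = (A[i] @ x - y[i]) * A[i]
    scores = beta * scores + (1 - beta) * g ** 2    # track which coordinates matter most
    p = scores / scores.sum()
    j = rng.choice(d, p=p)                          # sample one coordinate adaptively
    x[j] -= eta * g[j] / (d * p[j])                 # importance-weighted coordinate update

print("distance to x_true:", np.linalg.norm(x - x_true))
```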
arXiv Detail & Related papers (2021-05-25T10:37:50Z) - Decentralized Statistical Inference with Unrolled Graph Neural Networks [26.025935320024665]
We propose a learning-based framework, which unrolls decentralized optimization algorithms into graph neural networks (GNNs).
By minimizing the recovery error via end-to-end training, this learning-based framework resolves the model mismatch issue.
Our convergence analysis reveals that the learned model parameters may accelerate the convergence and reduce the recovery error to a large extent.
arXiv Detail & Related papers (2021-04-04T07:52:34Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
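For reference, the sketch below applies plain simultaneous gradient steps, with a soft-threshold on the sparse factor, to a rank-one bilinear least-squares problem; it illustrates the kind of coupled problem CoGD targets, but it does not implement CoGD's projection-based coordination of the two variables.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 30, 20
u_true = np.zeros(m)
u_true[:5] = rng.normal(size=5)                     # sparse factor (assumption)
v_true = rng.normal(size=n)
A = np.outer(u_true, v_true) + 0.01 * rng.normal(size=(m, n))

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

u = 0.1 * rng.normal(size=m)
v = 0.1 * rng.normal(size=n)
eta, lam = 0.01, 0.02

for t in range(3000):
    R = np.outer(u, v) - A      # residual of the bilinear model u v^T
    gu = R @ v                  # gradient of 0.5*||u v^T - A||_F^2 w.r.t. u
    gv = R.T @ u                # gradient w.r.t. v
    u = soft_threshold(u - eta * gu, eta * lam)   # proximal step keeps u sparse
    v = v - eta * gv                              # plain gradient step on the dense factor

print("relative reconstruction error:",
      np.linalg.norm(np.outer(u, v) - A) / np.linalg.norm(A))
```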
arXiv Detail & Related papers (2020-06-16T13:41:54Z)