Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks
- URL: http://arxiv.org/abs/2409.17189v1
- Date: Wed, 25 Sep 2024 06:23:16 GMT
- Title: Decentralized Federated Learning with Gradient Tracking over Time-Varying Directed Networks
- Authors: Duong Thuy Anh Nguyen, Su Wang, Duong Tung Nguyen, Angelia Nedich, H. Vincent Poor
- Abstract summary: We propose a consensus-based algorithm called DSGTm-TV.
It incorporates gradient tracking and heavy-ball momentum to optimize a global objective function.
Under DSGTm-TV, agents will update local model parameters and gradient estimates using information exchange with neighboring agents.
- Score: 42.92231921732718
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We investigate the problem of agent-to-agent interaction in decentralized (federated) learning over time-varying directed graphs, and, in doing so, propose a consensus-based algorithm called DSGTm-TV. The proposed algorithm incorporates gradient tracking and heavy-ball momentum to distributively optimize a global objective function, while preserving local data privacy. Under DSGTm-TV, agents will update local model parameters and gradient estimates using information exchange with neighboring agents enabled through row- and column-stochastic mixing matrices, which we show guarantee both consensus and optimality. Our analysis establishes that DSGTm-TV exhibits linear convergence to the exact global optimum when exact gradient information is available, and converges in expectation to a neighborhood of the global optimum when employing stochastic gradients. Moreover, in contrast to existing methods, DSGTm-TV preserves convergence for networks with uncoordinated stepsizes and momentum parameters, for which we provide explicit bounds. These results enable agents to operate in a fully decentralized manner, independently optimizing their local hyper-parameters. We demonstrate the efficacy of our approach via comparisons with state-of-the-art baselines on real-world image classification and natural language processing tasks.
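For concreteness, the following is a minimal NumPy sketch of one synchronous round of a DSGTm-TV-style update, assuming an AB/push-pull-type recursion in which model parameters are mixed with a row-stochastic matrix, gradient-tracking variables with a column-stochastic matrix, and heavy-ball momentum is added with per-agent (uncoordinated) stepsizes and momentum factors. The function name, argument layout, and ordering of steps are illustrative assumptions, not the authors' reference implementation.

```python
# Hedged sketch of one DSGTm-TV-style round (illustrative, not the paper's reference code).
# Assumptions: AB/push-pull-style recursion with row-stochastic R(k) for model mixing,
# column-stochastic C(k) for gradient tracking, heavy-ball momentum, and uncoordinated
# per-agent stepsizes alpha_i and momentum parameters beta_i.
import numpy as np

def dsgtm_tv_round(x, x_prev, y, grads_prev, local_grad, R, C, alpha, beta):
    """One synchronous round for n agents with d-dimensional models.

    x, x_prev, y, grads_prev : (n, d) current/previous iterates, tracking
                               variables, and previous local gradients.
    local_grad               : callable(i, x_i) -> (d,) (stochastic) gradient.
    R, C                     : (n, n) row- and column-stochastic mixing matrices
                               for the current time-varying directed graph.
    alpha, beta              : (n,) uncoordinated stepsizes and momentum factors.
    """
    n, d = x.shape
    # 1) Parameter update: mix neighbors' models (row-stochastic), descend along
    #    the tracked gradient direction, and add heavy-ball momentum.
    x_new = R @ x - alpha[:, None] * y + beta[:, None] * (x - x_prev)

    # 2) Gradient-tracking update: mix tracking variables (column-stochastic)
    #    and correct with the change in local gradients.
    grads_new = np.stack([local_grad(i, x_new[i]) for i in range(n)])
    y_new = C @ y + grads_new - grads_prev

    return x_new, x, y_new, grads_new
```

In this sketch, R and C would be regenerated each round from the current directed communication graph; only the mixing matrices and the agents' own variables are exchanged, so local data stays private.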
Related papers
- Aiding Global Convergence in Federated Learning via Local Perturbation and Mutual Similarity Information [6.767885381740953]
Federated learning has emerged as a distributed optimization paradigm.
We propose a novel modified framework wherein each client locally performs a perturbed gradient step.
We show that our algorithm speeds convergence up to a margin of 30 global rounds compared with FedAvg.
arXiv Detail & Related papers (2024-10-07T23:14:05Z) - Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees [18.24213566328972]
Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are carried out by the clients without a central server.
DSpodFL consistently achieves improved training speeds compared with baselines under various system settings.
arXiv Detail & Related papers (2024-02-05T19:02:19Z) - DRAG: Divergence-based Adaptive Aggregation in Federated learning on
Non-IID Data [11.830891255837788]
Local stochastic gradient descent (SGD) is a fundamental approach to achieving communication efficiency in Federated Learning (FL).
We introduce a novel metric called "degree of divergence," quantifying the angle between the local gradient and the global reference direction.
We propose the divergence-based adaptive aggregation (DRAG) algorithm, which dynamically "drags" the received local updates toward the reference direction in each round without requiring extra communication overhead.
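A minimal sketch of how such a divergence-based aggregation could look is given below, assuming the global reference direction is the previous aggregated update and that "dragging" is a convex combination toward that direction weighted by the measured angle; the paper's exact rule may differ.

```python
# Hedged sketch of a DRAG-style server aggregation step (illustrative only).
# Assumptions: g_ref is the global reference direction (e.g. the previous aggregate),
# and the drag strength grows with the degree of divergence (angle) of each update.
import numpy as np

def degree_of_divergence(g_local, g_ref, eps=1e-12):
    """Angle (radians) between a local update and the global reference direction."""
    cos = g_local @ g_ref / (np.linalg.norm(g_local) * np.linalg.norm(g_ref) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def drag_aggregate(local_updates, g_ref):
    """Drag each received update toward g_ref, then average."""
    g_ref_unit = g_ref / (np.linalg.norm(g_ref) + 1e-12)
    dragged = []
    for g in local_updates:
        theta = degree_of_divergence(g, g_ref)
        lam = theta / np.pi                          # more divergence -> stronger drag
        g_along_ref = (g @ g_ref_unit) * g_ref_unit  # component along the reference
        dragged.append((1.0 - lam) * g + lam * g_along_ref)
    return np.mean(dragged, axis=0)
```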
arXiv Detail & Related papers (2023-09-04T19:40:58Z) - FedAgg: Adaptive Federated Learning with Aggregated Gradients [1.5653612447564105]
We propose an adaptive FEDerated learning algorithm called FedAgg to alleviate the divergence between the local and average model parameters and obtain a fast model convergence rate.
We show that our framework is superior to existing state-of-the-art FL strategies for enhancing model performance and accelerating convergence rate under IID and Non-IID datasets.
arXiv Detail & Related papers (2023-03-28T08:07:28Z) - Interpolation-based Correlation Reduction Network for Semi-Supervised
Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN).
In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z) - Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z) - GAN-Supervised Dense Visual Alignment [95.37027391102684]
We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end.
Inspired by the classic Congealing method, our GANgealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly-learned target mode.
arXiv Detail & Related papers (2021-12-09T18:59:58Z) - Domain Adaptive Person Re-Identification via Coupling Optimization [58.567492812339566]
Domain adaptive person Re-Identification (ReID) is challenging owing to the domain gap and shortage of annotations on target scenarios.
This paper proposes a coupling optimization method including the Domain-Invariant Mapping (DIM) method and the Global-Local distance Optimization (GLO).
GLO is designed to train the ReID model in an unsupervised setting on the target domain.
arXiv Detail & Related papers (2020-11-06T14:01:03Z) - GTAdam: Gradient Tracking with Adaptive Momentum for Distributed Online
Optimization [4.103281325880475]
This paper deals with a network of computing agents aiming to solve an online optimization problem in a distributed fashion, by means of local computation and communication, without any central coordinator.
We propose the gradient tracking with adaptive momentum estimation (GTAdam) distributed algorithm, which combines a gradient tracking mechanism with first and second order momentum estimates of the gradient.
In numerical experiments from multi-agent learning, GTAdam outperforms state-of-the-art distributed optimization methods.
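A minimal sketch of one synchronous GTAdam-style round is shown below, assuming a doubly-stochastic mixing matrix and Adam-style first/second moment estimates applied to the gradient-tracking variable; the bias corrections, weights, and update ordering are illustrative assumptions rather than the paper's exact recursion.

```python
# Hedged sketch of one GTAdam-style round for all agents (illustrative only).
# Assumptions: doubly-stochastic mixing matrix W, gradient-tracking variable y,
# and Adam-style moments m, v maintained on the tracked gradient.
import numpy as np

def gtadam_round(x, y, m, v, grads_prev, W, local_grad, t,
                 alpha=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    n, d = x.shape
    # Adam-style moment estimates of the tracked (estimated average) gradient.
    m_new = b1 * m + (1 - b1) * y
    v_new = b2 * v + (1 - b2) * y**2
    m_hat = m_new / (1 - b1**t)      # bias-corrected first moment
    v_hat = v_new / (1 - b2**t)      # bias-corrected second moment

    # Mix neighbors' models and take an adaptive step along the tracked direction.
    x_new = W @ x - alpha * m_hat / (np.sqrt(v_hat) + eps)

    # Gradient tracking: mix the trackers and add the change in local gradients.
    grads_new = np.stack([local_grad(i, x_new[i]) for i in range(n)])
    y_new = W @ y + grads_new - grads_prev

    return x_new, y_new, m_new, v_new, grads_new
```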
arXiv Detail & Related papers (2020-09-03T15:20:21Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)