Decentralized Gradient Tracking with Local Steps
- URL: http://arxiv.org/abs/2301.01313v1
- Date: Tue, 3 Jan 2023 19:06:41 GMT
- Title: Decentralized Gradient Tracking with Local Steps
- Authors: Yue Liu, Tao Lin, Anastasia Koloskova, Sebastian U. Stich
- Abstract summary: Gradient tracking (GT) is an algorithm designed for solving decentralized optimization problems over a network.
A key feature of GT is a tracking mechanism that allows it to overcome heterogeneity between nodes.
- Score: 37.41185751866015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient tracking (GT) is an algorithm designed for solving decentralized
optimization problems over a network (such as training a machine learning
model). A key feature of GT is a tracking mechanism that allows it to overcome
data heterogeneity between nodes.
We develop a novel decentralized tracking mechanism, $K$-GT, that enables
communication-efficient local updates in GT while inheriting the
data-independence property of GT. We prove a convergence rate for $K$-GT on
smooth non-convex functions and show that it reduces the communication
overhead asymptotically by a linear factor $K$, where $K$ denotes the number of
local steps. We illustrate the robustness and effectiveness of this
heterogeneity correction on convex and non-convex benchmark problems and on a
non-convex neural network training task with the MNIST dataset.
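For illustration, here is a minimal NumPy sketch of classic decentralized gradient tracking (without local steps) on a made-up quadratic problem; the problem data, ring topology, step size, and iteration count are assumptions for the example. $K$-GT augments this scheme with $K$ local updates between communication rounds and a modified tracking mechanism, which the sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5                                        # number of nodes, dimension

# Heterogeneous local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 (made up)
A = rng.normal(size=(n, d, d))
b = rng.normal(size=(n, d))

def grad(i, xi):
    """Gradient of node i's local objective at xi."""
    return A[i].T @ (A[i] @ xi - b[i])

# Symmetric doubly stochastic mixing matrix for a ring topology
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta = 0.01                                         # step size (illustrative)
x = np.zeros((n, d))                               # one row per node
y = np.stack([grad(i, x[i]) for i in range(n)])    # gradient trackers, y^0 = grad f(x^0)
g_prev = y.copy()

for t in range(2000):
    x = W @ (x - eta * y)                          # descend along tracker, then mix
    g = np.stack([grad(i, x[i]) for i in range(n)])
    y = W @ y + g - g_prev                         # track the network-average gradient
    g_prev = g

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("avg grad norm  :", np.linalg.norm(g_prev.mean(axis=0)))
```

Because the trackers $y_i$ estimate the network-average gradient, each node descends along a direction corrected for data heterogeneity even though it only evaluates its own local objective.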
Related papers
- AdaRC: Mitigating Graph Structure Shifts during Test-Time [66.40525136929398]
Test-time adaptation (TTA) has attracted attention due to its ability to adapt a pre-trained model to a target domain without re-accessing the source domain.
We propose AdaRC, an innovative framework designed for effective and efficient adaptation to structure shifts in graphs.
arXiv Detail & Related papers (2024-10-09T15:15:40Z)
- Cell Graph Transformer for Nuclei Classification [78.47566396839628]
We develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes.
Poor features can lead to noisy self-attention scores and inferior convergence.
We propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor.
arXiv Detail & Related papers (2024-02-20T12:01:30Z)
- Transitivity-Preserving Graph Representation Learning for Bridging Local
Connectivity and Role-based Similarity [2.5252594834159643]
We propose Unified Graph Transformer Networks (UGT) that integrate local and global structural information into fixed-length vector representations.
First, UGT learns local structure by identifying the local substructures and aggregating features of the $k$-hop neighborhoods of each node.
Third, UGT learns unified representations through self-attention, encoding structural distance and $p$-step transition probability between node pairs.
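As a toy illustration of one of the structural signals mentioned above, the snippet below computes $p$-step random-walk transition probabilities between node pairs from an adjacency matrix; the graph and the value of $p$ are made up for the example, and UGT's actual encoding may differ.

```python
import numpy as np

# Toy undirected graph (hypothetical, for illustration only)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

P = A / A.sum(axis=1, keepdims=True)   # row-stochastic random-walk matrix
p = 3
P_p = np.linalg.matrix_power(P, p)     # P_p[i, j] = probability that a p-step
                                       # random walk started at node i ends at node j
print(P_p)
```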
arXiv Detail & Related papers (2023-08-18T12:49:57Z)
- On the Performance of Gradient Tracking with Local Updates [10.14746251086461]
We show that LU-GT has the same communication complexity but allows arbitrary network topologies.
Numerical examples reveal that local updates can lower communication costs in certain regimes.
arXiv Detail & Related papers (2022-10-10T15:13:23Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
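For context, a standard reduction of this kind, written here with a uniform reference distribution and regularization weight $\lambda$ (the paper's exact formulation may differ), turns the inner maximization over the probability simplex into a smooth log-sum-exp objective:

$$\max_{q \in \Delta_n} \sum_{i=1}^{n} q_i f_i(x) - \lambda\, \mathrm{KL}\Big(q \,\Big\|\, \tfrac{1}{n}\mathbf{1}\Big) \;=\; \lambda \log\Big(\frac{1}{n} \sum_{i=1}^{n} e^{f_i(x)/\lambda}\Big),$$

so the robust problem can be handled by minimizing a modified (exponentially reweighted) average of the local losses with standard decentralized SGD machinery.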
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
arXiv Detail & Related papers (2022-06-02T16:31:16Z)
- An Improved Analysis of Gradient Tracking for Decentralized Machine
Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agents' common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency of the convergence rate on the network parameter $p$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z)
- Node Feature Extraction by Self-Supervised Multi-scale Neighborhood
Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT).
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z)
- A fast randomized incremental gradient method for decentralized
non-convex optimization [19.540926205375857]
We analyze a single-timescale randomized incremental gradient method, GT-SAGA, for decentralized non-convex finite-sum problems.
We show the almost sure convergence of GT-SAGA to a first-order stationary point.
arXiv Detail & Related papers (2020-11-07T21:30:42Z)
- Efficient and Stable Graph Scattering Transforms via Pruning [86.76336979318681]
Graph scattering transforms (GSTs) offer training-free deep GCN models that extract features from graph data.
The price paid by GSTs is exponential complexity in space and time that increases with the number of layers.
The present work addresses the complexity limitation of GSTs by introducing an efficient pruned GST (pGST) approach.
arXiv Detail & Related papers (2020-01-27T16:05:56Z)