Decentralized Gradient Tracking with Local Steps
- URL: http://arxiv.org/abs/2301.01313v1
- Date: Tue, 3 Jan 2023 19:06:41 GMT
- Title: Decentralized Gradient Tracking with Local Steps
- Authors: Yue Liu, Tao Lin, Anastasia Koloskova, Sebastian U. Stich
- Abstract summary: Gradient tracking (GT) is an algorithm designed for solving decentralized optimization problems over a network.
A key feature of GT is a tracking mechanism that allows it to overcome heterogeneity between nodes.
- Score: 37.41185751866015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient tracking (GT) is an algorithm designed for solving decentralized
optimization problems over a network (such as training a machine learning
model). A key feature of GT is a tracking mechanism that allows it to overcome
data heterogeneity between nodes.
We develop a novel decentralized tracking mechanism, $K$-GT, that enables
communication-efficient local updates in GT while inheriting the
data-independence property of GT. We prove a convergence rate for $K$-GT on
smooth non-convex functions and show that it reduces the communication
overhead asymptotically by a linear factor $K$, where $K$ denotes the number of
local steps. We illustrate the robustness and effectiveness of this
heterogeneity correction on convex and non-convex benchmark problems and on a
non-convex neural network training task with the MNIST dataset.
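For illustration, here is a minimal NumPy sketch of classic decentralized gradient tracking (without local steps) on a made-up quadratic problem; the problem data, ring topology, step size, and iteration count are assumptions for the example. $K$-GT augments this scheme with $K$ local updates between communication rounds and a modified tracking mechanism, which the sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5                                        # number of nodes, dimension

# Heterogeneous local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 (made up)
A = rng.normal(size=(n, d, d))
b = rng.normal(size=(n, d))

def grad(i, xi):
    """Gradient of node i's local objective at xi."""
    return A[i].T @ (A[i] @ xi - b[i])

# Symmetric doubly stochastic mixing matrix for a ring topology
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta = 0.01                                         # step size (illustrative)
x = np.zeros((n, d))                               # one row per node
y = np.stack([grad(i, x[i]) for i in range(n)])    # gradient trackers, y^0 = grad f(x^0)
g_prev = y.copy()

for t in range(2000):
    x = W @ (x - eta * y)                          # descend along tracker, then mix
    g = np.stack([grad(i, x[i]) for i in range(n)])
    y = W @ y + g - g_prev                         # track the network-average gradient
    g_prev = g

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("avg grad norm  :", np.linalg.norm(g_prev.mean(axis=0)))
```

Because the trackers $y_i$ estimate the network-average gradient, each node descends along a direction corrected for data heterogeneity even though it only evaluates its own local objective.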
Related papers
- AdaRC: Mitigating Graph Structure Shifts during Test-Time [66.40525136929398]
Test-time adaptation (TTA) has attracted attention due to its ability to adapt a pre-trained model to a target domain without re-accessing the source domain.
We propose AdaRC, an innovative framework designed for effective and efficient adaptation to structure shifts in graphs.
arXiv Detail & Related papers (2024-10-09T15:15:40Z)
- Cell Graph Transformer for Nuclei Classification [78.47566396839628]
We develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes.
Poor features can lead to noisy self-attention scores and inferior convergence.
We propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor.
arXiv Detail & Related papers (2024-02-20T12:01:30Z)
- Transitivity-Preserving Graph Representation Learning for Bridging Local
Connectivity and Role-based Similarity [2.5252594834159643]
We propose Unified Graph Transformer Networks (UGT) that integrate local and global structural information into fixed-length vector representations.
First, UGT learns local structure by identifying the local substructures and aggregating features of the $k$-hop neighborhoods of each node.
Third, UGT learns unified representations through self-attention, encoding structural distance and $p$-step transition probability between node pairs.
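As a toy illustration of one of the structural signals mentioned above, the snippet below computes $p$-step random-walk transition probabilities between node pairs from an adjacency matrix; the graph and the value of $p$ are made up for the example, and UGT's actual encoding may differ.

```python
import numpy as np

# Toy undirected graph (hypothetical, for illustration only)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

P = A / A.sum(axis=1, keepdims=True)   # row-stochastic random-walk matrix
p = 3
P_p = np.linalg.matrix_power(P, p)     # P_p[i, j] = probability that a p-step
                                       # random walk started at node i ends at node j
print(P_p)
```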
arXiv Detail & Related papers (2023-08-18T12:49:57Z)
- On the Performance of Gradient Tracking with Local Updates [10.14746251086461]
We show that LU-GT has the same communication complexity but allows arbitrary network topologies.
Numerical examples reveal that local updates can lower communication costs in certain regimes.
arXiv Detail & Related papers (2022-10-10T15:13:23Z)
- DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
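For context, a standard reduction of this kind, written here with a uniform reference distribution and regularization weight $\lambda$ (the paper's exact formulation may differ), turns the inner maximization over the probability simplex into a smooth log-sum-exp objective:

$$\max_{q \in \Delta_n} \sum_{i=1}^{n} q_i f_i(x) - \lambda\, \mathrm{KL}\Big(q \,\Big\|\, \tfrac{1}{n}\mathbf{1}\Big) \;=\; \lambda \log\Big(\frac{1}{n} \sum_{i=1}^{n} e^{f_i(x)/\lambda}\Big),$$

so the robust problem can be handled by minimizing a modified (exponentially reweighted) average of the local losses with standard decentralized SGD machinery.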
arXiv Detail & Related papers (2022-08-29T18:01:42Z)
- A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
arXiv Detail & Related papers (2022-06-02T16:31:16Z)
- An Improved Analysis of Gradient Tracking for Decentralized Machine
Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agents' common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency of the convergence rate on the network parameter $p$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z)
- Node Feature Extraction by Self-Supervised Multi-scale Neighborhood
Prediction [123.20238648121445]
We propose a new self-supervised learning framework, Graph Information Aided Node feature exTraction (GIANT).
GIANT makes use of the eXtreme Multi-label Classification (XMC) formalism, which is crucial for fine-tuning the language model based on graph information.
We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets.
arXiv Detail & Related papers (2021-10-29T19:55:12Z)
- A fast randomized incremental gradient method for decentralized
non-convex optimization [19.540926205375857]
We analyze a single-timescale randomized incremental gradient method, GT-SAGA, for decentralized non-convex finite-sum problems.
We show the almost sure convergence of GT-SAGA to a first-order stationary point.
arXiv Detail & Related papers (2020-11-07T21:30:42Z)
- Efficient and Stable Graph Scattering Transforms via Pruning [86.76336979318681]
Graph scattering transforms (GSTs) offer training-free deep GCN models that extract features from graph data.
The price paid by GSTs is exponential complexity in space and time that increases with the number of layers.
The present work addresses the complexity limitation of GSTs by introducing an efficient pruned GST (pGST) approach.
arXiv Detail & Related papers (2020-01-27T16:05:56Z)