Beyond spectral gap: The role of the topology in decentralized learning
- URL: http://arxiv.org/abs/2206.03093v1
- Date: Tue, 7 Jun 2022 08:19:06 GMT
- Title: Beyond spectral gap: The role of the topology in decentralized learning
- Authors: Thijs Vogels and Hadrien Hendrikx and Martin Jaggi
- Abstract summary: In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization when workers share the same data distribution.
Our theory matches empirical observations in deep learning, and accurately describes the relative merits of different graph topologies.
- Score: 58.48291921602417
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In data-parallel optimization of machine learning models, workers collaborate
to improve their estimates of the model: more accurate gradients allow them to
use larger learning rates and optimize faster. We consider the setting in which
all workers sample from the same dataset, and communicate over a sparse graph
(decentralized). In this setting, current theory fails to capture important
aspects of real-world behavior. First, the 'spectral gap' of the communication
graph is not predictive of its empirical performance in (deep) learning.
Second, current theory does not explain that collaboration enables larger
learning rates than training alone. In fact, it prescribes smaller learning
rates, which further decrease as graphs become larger, failing to explain
convergence in infinite graphs. This paper aims to paint an accurate picture of
sparsely-connected distributed optimization when workers share the same data
distribution. We quantify how the graph topology influences convergence in a
quadratic toy problem and provide theoretical results for general smooth and
(strongly) convex objectives. Our theory matches empirical observations in deep
learning, and accurately describes the relative merits of different graph
topologies.
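To make the abstract's two observations concrete, here is a minimal sketch (not the authors' code) that computes the spectral gap of a ring gossip matrix and runs decentralized SGD on a quadratic toy problem in which every worker samples from the same distribution. The ring topology, the uniform 1/3 gossip weights, the step size, the noise level, and the function names (ring_gossip_matrix, spectral_gap, dsgd_steady_state_error) are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code): spectral gap of a ring gossip
# matrix vs. the steady-state error of decentralized SGD on a quadratic
# toy problem where all workers share the same data distribution.
# Topology, gossip weights, step size and noise level are assumptions.
import numpy as np

def ring_gossip_matrix(n):
    """Symmetric, doubly stochastic gossip matrix for a ring:
    each worker averages uniformly with itself and its two neighbours."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
    return W

def spectral_gap(W):
    """1 minus the second-largest eigenvalue modulus of W."""
    moduli = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
    return 1.0 - moduli[1]

def dsgd_steady_state_error(W, steps=2000, lr=0.1, noise=1.0, seed=0):
    """D-SGD on f(x) = x^2 / 2 per worker with i.i.d. gradient noise
    (the same distribution on every worker); returns the mean squared
    distance to the optimum over the second half of the run."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    x = np.zeros(n)
    errors = []
    for _ in range(steps):
        grad = x + noise * rng.standard_normal(n)  # stochastic gradient
        x = W @ (x - lr * grad)                    # local step, then gossip
        errors.append(np.mean(x ** 2))
    return np.mean(errors[steps // 2:])

# A lone worker as the no-collaboration baseline.
print(f"solo n=   1  error={dsgd_steady_state_error(np.eye(1)):.4f}")
for n in (8, 32, 128):
    W = ring_gossip_matrix(n)
    print(f"ring n={n:4d}  spectral gap={spectral_gap(W):.4f}  "
          f"error={dsgd_steady_state_error(W):.4f}")
```

In this toy run the ring's spectral gap shrinks toward zero as n grows, yet the steady-state error stays roughly flat and well below the lone worker's, which mirrors the abstract's claims: the spectral gap alone does not predict performance, and collaboration reduces gradient noise enough to sustain larger learning rates than training alone.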
Related papers
- Understanding Community Bias Amplification in Graph Representation
Learning [22.522798932536038]
We study a phenomenon of community bias amplification in graph representation learning.
We propose a novel graph contrastive learning model called Random Graph Coarsening Contrastive Learning.
arXiv Detail & Related papers (2023-12-08T07:43:05Z)
- Beyond spectral gap (extended): The role of the topology in decentralized learning [58.48291921602417]
In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model.
Current theory does not explain that collaboration enables larger learning rates than training alone.
This paper aims to paint an accurate picture of sparsely-connected distributed optimization.
arXiv Detail & Related papers (2023-01-05T16:53:38Z)
- Optimal Propagation for Graph Neural Networks [51.08426265813481]
We propose a bi-level optimization approach for learning the optimal graph structure.
We also explore a low-rank approximation model for further reducing the time complexity.
arXiv Detail & Related papers (2022-05-06T03:37:00Z)
- Graph Self-supervised Learning with Accurate Discrepancy Learning [64.69095775258164]
We propose a framework that aims to learn the exact discrepancy between the original and the perturbed graphs, coined Discrepancy-based Self-supervised LeArning (D-SLA).
We validate our method on various graph-related downstream tasks, including molecular property prediction, protein function prediction, and link prediction tasks, on which our model largely outperforms relevant baselines.
arXiv Detail & Related papers (2022-02-07T08:04:59Z)
- Deconfounded Training for Graph Neural Networks [98.06386851685645]
We present a new paradigm of deconfounded training (DTP) that better mitigates the confounding effect and latches onto the critical information.
Specifically, we adopt the attention modules to disentangle the critical subgraph and trivial subgraph.
It allows GNNs to capture a more reliable subgraph whose relation with the label is robust across different distributions.
arXiv Detail & Related papers (2021-12-30T15:22:35Z)
- Learning Graphs from Smooth Signals under Moment Uncertainty [23.868075779606425]
We consider the problem of inferring the graph structure from a given set of graph signals.
Traditional graph learning models do not take the distributional uncertainty of the observed signals into account.
arXiv Detail & Related papers (2021-05-12T06:47:34Z)
- Multilayer Clustered Graph Learning [66.94201299553336]
We use contrastive loss as a data fidelity term, in order to properly aggregate the observed layers into a representative graph.
Experiments show that our method yields clusters that are consistent with the ground truth, and that the learned representative graph is well suited to solving clustering problems.
arXiv Detail & Related papers (2020-10-29T09:58:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.