Distributed Methods with Compressed Communication for Solving
Variational Inequalities, with Theoretical Guarantees
- URL: http://arxiv.org/abs/2110.03313v3
- Date: Sun, 2 Apr 2023 11:02:14 GMT
- Title: Distributed Methods with Compressed Communication for Solving
Variational Inequalities, with Theoretical Guarantees
- Authors: Aleksandr Beznosikov and Peter Richtárik and Michael Diskin and Max
Ryabinin and Alexander Gasnikov
- Abstract summary: We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be adapted to the stochastic setting with batches and to federated learning with partial client participation.
- Score: 115.08148491584997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational inequalities in general and saddle point problems in particular
are increasingly relevant in machine learning applications, including
adversarial learning, GANs, transport and robust optimization. With increasing
data and problem sizes necessary to train high-performing models across various
applications, we need to rely on parallel and distributed computing. However,
in distributed training, communication among the compute nodes is a key
bottleneck during training, and this problem is exacerbated for high
dimensional and over-parameterized models. Due to these considerations, it is
important to equip existing methods with strategies that reduce
the volume of transmitted information during training while obtaining a model
of comparable quality. In this paper, we present the first theoretically
grounded distributed methods for solving variational inequalities and saddle
point problems using compressed communication: MASHA1 and MASHA2. Our theory
and methods allow for the use of both unbiased (such as Rand$k$; MASHA1) and
contractive (such as Top$k$; MASHA2) compressors. The new algorithms support
bidirectional compression and can also be adapted to the stochastic setting
with batches and to federated learning with partial client participation.
We empirically validated our conclusions using two experimental setups: a
standard bilinear min-max problem, and large-scale distributed adversarial
training of transformers.
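For concreteness, the following is a minimal NumPy sketch of the two kinds of compression operators named above: an unbiased Rand-$k$ compressor (k random coordinates, rescaled by d/k so that the expectation equals the input) and a contractive Top-$k$ compressor (the k largest-magnitude coordinates). It illustrates the generic operators under our own naming and is not the MASHA1/MASHA2 implementation itself.

import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # Unbiased Rand-k compressor: keep k random coordinates and rescale by d/k,
    # so that E[rand_k(x)] = x.
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * x[idx]
    return out

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    # Contractive Top-k compressor: keep the k largest-magnitude coordinates.
    # It satisfies ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Quick numerical check of the two defining properties.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
k = 100
avg = np.mean([rand_k(x, k, rng) for _ in range(5000)], axis=0)
print("relative error of averaged Rand-k outputs (unbiasedness):",
      np.linalg.norm(avg - x) / np.linalg.norm(x))
print("Top-k contraction factor:",
      np.linalg.norm(top_k(x, k) - x) ** 2 / np.linalg.norm(x) ** 2,
      "(upper bound 1 - k/d =", 1 - k / x.size, ")")

In a bidirectional scheme, operators of this kind are applied both to the messages the workers send to the server and to the messages broadcast back, so the communication savings apply to traffic in both directions.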
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845] (2024-09-22)
Distributed computing involves communication between devices, which raises two key challenges: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of data similarity and client sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711] (2023-06-28)
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
- Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities [91.12425544503395] (2023-02-15)
Variational inequalities are used in applications ranging from equilibrium search to adversarial learning.
Most distributed approaches share a common bottleneck: the cost of communication.
The three main techniques for reducing both the total number of communication rounds and the cost of each round are similarity of local functions, compression of transmitted information, and local updates (a generic sketch combining two of these pillars appears after this list).
The methods presented in this paper achieve the best known theoretical guarantees on communication complexity and significantly outperform other methods for distributed variational inequalities.
- Compression and Data Similarity: Combination of Two Techniques for Communication-Efficient Solving of Distributed Variational Inequalities [137.6408511310322] (2022-06-19)
In this paper we consider a combination of two popular approaches: compression and data similarity.
We show that this synergy can be more effective than either approach alone for solving distributed smooth strongly monotone variational inequalities.
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524] (2020-11-03)
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method that tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communication rounds compared to the baseline.
- Gradient tracking and variance reduction for decentralized optimization and machine learning [19.54092620537586] (2020-02-13)
Decentralized methods for solving finite-sum problems are important in many signal processing and machine learning tasks.
We provide a unified algorithmic framework that combines variance reduction with gradient tracking to achieve robust performance.
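As a rough, hypothetical sketch of how two of the pillars mentioned above (local updates and compression of transmitted information) can be combined in the simpler minimization setting: each worker runs a few local gradient steps and then transmits only a sparsified model difference. The helper names and the toy quadratic objectives are invented here for illustration; this is not the algorithm of any of the papers listed.

import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # Unbiased Rand-k sparsifier: keep k random coordinates, rescaled by d/k.
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = (x.size / k) * x[idx]
    return out

def local_steps_round(w, grad_fns, rng, lr=0.1, num_local_steps=5, k=10):
    # One communication round combining local updates with compression:
    # every worker runs a few local gradient steps without communicating,
    # then sends only a Rand-k-compressed model difference; the server
    # averages the received sparse differences and applies them.
    deltas = []
    for grad in grad_fns:                            # one gradient oracle per worker
        w_local = w.copy()
        for _ in range(num_local_steps):
            w_local -= lr * grad(w_local)
        deltas.append(rand_k(w_local - w, k, rng))   # only k of d coordinates are sent
    return w + np.mean(deltas, axis=0)

# Toy run: worker i holds the quadratic f_i(w) = 0.5 * ||w - b_i||^2.
rng = np.random.default_rng(1)
d, n_workers = 50, 4
targets = [rng.standard_normal(d) for _ in range(n_workers)]
grad_fns = [lambda w, b=b: w - b for b in targets]

w = np.zeros(d)
for _ in range(200):
    w = local_steps_round(w, grad_fns, rng)
print("distance to the average of the local minimizers:",
      np.linalg.norm(w - np.mean(targets, axis=0)))

Note that this naive combination only reaches a neighborhood of the solution, because the compression noise does not vanish when the local data differ; removing such error floors (for example via variance reduction, error feedback, or similarity-based corrections) is the kind of issue the methods in the papers above are designed to handle.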