Distributed Methods with Compressed Communication for Solving
Variational Inequalities, with Theoretical Guarantees
- URL: http://arxiv.org/abs/2110.03313v3
- Date: Sun, 2 Apr 2023 11:02:14 GMT
- Title: Distributed Methods with Compressed Communication for Solving
Variational Inequalities, with Theoretical Guarantees
- Authors: Aleksandr Beznosikov and Peter Richtárik and Michael Diskin and Max
Ryabinin and Alexander Gasnikov
- Abstract summary: We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be adapted to the stochastic setting with batches and to federated learning with partial client participation.
- Score: 115.08148491584997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational inequalities in general and saddle point problems in particular
are increasingly relevant in machine learning applications, including
adversarial learning, GANs, transport and robust optimization. With increasing
data and problem sizes necessary to train high-performing models across various
applications, we need to rely on parallel and distributed computing. However,
in distributed training, communication among the compute nodes is a key
bottleneck during training, and this problem is exacerbated for high
dimensional and over-parameterized models. Due to these considerations, it is
important to equip existing methods with strategies that reduce
the volume of transmitted information during training while obtaining a model
of comparable quality. In this paper, we present the first theoretically
grounded distributed methods for solving variational inequalities and saddle
point problems using compressed communication: MASHA1 and MASHA2. Our theory
and methods allow for the use of both unbiased (such as Rand$k$; MASHA1) and
contractive (such as Top$k$; MASHA2) compressors. The new algorithms support
bidirectional compression and can also be adapted to the stochastic setting
with batches and to federated learning with partial client participation.
We empirically validated our conclusions using two experimental setups: a
standard bilinear min-max problem, and large-scale distributed adversarial
training of transformers.
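For concreteness, the following is a minimal NumPy sketch of the two kinds of compression operators named above: an unbiased Rand-$k$ compressor (k random coordinates, rescaled by d/k so that the expectation equals the input) and a contractive Top-$k$ compressor (the k largest-magnitude coordinates). It illustrates the generic operators under our own naming and is not the MASHA1/MASHA2 implementation itself.

import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # Unbiased Rand-k compressor: keep k random coordinates and rescale by d/k,
    # so that E[rand_k(x)] = x.
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * x[idx]
    return out

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    # Contractive Top-k compressor: keep the k largest-magnitude coordinates.
    # It satisfies ||top_k(x) - x||^2 <= (1 - k/d) * ||x||^2.
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Quick numerical check of the two defining properties.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
k = 100
avg = np.mean([rand_k(x, k, rng) for _ in range(5000)], axis=0)
print("relative error of averaged Rand-k outputs (unbiasedness):",
      np.linalg.norm(avg - x) / np.linalg.norm(x))
print("Top-k contraction factor:",
      np.linalg.norm(top_k(x, k) - x) ** 2 / np.linalg.norm(x) ** 2,
      "(upper bound 1 - k/d =", 1 - k / x.size, ")")

In a bidirectional scheme, operators of this kind are applied both to the messages the workers send to the server and to the messages broadcast back, so the communication savings apply to traffic in both directions.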
Related papers
- Accelerated Stochastic ExtraGradient: Mixing Hessian and Gradient Similarity to Reduce Communication in Distributed and Federated Learning [50.382793324572845] (2024-09-22)
Distributed computing involves communication between devices, which raises two key challenges: efficiency and privacy.
In this paper, we analyze a new method that incorporates the ideas of data similarity and client sampling.
To address privacy concerns, we apply the technique of additional noise and analyze its impact on the convergence of the proposed method.
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711] (2023-06-28)
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
- Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities [91.12425544503395] (2023-02-15)
Variational inequalities are used in applications ranging from equilibrium search to adversarial learning.
Most distributed approaches share a common bottleneck: the cost of communication.
The three main techniques for reducing both the total number of communication rounds and the cost of each round are similarity of local functions, compression of transmitted information, and local updates (a generic sketch combining two of these pillars appears after this list).
The methods presented in this paper achieve the best known theoretical guarantees on communication complexity and significantly outperform other methods for distributed variational inequalities.
- Compression and Data Similarity: Combination of Two Techniques for Communication-Efficient Solving of Distributed Variational Inequalities [137.6408511310322] (2022-06-19)
In this paper we consider a combination of two popular approaches: compression and data similarity.
We show that this synergy can be more effective than either approach alone for solving distributed smooth strongly monotone variational inequalities.
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524] (2020-11-03)
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method that tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communication rounds compared to the baseline.
- Gradient tracking and variance reduction for decentralized optimization and machine learning [19.54092620537586] (2020-02-13)
Decentralized methods for solving finite-sum problems are important in many signal processing and machine learning tasks.
We provide a unified algorithmic framework that combines variance reduction with gradient tracking to achieve robust performance.
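As a rough, hypothetical sketch of how two of the pillars mentioned above (local updates and compression of transmitted information) can be combined in the simpler minimization setting: each worker runs a few local gradient steps and then transmits only a sparsified model difference. The helper names and the toy quadratic objectives are invented here for illustration; this is not the algorithm of any of the papers listed.

import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    # Unbiased Rand-k sparsifier: keep k random coordinates, rescaled by d/k.
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = (x.size / k) * x[idx]
    return out

def local_steps_round(w, grad_fns, rng, lr=0.1, num_local_steps=5, k=10):
    # One communication round combining local updates with compression:
    # every worker runs a few local gradient steps without communicating,
    # then sends only a Rand-k-compressed model difference; the server
    # averages the received sparse differences and applies them.
    deltas = []
    for grad in grad_fns:                            # one gradient oracle per worker
        w_local = w.copy()
        for _ in range(num_local_steps):
            w_local -= lr * grad(w_local)
        deltas.append(rand_k(w_local - w, k, rng))   # only k of d coordinates are sent
    return w + np.mean(deltas, axis=0)

# Toy run: worker i holds the quadratic f_i(w) = 0.5 * ||w - b_i||^2.
rng = np.random.default_rng(1)
d, n_workers = 50, 4
targets = [rng.standard_normal(d) for _ in range(n_workers)]
grad_fns = [lambda w, b=b: w - b for b in targets]

w = np.zeros(d)
for _ in range(200):
    w = local_steps_round(w, grad_fns, rng)
print("distance to the average of the local minimizers:",
      np.linalg.norm(w - np.mean(targets, axis=0)))

Note that this naive combination only reaches a neighborhood of the solution, because the compression noise does not vanish when the local data differ; removing such error floors (for example via variance reduction, error feedback, or similarity-based corrections) is the kind of issue the methods in the papers above are designed to handle.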