Distributed Newton-Type Methods with Communication Compression and
Bernoulli Aggregation
- URL: http://arxiv.org/abs/2206.03588v1
- Date: Tue, 7 Jun 2022 21:12:21 GMT
- Title: Distributed Newton-Type Methods with Communication Compression and
Bernoulli Aggregation
- Authors: Rustem Islamov, Xun Qian, Slavomír Hanzely, Mher Safaryan, and Peter Richtárik
- Abstract summary: We study communication compression and aggregation mechanisms for curvature information.
New 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, require reduced communication and occasional Hessian computations.
For all our methods, we derive fast condition-number-independent local linear and/or superlinear convergence rates.
- Score: 11.870393751095083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite their high computation and communication costs, Newton-type methods
remain an appealing option for distributed training due to their robustness
against ill-conditioned convex problems. In this work, we study communication
compression and aggregation mechanisms for curvature information in order to
reduce these costs while preserving theoretically superior local convergence
guarantees. We prove that the recently developed class of three point
compressors (3PC) of Richtárik et al. [2022] for gradient communication can be
generalized to Hessian communication as well. This result opens up a wide
variety of communication strategies, such as contractive compression and lazy
aggregation, at our disposal to compress prohibitively costly
curvature information. Moreover, we discover several new 3PC mechanisms, such
as adaptive thresholding and Bernoulli aggregation, which require reduced
communication and occasional Hessian computations. Furthermore, we extend and
analyze our approach to bidirectional communication compression and partial
device participation setups to cater to the practical considerations of
applications in federated learning. For all our methods, we derive fast
condition-number-independent local linear and/or superlinear convergence rates.
Finally, with extensive numerical evaluations on convex optimization problems,
we illustrate that our designed schemes achieve state-of-the-art communication
complexity compared to several key baselines using second-order information.
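To make the Bernoulli aggregation idea in the abstract concrete, below is a minimal single-machine Python/NumPy sketch of a Newton-type step in which each device refreshes its local Hessian estimate only with probability p, so Hessian computations and uploads happen only occasionally. The function names, the plain averaging of local estimates, and the absence of any compression of the communicated matrices are simplifying assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def bernoulli_aggregation_step(hess_fn, x, H_prev, p, rng):
    """With probability p, recompute (and upload) the local Hessian at x;
    otherwise reuse the previous estimate and skip both the computation and
    the upload this round (the 'occasional Hessian computation' idea)."""
    if rng.random() < p:
        return hess_fn(x), True          # fresh Hessian, one upload
    return H_prev, False                 # stale estimate, no upload

def newton_type_step(grad_fns, hess_fns, x, H_estimates, p, rng, lam=1e-6):
    """One aggregated Newton-type step over n simulated devices."""
    n = len(grad_fns)
    uploads = 0
    for i in range(n):
        H_estimates[i], sent = bernoulli_aggregation_step(
            hess_fns[i], x, H_estimates[i], p, rng)
        uploads += sent
    g = sum(f(x) for f in grad_fns) / n                 # aggregated gradient
    H = sum(H_estimates) / n + lam * np.eye(x.size)     # aggregated curvature, regularized
    return x - np.linalg.solve(H, g), H_estimates, uploads

# Toy usage: n strongly convex quadratics f_i(x) = 0.5 x^T A_i x - b_i^T x.
rng = np.random.default_rng(0)
d, n = 5, 4
As = [np.diag(rng.uniform(1.0, 10.0, d)) for _ in range(n)]
bs = [rng.normal(size=d) for _ in range(n)]
grad_fns = [lambda x, A=A, b=b: A @ x - b for A, b in zip(As, bs)]
hess_fns = [lambda x, A=A: A for A in As]
x, H_estimates = np.zeros(d), [np.eye(d) for _ in range(n)]
for _ in range(20):
    x, H_estimates, uploads = newton_type_step(
        grad_fns, hess_fns, x, H_estimates, p=0.3, rng=rng)
```

With p = 1 this reduces to a plain averaged Newton step; smaller p trades extra iterations for fewer Hessian computations and uploads, which is roughly the communication/computation trade-off the paper quantifies.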
Related papers
- Differential error feedback for communication-efficient decentralized learning [48.924131251745266]
We propose a new decentralized communication-efficient learning approach that blends differential quantization with error feedback.
We show that the resulting communication-efficient strategy is stable both in terms of mean-square error and average bit rate.
The results establish that, in the small step-size regime and with a finite number of bits, it is possible to attain the performance achievable in the absence of compression.
arXiv Detail & Related papers (2024-06-26T15:11:26Z)
- Improving the Worst-Case Bidirectional Communication Complexity for Nonconvex Distributed Optimization under Function Similarity [92.1840862558718]
We introduce MARINA-P, a novel method for downlink compression, employing a collection of correlated compressors.
We show that MARINA-P with permutation compressors can achieve server-to-worker communication complexity that improves with the number of workers.
We introduce M3, a method combining MARINA-P with uplink compression and a momentum step, achieving bidirectional compression with provable improvements in total communication complexity as the number of workers increases.
arXiv Detail & Related papers (2024-02-09T13:58:33Z)
- Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression [31.107056382542417]
Communication compression is an essential strategy for alleviating communication overhead.
We propose NEOLITHIC, a nearly optimal algorithm for compression under mild conditions.
arXiv Detail & Related papers (2023-05-12T17:02:43Z)
- TAMUNA: Doubly Accelerated Distributed Optimization with Local Training, Compression, and Partial Participation [53.84175614198885]
In distributed optimization and learning, several machines alternate between local computations in parallel and communication with a distant server.
We propose TAMUNA, the first algorithm for distributed optimization that jointly leverages the two strategies of local training and compression and allows for partial participation.
arXiv Detail & Related papers (2023-02-20T08:37:44Z)
- 3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation [12.013162443721312]
We propose a new class of gradient communication mechanisms for communication-efficient training.
We show that our approach can recover the recently proposed state-of-the-art error feedback mechanism EF21 (a minimal sketch of EF21 appears after this list).
We provide a new fundamental link between the lazy aggregation and error feedback literature.
arXiv Detail & Related papers (2022-02-02T12:34:18Z)
- Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees [115.08148491584997]
We present the first theoretically grounded distributed methods for solving variational inequalities and saddle point problems using compressed communication: MASHA1 and MASHA2.
The new algorithms support bidirectional compression and can also be modified for the setting with batches and for federated learning with partial participation of clients.
arXiv Detail & Related papers (2021-10-07T10:04:32Z)
- Permutation Compressors for Provably Faster Distributed Nonconvex Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, moving beyond the classical independent compressors setting.
arXiv Detail & Related papers (2021-10-07T09:38:15Z)
- A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
arXiv Detail & Related papers (2020-11-03T13:35:53Z)
- Federated Learning with Compression: Unified Analysis and Sharp Guarantees [39.092596142018195]
Communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices.
Two notable trends for dealing with the communication overhead in federated learning are communication compression and local computation.
We analyze their convergence in both homogeneous and heterogeneous data distribution settings.
arXiv Detail & Related papers (2020-07-02T14:44:07Z)
- Bidirectional compression in heterogeneous settings for distributed or federated learning with partial participation: tight convergence guarantees [9.31522898261934]
Artemis is a framework for tackling the problem of learning in a distributed setting with communication constraints and partial device participation.
It improves on existing algorithms that only consider unidirectional compression (to the server) or rely on very strong assumptions about the compression operator, and that often do not take partial device participation into account.
arXiv Detail & Related papers (2020-06-25T17:37:45Z)
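As a companion to the 3PC entry above (which notes that 3PC recovers the EF21 error feedback mechanism), here is a minimal Python/NumPy sketch of one EF21 iteration instantiated with a Top-k contractive compressor. The compressor choice, variable names, and initialization are illustrative assumptions; EF21 and its 3PC generalization admit many other compressors.

```python
import numpy as np

def top_k(v, k):
    """Top-k contractive compressor: keep the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]   # assumes 1 <= k <= v.size
    out[idx] = v[idx]
    return out

def ef21_step(grad_fns, x, g_locals, gamma, k):
    """One EF21 iteration: the server steps using the current average of the
    workers' gradient estimates; each worker then sends only a compressed
    correction towards its fresh gradient, and both sides update the estimate."""
    n = len(grad_fns)
    g_server = sum(g_locals) / n              # server's aggregated estimate g^t
    x_new = x - gamma * g_server              # gradient-type step
    for i in range(n):
        c_i = top_k(grad_fns[i](x_new) - g_locals[i], k)  # compressed message
        g_locals[i] = g_locals[i] + c_i                   # g_i^{t+1} = g_i^t + c_i
    return x_new, g_locals
```

Replacing the update g_i <- g_i + C(grad_i(x_new) - g_i) with a general three-point map of the previous iterate, the previous estimate, and the fresh gradient is, roughly, what the 3PC framework formalizes, and the paper above lifts the same idea from gradients to Hessians.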
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.