Basis Matters: Better Communication-Efficient Second Order Methods for
Federated Learning
- URL: http://arxiv.org/abs/2111.01847v1
- Date: Tue, 2 Nov 2021 19:09:41 GMT
- Title: Basis Matters: Better Communication-Efficient Second Order Methods for
Federated Learning
- Authors: Xun Qian and Rustem Islamov and Mher Safaryan and Peter Richtárik
- Abstract summary: We show that Basis Learn (BL) can significantly reduce the communication cost of Newton-type methods.
We present a new Newton-type method (BL1), which reduces communication cost via both the BL technique and a bidirectional compression mechanism.
We prove local linear and superlinear rates independent of the condition number.
- Score: 5.400491728405083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in distributed optimization have shown that Newton-type
methods with proper communication compression mechanisms can guarantee fast
local rates and low communication cost compared to first order methods. We
discover that the communication cost of these methods can be further reduced,
sometimes dramatically so, with a surprisingly simple trick: Basis Learn
(BL). The idea is to transform the usual representation of the local Hessians
via a change of basis in the space of matrices and apply compression tools to
the new representation. To demonstrate the potential of using custom bases, we
design a new Newton-type method (BL1), which reduces communication cost via
both the BL technique and a bidirectional compression mechanism. Furthermore,
we present two alternative extensions (BL2 and BL3) to partial participation to
accommodate federated learning applications. We prove local linear and
superlinear rates independent of the condition number. Finally, we support our
claims with numerical experiments by comparing several first and
second order methods.
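To make the change-of-basis idea concrete, below is a minimal, illustrative sketch (not the authors' code), assuming a generalized linear model: for logistic regression the local Hessian is a linear combination of the fixed rank-one matrices a_i a_i^T built from the local data, so a node can communicate the (optionally top-k compressed) coefficients of that combination instead of all d^2 Hessian entries. All function and variable names are hypothetical.

```python
# Minimal sketch of the Basis Learn idea for a generalized linear model
# (logistic regression). Assumes the local Hessian is a linear combination
# of the fixed rank-one matrices a_i a_i^T, so only the n coefficients
# (optionally compressed) need to be communicated, not the d x d Hessian.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
n, d, k = 50, 10, 5                # local samples, dimension, compression level
A = rng.standard_normal((n, d))    # local data, rows a_i
x = rng.standard_normal(d)         # current model

# Coefficients of the Hessian in the basis {a_i a_i^T}: for logistic loss,
# H(x) = (1/n) * sum_i s_i a_i a_i^T with s_i = sigma(a_i^T x)(1 - sigma(a_i^T x)).
p = sigmoid(A @ x)
s = p * (1 - p) / n

# Communicate compressed coefficients (k floats + indices) instead of d*d entries.
s_compressed = top_k(s, k)
H_approx = A.T @ (s_compressed[:, None] * A)   # server-side reconstruction
H_exact = A.T @ (s[:, None] * A)
print("relative error:", np.linalg.norm(H_approx - H_exact) / np.linalg.norm(H_exact))
```

The compression here acts on the n-dimensional coefficient vector rather than on the d x d matrix, which is exactly where the communication savings come from.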
Related papers
- FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning [27.957498393822338]
We introduce a novel approach to tackle this issue while still achieving fast convergence rates.
Our proposed method, named Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian.
We provide convergence analysis based on statistical learning for the federated Newton sketch approaches.
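As a rough illustration of the sketched square-root Hessian idea (a hedged sketch, not FedNS's actual implementation), assume the local Hessian factors as H = B^T B; a client can then send S @ B for a random sketch matrix S with far fewer rows than B, and the server forms (S B)^T (S B). The Gaussian sketch below is an illustrative choice of sketching matrix.

```python
# Hedged sketch of Hessian sketching: communicate a sketched square-root
# Hessian S @ B (m x d) instead of the exact d x d Hessian B.T @ B.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 200, 20, 40              # local samples, dimension, sketch size

B = rng.standard_normal((n, d))    # square-root Hessian, i.e. H = B.T @ B
S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch matrix

SB = S @ B                         # what the client communicates (m*d floats)
H_sketch = SB.T @ SB               # server-side approximate Hessian
H_exact = B.T @ B
print("relative error:", np.linalg.norm(H_sketch - H_exact) / np.linalg.norm(H_exact))
```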
arXiv Detail & Related papers (2024-01-05T10:06:41Z) - Communication Acceleration of Local Gradient Methods via an Accelerated
Primal-Dual Algorithm with Inexact Prox [11.564643329398438]
We propose an alternative algorithm which obtains the same communication acceleration as Mishchenko et al. (2022).
Our approach is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications.
Our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.
arXiv Detail & Related papers (2022-07-08T15:24:13Z) - FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type
Method for Federated Learning [75.46959684676371]
We introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clients to the parameter server (PS).
FedNew hides the gradient information and results in a privacy-preserving approach compared to the existing state-of-the-art.
arXiv Detail & Related papers (2022-06-17T15:21:39Z) - Distributed Newton-Type Methods with Communication Compression and
Bernoulli Aggregation [11.870393751095083]
We study communication compression and aggregation mechanisms for curvature information.
New 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, require reduced communication and occasional Hessian computations.
For all our methods, we derive fast condition-number-independent local linear and/or superlinear convergence rates.
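Below is a minimal sketch of the Bernoulli-aggregation idea for curvature information described above: each round the client recomputes and sends its (possibly compressed) Hessian only with probability p, and otherwise the server reuses its previous estimate. The randomly generated "local Hessian" is a stand-in assumption, not the paper's setup.

```python
# Minimal sketch of Bernoulli aggregation for curvature information:
# with probability p the client recomputes and sends its Hessian,
# otherwise the server keeps its previous estimate. Purely illustrative.
import numpy as np

rng = np.random.default_rng(2)
d, p, steps = 5, 0.3, 10

def current_local_hessian():
    """Stand-in for the client's freshly computed local Hessian (illustrative)."""
    M = 0.1 * rng.standard_normal((d, d))
    return np.eye(d) + (M + M.T) / 2   # symmetric, well-conditioned

H_server = np.eye(d)                   # server's running curvature estimate
for t in range(steps):
    if rng.random() < p:               # Bernoulli trigger: communicate this round
        H_server = current_local_hessian()   # could also be compressed before sending
        print(f"step {t}: Hessian recomputed and sent")
    else:
        print(f"step {t}: communication skipped, reusing previous estimate")
```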
arXiv Detail & Related papers (2022-06-07T21:12:21Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, going beyond the classical independent compressors setting.
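Here is a hedged sketch of a permutation-style compressor of the kind studied in this line of work: a shared random permutation partitions the d coordinates among the n workers, each worker sends only its own block scaled by n, and the average of the compressed vectors is an unbiased estimate of the average gradient in expectation over the permutation. The setup (gradient vectors, equal block sizes) is illustrative.

```python
# Hedged sketch of a permutation compressor: a shared random permutation
# assigns disjoint coordinate blocks to workers; each worker sends only its
# block scaled by n. Assumes d is divisible by n for simplicity.
import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 12
block = d // n

grads = [rng.standard_normal(d) for _ in range(n)]   # local gradients
perm = rng.permutation(d)                            # shared permutation

compressed = []
for i, g in enumerate(grads):
    idx = perm[i * block:(i + 1) * block]            # worker i's coordinates
    c = np.zeros(d)
    c[idx] = n * g[idx]                              # scale for unbiasedness
    compressed.append(c)

avg_compressed = np.mean(compressed, axis=0)
avg_exact = np.mean(grads, axis=0)
print("coordinates sent per worker:", block, "of", d)
```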
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Distributed Second Order Methods with Fast Rates and Compressed
Communication [6.069611493148631]
We develop several new communication-efficient second-order methods for distributed optimization.
We prove global sublinear and linear convergence rates, and a fast superlinear rate.
Our theoretical results are supported by experiments on real datasets.
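The summary above does not spell out the compression mechanism, so the following is only a generic, hedged sketch of one common way to cut second-order communication: the client repeatedly sends a top-k-compressed difference between its local Hessian and the server's running estimate, which the server accumulates until the estimate matches the local curvature.

```python
# Generic sketch (an assumption, not this paper's method): learn the local
# Hessian on the server by accumulating compressed Hessian differences.
import numpy as np

def top_k_matrix(M, k):
    """Keep the k largest-magnitude entries of M, zero out the rest."""
    flat = M.ravel()
    out = np.zeros_like(flat)
    idx = np.argsort(np.abs(flat))[-k:]
    out[idx] = flat[idx]
    return out.reshape(M.shape)

rng = np.random.default_rng(6)
d, k, steps = 8, 10, 20

A = rng.standard_normal((40, d))
H_true = A.T @ A / 40                 # a fixed "local Hessian" for illustration
H_est = np.zeros((d, d))              # server-side estimate, learned over time

for t in range(steps):
    delta = top_k_matrix(H_true - H_est, k)   # compressed update sent to server
    H_est = H_est + delta
print("relative error after learning:",
      np.linalg.norm(H_est - H_true) / np.linalg.norm(H_true))
```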
arXiv Detail & Related papers (2021-02-14T14:06:45Z) - DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep
Learning [79.89085533866071]
This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors.
DeepReduce decomposes tensors into two sets, values and indices, and allows both independent and combined compression of these sets.
Our experiments with large real models demonstrate that DeepReduce transmits fewer data and imposes lower computational overhead than existing methods.
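To illustrate the value/index decomposition, here is a minimal sketch (the delta-coded indices and fp16 values are illustrative assumptions, not DeepReduce's actual codecs): a sparse gradient is split into an index stream and a value stream, each compressed separately, and the receiver rebuilds the tensor from the two streams.

```python
# Minimal sketch of splitting a sparse tensor into index and value streams
# and compressing each separately; encodings here are illustrative only.
import numpy as np

rng = np.random.default_rng(4)
dense = rng.standard_normal(1000)
dense[rng.random(1000) > 0.01] = 0.0       # keep ~1% nonzeros (sparse gradient)

idx = np.flatnonzero(dense)                # index stream
vals = dense[idx]                          # value stream

idx_delta = np.diff(idx, prepend=0).astype(np.uint16)   # delta-encode indices
vals_fp16 = vals.astype(np.float16)                     # lossy value compression

# Receiver side: reconstruct the sparse tensor from the two streams.
recon = np.zeros_like(dense)
recon[np.cumsum(idx_delta)] = vals_fp16.astype(np.float64)
print("nonzeros:", idx.size, "max abs error:", np.max(np.abs(recon - dense)))
```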
arXiv Detail & Related papers (2021-02-05T11:31:24Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
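As a concrete example of a randomized compression operator of the kind such methods apply, here is a hedged rand-k sparsifier sketch: keep k uniformly chosen coordinates and rescale by d/k so the operator is unbiased. The choice of rand-k is an assumption for illustration, not necessarily the operator used in the paper.

```python
# Hedged sketch of an unbiased randomized sparsifier (rand-k style).
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: E[rand_k(v)] = v."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * v[idx]
    return out

rng = np.random.default_rng(5)
v = rng.standard_normal(10)
print(rand_k(v, 3, rng))    # only 3 of 10 coordinates are communicated
```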
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - A Generic First-Order Algorithmic Framework for Bi-Level Programming
Beyond Lower-Level Singleton [49.23948907229656]
Bi-level Descent Aggregation is a flexible and modularized algorithmic framework for generic bi-level optimization.
We derive a new methodology to prove the convergence of BDA without the LLS condition.
Our investigations also demonstrate that BDA is indeed compatible with a variety of particular first-order computation modules.
arXiv Detail & Related papers (2020-06-07T05:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.