Basis Matters: Better Communication-Efficient Second Order Methods for
Federated Learning
- URL: http://arxiv.org/abs/2111.01847v1
- Date: Tue, 2 Nov 2021 19:09:41 GMT
- Title: Basis Matters: Better Communication-Efficient Second Order Methods for
Federated Learning
- Authors: Xun Qian and Rustem Islamov and Mher Safaryan and Peter Richtárik
- Abstract summary: We show that Basis Learn (BL) can significantly reduce the communication cost of Newton-type methods.
We present a new Newton-type method (BL1), which reduces communication cost via both the BL technique and a bidirectional compression mechanism.
We prove local linear and superlinear rates independent of the condition number.
- Score: 5.400491728405083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in distributed optimization have shown that Newton-type
methods with proper communication compression mechanisms can guarantee fast
local rates and low communication cost compared to first order methods. We
discover that the communication cost of these methods can be further reduced,
sometimes dramatically so, with a surprisingly simple trick: Basis Learn
(BL). The idea is to transform the usual representation of the local Hessians
via a change of basis in the space of matrices and apply compression tools to
the new representation. To demonstrate the potential of using custom bases, we
design a new Newton-type method (BL1), which reduces communication cost via
both the BL technique and a bidirectional compression mechanism. Furthermore,
we present two alternative extensions (BL2 and BL3) to partial participation to
accommodate federated learning applications. We prove local linear and
superlinear rates independent of the condition number. Finally, we support our
claims with numerical experiments by comparing several first and
second order methods.
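To make the change-of-basis idea concrete, below is a minimal, illustrative sketch (not the authors' code), assuming a generalized linear model: for logistic regression the local Hessian is a linear combination of the fixed rank-one matrices a_i a_i^T built from the local data, so a node can communicate the (optionally top-k compressed) coefficients of that combination instead of all d^2 Hessian entries. All function and variable names are hypothetical.

```python
# Minimal sketch of the Basis Learn idea for a generalized linear model
# (logistic regression). Assumes the local Hessian is a linear combination
# of the fixed rank-one matrices a_i a_i^T, so only the n coefficients
# (optionally compressed) need to be communicated, not the d x d Hessian.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
n, d, k = 50, 10, 5                # local samples, dimension, compression level
A = rng.standard_normal((n, d))    # local data, rows a_i
x = rng.standard_normal(d)         # current model

# Coefficients of the Hessian in the basis {a_i a_i^T}: for logistic loss,
# H(x) = (1/n) * sum_i s_i a_i a_i^T with s_i = sigma(a_i^T x)(1 - sigma(a_i^T x)).
p = sigmoid(A @ x)
s = p * (1 - p) / n

# Communicate compressed coefficients (k floats + indices) instead of d*d entries.
s_compressed = top_k(s, k)
H_approx = A.T @ (s_compressed[:, None] * A)   # server-side reconstruction
H_exact = A.T @ (s[:, None] * A)
print("relative error:", np.linalg.norm(H_approx - H_exact) / np.linalg.norm(H_exact))
```

The compression here acts on the n-dimensional coefficient vector rather than on the d x d matrix, which is exactly where the communication savings come from.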
Related papers
- FedNS: A Fast Sketching Newton-Type Algorithm for Federated Learning [27.957498393822338]
We introduce a novel approach to tackle this issue while still achieving fast convergence rates.
Our proposed method, named Federated Newton Sketch (FedNS), approximates the centralized Newton's method by communicating the sketched square-root Hessian instead of the exact Hessian.
We provide convergence analysis based on statistical learning for the federated Newton sketch approaches.
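As a rough illustration of the sketched square-root Hessian idea (a hedged sketch, not FedNS's actual implementation), assume the local Hessian factors as H = B^T B; a client can then send S @ B for a random sketch matrix S with far fewer rows than B, and the server forms (S B)^T (S B). The Gaussian sketch below is an illustrative choice of sketching matrix.

```python
# Hedged sketch of Hessian sketching: communicate a sketched square-root
# Hessian S @ B (m x d) instead of the exact d x d Hessian B.T @ B.
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 200, 20, 40              # local samples, dimension, sketch size

B = rng.standard_normal((n, d))    # square-root Hessian, i.e. H = B.T @ B
S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch matrix

SB = S @ B                         # what the client communicates (m*d floats)
H_sketch = SB.T @ SB               # server-side approximate Hessian
H_exact = B.T @ B
print("relative error:", np.linalg.norm(H_sketch - H_exact) / np.linalg.norm(H_exact))
```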
arXiv Detail & Related papers (2024-01-05T10:06:41Z) - Communication Acceleration of Local Gradient Methods via an Accelerated
Primal-Dual Algorithm with Inexact Prox [11.564643329398438]
We propose an alternative algorithm which obtains the same communication acceleration as Mishchenko et al. (2022).
Our approach is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications.
Our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.
arXiv Detail & Related papers (2022-07-08T15:24:13Z) - FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type
Method for Federated Learning [75.46959684676371]
We introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clients to the parameter server (PS).
FedNew hides the gradient information and results in a privacy-preserving approach compared to the existing state-of-the-art.
arXiv Detail & Related papers (2022-06-17T15:21:39Z) - Distributed Newton-Type Methods with Communication Compression and
Bernoulli Aggregation [11.870393751095083]
We study communication compression and aggregation mechanisms for curvature information.
New 3PC mechanisms, such as adaptive thresholding and Bernoulli aggregation, require reduced communication and occasional Hessian computations.
For all our methods, we derive fast condition-number-independent local linear and/or superlinear convergence rates.
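Below is a minimal sketch of the Bernoulli-aggregation idea for curvature information described above: each round the client recomputes and sends its (possibly compressed) Hessian only with probability p, and otherwise the server reuses its previous estimate. The randomly generated "local Hessian" is a stand-in assumption, not the paper's setup.

```python
# Minimal sketch of Bernoulli aggregation for curvature information:
# with probability p the client recomputes and sends its Hessian,
# otherwise the server keeps its previous estimate. Purely illustrative.
import numpy as np

rng = np.random.default_rng(2)
d, p, steps = 5, 0.3, 10

def current_local_hessian():
    """Stand-in for the client's freshly computed local Hessian (illustrative)."""
    M = 0.1 * rng.standard_normal((d, d))
    return np.eye(d) + (M + M.T) / 2   # symmetric, well-conditioned

H_server = np.eye(d)                   # server's running curvature estimate
for t in range(steps):
    if rng.random() < p:               # Bernoulli trigger: communicate this round
        H_server = current_local_hessian()   # could also be compressed before sending
        print(f"step {t}: Hessian recomputed and sent")
    else:
        print(f"step {t}: communication skipped, reusing previous estimate")
```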
arXiv Detail & Related papers (2022-06-07T21:12:21Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered as a state-of-the-art method in terms of theoretical communication complexity.
We extend the theory of MARINA to support potentially correlated compressors, going beyond the classical independent compressors setting.
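Here is a hedged sketch of a permutation-style compressor of the kind studied in this line of work: a shared random permutation partitions the d coordinates among the n workers, each worker sends only its own block scaled by n, and the average of the compressed vectors is an unbiased estimate of the average gradient in expectation over the permutation. The setup (gradient vectors, equal block sizes) is illustrative.

```python
# Hedged sketch of a permutation compressor: a shared random permutation
# assigns disjoint coordinate blocks to workers; each worker sends only its
# block scaled by n. Assumes d is divisible by n for simplicity.
import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 12
block = d // n

grads = [rng.standard_normal(d) for _ in range(n)]   # local gradients
perm = rng.permutation(d)                            # shared permutation

compressed = []
for i, g in enumerate(grads):
    idx = perm[i * block:(i + 1) * block]            # worker i's coordinates
    c = np.zeros(d)
    c[idx] = n * g[idx]                              # scale for unbiasedness
    compressed.append(c)

avg_compressed = np.mean(compressed, axis=0)
avg_exact = np.mean(grads, axis=0)
print("coordinates sent per worker:", block, "of", d)
```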
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Distributed Second Order Methods with Fast Rates and Compressed
Communication [6.069611493148631]
We develop several new communication-efficient second-order methods for distributed optimization.
We prove global sublinear and linear convergence rates, and a fast superlinear rate.
Our theoretical results are supported by experiments on real datasets.
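The summary above does not spell out the compression mechanism, so the following is only a generic, hedged sketch of one common way to cut second-order communication: the client repeatedly sends a top-k-compressed difference between its local Hessian and the server's running estimate, which the server accumulates until the estimate matches the local curvature.

```python
# Generic sketch (an assumption, not this paper's method): learn the local
# Hessian on the server by accumulating compressed Hessian differences.
import numpy as np

def top_k_matrix(M, k):
    """Keep the k largest-magnitude entries of M, zero out the rest."""
    flat = M.ravel()
    out = np.zeros_like(flat)
    idx = np.argsort(np.abs(flat))[-k:]
    out[idx] = flat[idx]
    return out.reshape(M.shape)

rng = np.random.default_rng(6)
d, k, steps = 8, 10, 20

A = rng.standard_normal((40, d))
H_true = A.T @ A / 40                 # a fixed "local Hessian" for illustration
H_est = np.zeros((d, d))              # server-side estimate, learned over time

for t in range(steps):
    delta = top_k_matrix(H_true - H_est, k)   # compressed update sent to server
    H_est = H_est + delta
print("relative error after learning:",
      np.linalg.norm(H_est - H_true) / np.linalg.norm(H_true))
```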
arXiv Detail & Related papers (2021-02-14T14:06:45Z) - DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep
Learning [79.89085533866071]
This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors.
DeepReduce decomposes tensors into two sets, values and indices, and allows both independent and combined compression of these sets.
Our experiments with large real models demonstrate that DeepReduce transmits fewer data and imposes lower computational overhead than existing methods.
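To illustrate the value/index decomposition, here is a minimal sketch (the delta-coded indices and fp16 values are illustrative assumptions, not DeepReduce's actual codecs): a sparse gradient is split into an index stream and a value stream, each compressed separately, and the receiver rebuilds the tensor from the two streams.

```python
# Minimal sketch of splitting a sparse tensor into index and value streams
# and compressing each separately; encodings here are illustrative only.
import numpy as np

rng = np.random.default_rng(4)
dense = rng.standard_normal(1000)
dense[rng.random(1000) > 0.01] = 0.0       # keep ~1% nonzeros (sparse gradient)

idx = np.flatnonzero(dense)                # index stream
vals = dense[idx]                          # value stream

idx_delta = np.diff(idx, prepend=0).astype(np.uint16)   # delta-encode indices
vals_fp16 = vals.astype(np.float16)                     # lossy value compression

# Receiver side: reconstruct the sparse tensor from the two streams.
recon = np.zeros_like(dense)
recon[np.cumsum(idx_delta)] = vals_fp16.astype(np.float64)
print("nonzeros:", idx.size, "max abs error:", np.max(np.abs(recon - dense)))
```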
arXiv Detail & Related papers (2021-02-05T11:31:24Z) - A Linearly Convergent Algorithm for Decentralized Optimization: Sending
Less Bits for Free! [72.31332210635524]
Decentralized optimization methods enable on-device training of machine learning models without a central coordinator.
We propose a new randomized first-order method which tackles the communication bottleneck by applying randomized compression operators.
We prove that our method can solve the problems without any increase in the number of communications compared to the baseline.
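As a concrete example of a randomized compression operator of the kind such methods apply, here is a hedged rand-k sparsifier sketch: keep k uniformly chosen coordinates and rescale by d/k so the operator is unbiased. The choice of rand-k is an assumption for illustration, not necessarily the operator used in the paper.

```python
# Hedged sketch of an unbiased randomized sparsifier (rand-k style).
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier: E[rand_k(v)] = v."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * v[idx]
    return out

rng = np.random.default_rng(5)
v = rng.standard_normal(10)
print(rand_k(v, 3, rng))    # only 3 of 10 coordinates are communicated
```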
arXiv Detail & Related papers (2020-11-03T13:35:53Z) - A Generic First-Order Algorithmic Framework for Bi-Level Programming
Beyond Lower-Level Singleton [49.23948907229656]
Bi-level Descent Aggregation is a flexible and modularized algorithmic framework for generic bi-level optimization.
We derive a new methodology to prove the convergence of BDA without the LLS condition.
Our investigations also demonstrate that BDA is indeed compatible with a variety of particular first-order computation modules.
arXiv Detail & Related papers (2020-06-07T05:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.