Distributed Second Order Methods with Fast Rates and Compressed
Communication
- URL: http://arxiv.org/abs/2102.07158v1
- Date: Sun, 14 Feb 2021 14:06:45 GMT
- Title: Distributed Second Order Methods with Fast Rates and Compressed
Communication
- Authors: Rustem Islamov and Xun Qian and Peter Richtárik
- Abstract summary: We develop several new communication-efficient second-order methods for distributed optimization.
We prove global sublinear and linear convergence rates, and a fast superlinear rate.
Our results are supported by experiments on real datasets.
- Score: 6.069611493148631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop several new communication-efficient second-order methods for
distributed optimization. Our first method, NEWTON-STAR, is a variant of
Newton's method from which it inherits its fast local quadratic rate. However,
unlike Newton's method, NEWTON-STAR enjoys the same per iteration communication
cost as gradient descent. While this method is impractical as it relies on the
use of certain unknown parameters characterizing the Hessian of the objective
function at the optimum, it serves as the starting point which enables us to
design practical variants thereof with strong theoretical guarantees. In
particular, we design a stochastic sparsification strategy for learning the
unknown parameters in an iterative and communication-efficient fashion.
Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN,
for which we prove local linear and superlinear rates independent of the
condition number. When applicable, this method can have dramatically superior
convergence behavior when compared to state-of-the-art methods. Finally, we
develop a globalization strategy using cubic regularization which leads to our
next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear
convergence rates, and a fast superlinear rate. Our theoretical results are
supported by experiments on real datasets, which show several orders of
magnitude improvement over baseline and state-of-the-art methods in terms of
communication complexity.
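As a rough illustration of these ideas, the sketch below is a minimal NumPy toy, not the authors' implementation: the worker/function names, the choice of an L2-regularized logistic-regression objective, and the local starting point are all assumptions made here for concreteness. It shows the NEWTON-STAR update, which multiplies the aggregated gradient by the fixed inverse Hessian at the optimum so that each worker only ever communicates a d-dimensional gradient, together with a standard unbiased random-k sparsifier of the kind used to compress communication when the unknown Hessian-defining parameters are learned.

import numpy as np

rng = np.random.default_rng(0)
n_workers, m, d, lam = 5, 20, 10, 0.1

# Synthetic local data: worker i holds feature rows A[i] and labels b[i] in {-1, +1}.
A = rng.normal(size=(n_workers, m, d))
b = rng.choice([-1.0, 1.0], size=(n_workers, m))

def local_grad(i, x):
    # Gradient of worker i's L2-regularized logistic loss.
    z = (A[i] @ x) * b[i]
    s = -b[i] / (1.0 + np.exp(z))
    return A[i].T @ s / m + lam * x

def local_hess(i, x):
    # Hessian of worker i's L2-regularized logistic loss.
    z = (A[i] @ x) * b[i]
    w = np.exp(z) / (1.0 + np.exp(z)) ** 2
    return A[i].T @ (A[i] * w[:, None]) / m + lam * np.eye(d)

def full_grad(x):
    return np.mean([local_grad(i, x) for i in range(n_workers)], axis=0)

def full_hess(x):
    return np.mean([local_hess(i, x) for i in range(n_workers)], axis=0)

# "Cheat" to obtain the unknown parameter NEWTON-STAR relies on: the Hessian at
# the optimum, found here by running plain Newton's method to high accuracy.
x_star = np.zeros(d)
for _ in range(30):
    x_star -= np.linalg.solve(full_hess(x_star), full_grad(x_star))
H_star_inv = np.linalg.inv(full_hess(x_star))

# NEWTON-STAR step: each worker communicates only its d-dimensional gradient
# (the same per-iteration cost as gradient descent); the server applies the
# fixed inverse Hessian at the optimum. We start near x*, where the local
# quadratic rate applies.
x = x_star + 0.1 * rng.normal(size=d)
for k in range(5):
    g = full_grad(x)  # in a real system, aggregated from the workers
    x = x - H_star_inv @ g
    print(k, np.linalg.norm(full_grad(x)))

# Unbiased random-k sparsifier: the kind of compressor used to learn the
# Hessian-defining coefficients with low communication cost.
def rand_k(v, k):
    idx = rng.choice(v.size, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (v.size / k)  # rescaling makes E[rand_k(v)] = v
    return out

The practical methods never assume access to x*: as stated in the abstract, NEWTON-LEARN learns the unknown Hessian-defining parameters iteratively with compressed (sparsified) updates, and CUBIC-NEWTON-LEARN adds cubic regularization to obtain global convergence rates.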
Related papers
- Incremental Gauss--Newton Methods with Superlinear Convergence Rates [16.92437325972209]
We introduce a novel Incremental Gauss--Newton (IGN) method with an explicit superlinear convergence rate.
In particular, we formulate the problem as nonlinear least squares with a finite-sum structure.
We also provide a mini-batch extension to our IGN method that obtains an even faster superlinear convergence rate.
arXiv Detail & Related papers (2024-07-03T15:26:34Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 (SR1) update into the incremental framework; a sketch of the standard SR1 update is given after this list.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - Resource-Adaptive Newton's Method for Distributed Learning [16.588456212160928]
This paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method.
Unlike traditional first-order methods, the convergence of RANL is independent of the condition number of the problem.
arXiv Detail & Related papers (2023-08-20T04:01:30Z) - FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type
Method for Federated Learning [75.46959684676371]
We introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clients to the parameter server (PS).
FedNew hides the gradient information and results in a privacy-preserving approach compared to the existing state-of-the-art.
arXiv Detail & Related papers (2022-06-17T15:21:39Z) - Basis Matters: Better Communication-Efficient Second Order Methods for
Federated Learning [5.400491728405083]
We show that Basis Learn (BL) can significantly reduce the communication cost of Newton-type methods.
We present a new Newton-type method (BL1), which reduces communication cost via both the BL technique and a bidirectional compression mechanism.
We prove local linear and superlinear rates independent of the condition number.
arXiv Detail & Related papers (2021-11-02T19:09:41Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Acceleration Methods [57.202881673406324]
We first use quadratic optimization problems to introduce two key families of acceleration methods.
We discuss momentum methods in detail, starting with the seminal work of Nesterov.
We conclude by discussing restart schemes, a set of simple techniques for reaching nearly optimal convergence rates.
arXiv Detail & Related papers (2021-01-23T17:58:25Z) - Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
arXiv Detail & Related papers (2020-07-13T17:39:35Z) - Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z) - On Newton Screening [14.040371216692645]
We develop a new screening method called Newton screening (NS), which is a generalized Newton method with a built-in screening mechanism.
We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence.
arXiv Detail & Related papers (2020-01-27T11:25:33Z)
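For reference, the symmetric rank-1 (SR1) update named in the incremental quasi-Newton entry above has the following textbook form; this is a generic sketch with conventional names and a conventional skip rule, not code from that paper.

import numpy as np

def sr1_update(B, s, y, eps=1e-8):
    # Standard SR1 quasi-Newton update of a Hessian approximation B, using the
    # step s = x_new - x_old and the gradient difference y = g_new - g_old.
    r = y - B @ s
    denom = r @ s
    # Conventional skip rule: leave B unchanged when the denominator is tiny,
    # which keeps the rank-1 correction numerically safe.
    if abs(denom) < eps * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom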
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.