Distributed Second Order Methods with Fast Rates and Compressed
Communication
- URL: http://arxiv.org/abs/2102.07158v1
- Date: Sun, 14 Feb 2021 14:06:45 GMT
- Title: Distributed Second Order Methods with Fast Rates and Compressed
Communication
- Authors: Rustem Islamov and Xun Qian and Peter Richtárik
- Abstract summary: We develop several new communication-efficient second-order methods for distributed optimization.
We prove global sublinear and linear convergence rates, and a fast superlinear rate.
Our results are supported by experiments on real datasets.
- Score: 6.069611493148631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop several new communication-efficient second-order methods for
distributed optimization. Our first method, NEWTON-STAR, is a variant of
Newton's method from which it inherits its fast local quadratic rate. However,
unlike Newton's method, NEWTON-STAR enjoys the same per iteration communication
cost as gradient descent. While this method is impractical as it relies on the
use of certain unknown parameters characterizing the Hessian of the objective
function at the optimum, it serves as the starting point which enables us to
design practical variants thereof with strong theoretical guarantees. In
particular, we design a stochastic sparsification strategy for learning the
unknown parameters in an iterative and communication-efficient fashion.
Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN,
for which we prove local linear and superlinear rates independent of the
condition number. When applicable, this method can have dramatically superior
convergence behavior when compared to state-of-the-art methods. Finally, we
develop a globalization strategy using cubic regularization which leads to our
next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear
convergence rates, and a fast superlinear rate. Our theory is supported by
experiments on real datasets, which show improvements of several orders of
magnitude over baseline and state-of-the-art methods in terms of communication
complexity.
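To make the mechanism described above concrete, here is a minimal Python sketch (not the authors' code) of the two ingredients the abstract combines: an unbiased rand-k sparsification operator, and a NEWTON-STAR-style update x+ = x - H^{-1} grad f(x) in which the unknown Hessian at the optimum is replaced by a server-side estimate that the workers refresh with compressed corrections. All names (rand_k, compressed_newton_step, H_est, k) are illustrative assumptions, and averaging sparsified Hessian corrections is a simplification of the actual NEWTON-LEARN update.

```python
import numpy as np

rng = np.random.default_rng(0)


def rand_k(v, k):
    """Unbiased rand-k sparsifier: keep k random entries of v, rescaled by v.size / k."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx] * (v.size / k)
    return out


def compressed_newton_step(x, grads, H_est, local_hessians, k):
    """One synchronous server round (illustrative, not the paper's pseudocode).

    grads          -- local gradients grad f_i(x) sent by the n workers
    H_est          -- current server-side estimate of the Hessian at the optimum
    local_hessians -- current local Hessians (stand-in for the curvature
                      information the workers would learn over time)
    k              -- number of coordinates each worker communicates
    """
    n, d = len(grads), x.size
    g = np.mean(grads, axis=0)

    # Each worker sends only a sparsified correction to the Hessian estimate
    # (O(k) numbers instead of O(d^2)), mimicking the iterative,
    # communication-efficient learning of the unknown curvature parameters.
    correction = np.zeros((d, d))
    for H_i in local_hessians:
        correction += rand_k((H_i - H_est).ravel(), k).reshape(d, d)
    H_est = H_est + correction / n

    # NEWTON-STAR-style step: a Newton update that uses the learned curvature
    # estimate in place of the unknown Hessian at the optimum (small ridge
    # added only for numerical safety in this sketch).
    x_new = x - np.linalg.solve(H_est + 1e-8 * np.eye(d), g)
    return x_new, H_est
```

Run on a toy quadratic problem f_i(x) = (1/2) x^T A_i x - b_i^T x with local_hessians = [A_1, ..., A_n], the estimate H_est drifts toward the true Hessian while each round communicates only the local gradients plus k numbers per worker, which is the communication pattern the abstract targets; the cubic-regularization globalization used by CUBIC-NEWTON-LEARN is not sketched here.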
Related papers
- Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization [0.5120567378386615]
Gradient descent and other first-order variants, such as Adam and AdaGrad, are commonly used in the field of deep learning.
However, these methods do not exploit curvature information.
Quasi-Newton methods re-use previously computed low-rank Hessian approximations.
arXiv Detail & Related papers (2025-02-17T20:20:11Z)
- Incremental Gauss--Newton Methods with Superlinear Convergence Rates [16.92437325972209]
We introduce a novel Incremental Gauss--Newton (IGN) method with an explicit superlinear convergence rate.
In particular, we formulate our problem as a nonlinear least-squares problem with finite-sum structure.
We also provide a mini-batch extension to our IGN method that obtains an even faster superlinear convergence rate.
arXiv Detail & Related papers (2024-07-03T15:26:34Z)
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
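For context, the classical symmetric rank-1 (SR1) secant update that this entry refers to (textbook form, not this paper's incremental variant) updates a Hessian approximation B_k as follows:

```latex
% Classical SR1 secant update (textbook form); s_k and y_k are the usual step
% and gradient-difference vectors, not quantities defined in this paper.
B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^\top}{(y_k - B_k s_k)^\top s_k},
\qquad s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).
```

As is standard, the update is skipped when the denominator is close to zero.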
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian [53.044526424637866]
We consider a nonconvex unconstrained optimization problem of minimizing a twice differentiable objective function with a Hölder continuous Hessian.
We first propose a Newton-conjugate gradient (Newton-CG) method for finding approximate first- and second-order stationary points.
We present preliminary numerical results to demonstrate the superior practical performance of our parameter-free Newton-CG method.
arXiv Detail & Related papers (2023-11-22T01:50:43Z)
- Adaptive pruning-based Newton's method for distributed learning [14.885388389215587]
This paper presents a novel and efficient algorithm named Distributed Adaptive Newton Learning (DANL).
DANL attains a linear convergence rate while adapting to available resources and maintaining high efficiency.
Experiments demonstrate that DANL achieves linear convergence with efficient communication and strong performance across different datasets.
arXiv Detail & Related papers (2023-08-20T04:01:30Z)
- FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning [75.46959684676371]
We introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clients to the parameter server (PS).
FedNew hides the gradient information and results in a privacy-preserving approach compared to the existing state-of-the-art.
arXiv Detail & Related papers (2022-06-17T15:21:39Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
- Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)
- On Newton Screening [14.040371216692645]
We develop a new screening method called Newton screening (NS) which is a generalized Newton method with a built-in screening mechanism.
We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence.
arXiv Detail & Related papers (2020-01-27T11:25:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.