Distributed Second Order Methods with Fast Rates and Compressed
Communication
- URL: http://arxiv.org/abs/2102.07158v1
- Date: Sun, 14 Feb 2021 14:06:45 GMT
- Title: Distributed Second Order Methods with Fast Rates and Compressed
Communication
- Authors: Rustem Islamov and Xun Qian and Peter Richtárik
- Abstract summary: We develop several new communication-efficient second-order methods for distributed optimization.
We prove global sublinear and linear convergence rates, and a fast superlinear rate.
Our results are supported by experiments on real datasets.
- Score: 6.069611493148631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop several new communication-efficient second-order methods for
distributed optimization. Our first method, NEWTON-STAR, is a variant of
Newton's method from which it inherits its fast local quadratic rate. However,
unlike Newton's method, NEWTON-STAR enjoys the same per iteration communication
cost as gradient descent. While this method is impractical as it relies on the
use of certain unknown parameters characterizing the Hessian of the objective
function at the optimum, it serves as the starting point which enables us to
design practical variants thereof with strong theoretical guarantees. In
particular, we design a stochastic sparsification strategy for learning the
unknown parameters in an iterative and communication-efficient fashion.
Applying this strategy to NEWTON-STAR leads to our next method, NEWTON-LEARN,
for which we prove local linear and superlinear rates independent of the
condition number. When applicable, this method can have dramatically superior
convergence behavior when compared to state-of-the-art methods. Finally, we
develop a globalization strategy using cubic regularization which leads to our
next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear
convergence rates, and a fast superlinear rate. Our theory is supported by
experiments on real datasets, which show improvements of several orders of
magnitude over baseline and state-of-the-art methods in terms of communication
complexity.
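To make the mechanism described above concrete, here is a minimal Python sketch (not the authors' code) of the two ingredients the abstract combines: an unbiased rand-k sparsification operator, and a NEWTON-STAR-style update x+ = x - H^{-1} grad f(x) in which the unknown Hessian at the optimum is replaced by a server-side estimate that the workers refresh with compressed corrections. All names (rand_k, compressed_newton_step, H_est, k) are illustrative assumptions, and averaging sparsified Hessian corrections is a simplification of the actual NEWTON-LEARN update.

```python
import numpy as np

rng = np.random.default_rng(0)


def rand_k(v, k):
    """Unbiased rand-k sparsifier: keep k random entries of v, rescaled by v.size / k."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx] * (v.size / k)
    return out


def compressed_newton_step(x, grads, H_est, local_hessians, k):
    """One synchronous server round (illustrative, not the paper's pseudocode).

    grads          -- local gradients grad f_i(x) sent by the n workers
    H_est          -- current server-side estimate of the Hessian at the optimum
    local_hessians -- current local Hessians (stand-in for the curvature
                      information the workers would learn over time)
    k              -- number of coordinates each worker communicates
    """
    n, d = len(grads), x.size
    g = np.mean(grads, axis=0)

    # Each worker sends only a sparsified correction to the Hessian estimate
    # (O(k) numbers instead of O(d^2)), mimicking the iterative,
    # communication-efficient learning of the unknown curvature parameters.
    correction = np.zeros((d, d))
    for H_i in local_hessians:
        correction += rand_k((H_i - H_est).ravel(), k).reshape(d, d)
    H_est = H_est + correction / n

    # NEWTON-STAR-style step: a Newton update that uses the learned curvature
    # estimate in place of the unknown Hessian at the optimum (small ridge
    # added only for numerical safety in this sketch).
    x_new = x - np.linalg.solve(H_est + 1e-8 * np.eye(d), g)
    return x_new, H_est
```

Run on a toy quadratic problem f_i(x) = (1/2) x^T A_i x - b_i^T x with local_hessians = [A_1, ..., A_n], the estimate H_est drifts toward the true Hessian while each round communicates only the local gradients plus k numbers per worker, which is the communication pattern the abstract targets; the cubic-regularization globalization used by CUBIC-NEWTON-LEARN is not sketched here.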
Related papers
- Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization [0.5120567378386615]
Gradient descent and other first-order variants, such as Adam and AdaGrad, are commonly used in the field of deep learning.
However, these methods do not exploit curvature information.
Quasi-Newton methods re-use previously computed low-rank Hessian approximations.
arXiv Detail & Related papers (2025-02-17T20:20:11Z)
- Incremental Gauss--Newton Methods with Superlinear Convergence Rates [16.92437325972209]
We introduce a novel Incremental Gauss--Newton (IGN) method with an explicit superlinear convergence rate.
In particular, we formulate our problem as a nonlinear least-squares problem with finite-sum structure.
We also provide a mini-batch extension to our IGN method that obtains an even faster superlinear convergence rate.
arXiv Detail & Related papers (2024-07-03T15:26:34Z)
- Incremental Quasi-Newton Methods with Faster Superlinear Convergence Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on the BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
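For context, the classical symmetric rank-1 (SR1) secant update that this entry refers to (textbook form, not this paper's incremental variant) updates a Hessian approximation B_k as follows:

```latex
% Classical SR1 secant update (textbook form); s_k and y_k are the usual step
% and gradient-difference vectors, not quantities defined in this paper.
B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^\top}{(y_k - B_k s_k)^\top s_k},
\qquad s_k = x_{k+1} - x_k, \quad y_k = \nabla f(x_{k+1}) - \nabla f(x_k).
```

As is standard, the update is skipped when the denominator is close to zero.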
arXiv Detail & Related papers (2024-02-04T05:54:51Z)
- Newton-CG methods for nonconvex unconstrained optimization with Hölder continuous Hessian [53.044526424637866]
We consider a nonconvex unconstrained optimization problem of minimizing a twice differentiable objective function with a Hölder continuous Hessian.
We first propose a Newton-conjugate gradient (Newton-CG) method for finding approximate first- and second-order stationary points.
We present preliminary numerical results to demonstrate the superior practical performance of our parameter-free Newton-CG method.
arXiv Detail & Related papers (2023-11-22T01:50:43Z)
- Adaptive pruning-based Newton's method for distributed learning [14.885388389215587]
This paper presents a novel and efficient algorithm named Distributed Adaptive Newton Learning (DANL).
DANL attains a linear convergence rate while adapting to available resources and maintaining high efficiency.
Experiments demonstrate that DANL achieves linear convergence with efficient communication and strong performance across different datasets.
arXiv Detail & Related papers (2023-08-20T04:01:30Z)
- FedNew: A Communication-Efficient and Privacy-Preserving Newton-Type Method for Federated Learning [75.46959684676371]
We introduce a novel framework called FedNew in which there is no need to transmit Hessian information from clients to the parameter server (PS).
FedNew hides the gradient information and results in a privacy-preserving approach compared to the existing state-of-the-art.
arXiv Detail & Related papers (2022-06-17T15:21:39Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
- Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs [71.26657499537366]
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
We compare it with the reverse dynamic method to train neural ODEs on classification, density estimation, and inference approximation tasks.
arXiv Detail & Related papers (2020-03-11T13:15:57Z)
- On Newton Screening [14.040371216692645]
We develop a new screening method called Newton screening (NS) which is a generalized Newton method with a built-in screening mechanism.
We show that NS possesses an optimal convergence property in the sense that it achieves one-step local convergence.
arXiv Detail & Related papers (2020-01-27T11:25:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.