Resource-Adaptive Newton's Method for Distributed Learning
- URL: http://arxiv.org/abs/2308.10154v3
- Date: Sat, 2 Sep 2023 09:34:07 GMT
- Title: Resource-Adaptive Newton's Method for Distributed Learning
- Authors: Shuzhen Chen, Yuan Yuan, Youming Tao, Zhipeng Cai and Dongxiao Yu
- Abstract summary: This paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method.
Unlike traditional first-order methods, RANL exhibits remarkable independence from the condition number of the problem.
- Score: 16.588456212160928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed stochastic optimization methods based on Newton's method offer
significant advantages over first-order methods by leveraging curvature
information for improved performance. However, the practical applicability of
Newton's method is hindered in large-scale and heterogeneous learning
environments due to challenges such as high computation and communication costs
associated with the Hessian matrix, sub-model diversity, staleness in training,
and data heterogeneity. To address these challenges, this paper introduces a
novel and efficient algorithm called RANL, which overcomes the limitations of
Newton's method by employing a simple Hessian initialization and adaptive
assignments of training regions. The algorithm demonstrates impressive
convergence properties, which are rigorously analyzed under standard
assumptions in stochastic optimization. The theoretical analysis establishes
that RANL achieves a linear convergence rate while effectively adapting to
available resources and maintaining high efficiency. Unlike traditional
first-order methods, RANL exhibits remarkable independence from the condition
number of the problem and eliminates the need for complex parameter tuning.
These advantages make RANL a promising approach for distributed stochastic
optimization in practical scenarios.
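For intuition, here is a minimal sketch of what a RANL-style iteration could look like: workers hold adaptively assigned data regions, compute local stochastic gradients, and the update direction is obtained from a simply initialized Hessian estimate. The abstract does not specify the algorithm's details, so the helper names (assign_regions, ranl_like_step), the least-squares loss, the capacity-proportional splitting rule, and the fixed scaled-identity Hessian are illustrative assumptions rather than the authors' method.

```python
import numpy as np

# Illustrative sketch only: a Newton-type distributed step with a simple
# scaled-identity Hessian initialization and capacity-proportional data
# "regions". The actual RANL Hessian update and adaptivity rules are not
# reproduced here.

def assign_regions(n_samples, capacities):
    """Split sample indices into per-worker regions proportional to capacity."""
    caps = np.asarray(capacities, dtype=float)
    sizes = np.floor(n_samples * caps / caps.sum()).astype(int)
    sizes[-1] = n_samples - sizes[:-1].sum()          # remainder goes to the last worker
    bounds = np.concatenate(([0], np.cumsum(sizes)))
    return [np.arange(bounds[i], bounds[i + 1]) for i in range(len(caps))]

def local_gradient(w, X, y, region):
    """Stochastic gradient of a least-squares loss on one worker's region."""
    Xr, yr = X[region], y[region]
    return Xr.T @ (Xr @ w - yr) / len(region)

def ranl_like_step(w, H, X, y, regions, lr=1.0):
    """One aggregated Newton-type step: average local gradients, solve H d = g."""
    g = np.mean([local_gradient(w, X, y, r) for r in regions], axis=0)
    d = np.linalg.solve(H, g)                         # Newton direction under the current Hessian estimate
    return w - lr * d

# Toy run: least-squares regression with four heterogeneous workers.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = X @ rng.normal(size=10) + 0.01 * rng.normal(size=400)

w = np.zeros(10)
H = 1.0 * np.eye(10)                                  # simple Hessian initialization (scaled identity)
regions = assign_regions(len(y), capacities=[1, 2, 3, 4])
for _ in range(20):
    w = ranl_like_step(w, H, X, y, regions)
```

In this sketch the per-worker region sizes adapt only to a static capacity vector; in a resource-adaptive setting one would expect those assignments to change over time with each worker's available compute and communication budget.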
Related papers
- Learning a Class of Mixed Linear Regressions: Global Convergence under General Data Conditions [1.9295130374196499]
Mixed linear regression (MLR) has attracted increasing attention because of its theoretical and practical importance in modeling nonlinear relationships with a mixture of linear regression sub-models.
Although considerable effort has been devoted to the learning problem of such systems, most existing investigations impose strict independent and identically distributed (i.i.d.) or persistent excitation (PE) conditions.
arXiv Detail & Related papers (2025-03-24T09:57:39Z) - Representation and Regression Problems in Neural Networks: Relaxation, Generalization, and Numerics [5.915970073098098]
We address three non-convex optimization problems associated with training shallow neural networks (NNs).
We convexify these problems and, applying a representer theorem, prove the absence of relaxation gaps.
We analyze the impact of key parameters on these bounds and propose optimal choices.
For high-dimensional datasets, we propose a sparsification algorithm that, combined with gradient descent, yields effective solutions.
arXiv Detail & Related papers (2024-12-02T15:40:29Z) - Optimization by Parallel Quasi-Quantum Annealing with Gradient-Based Sampling [0.0]
This study proposes a different approach that integrates gradient-based updates through continuous relaxation, combined with Quasi-Quantum Annealing (QQA).
Numerical experiments demonstrate that our method is a competitive general-purpose solver, achieving performance comparable to iSCO and learning-based solvers.
arXiv Detail & Related papers (2024-09-02T12:55:27Z) - The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines [1.738375118265695]
This paper proposes an innovative method specifically designed for kernel support vector machines (SVMs).
It not only achieves faster progress per iteration but also exhibits enhanced convergence compared to conventional stochastic first-order (SFO) techniques.
Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods.
arXiv Detail & Related papers (2024-07-30T17:03:19Z) - Adaptive debiased SGD in high-dimensional GLMs with streaming data [4.704144189806667]
We introduce a novel approach to online inference in high-dimensional generalized linear models.
Our method operates in a single-pass mode, significantly reducing both time and space complexity.
We demonstrate that our method, termed the Approximated Debiased Lasso (ADL), not only mitigates the need for the bounded individual probability condition but also significantly improves numerical performance.
arXiv Detail & Related papers (2024-05-28T15:36:48Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in the cross-silo FL setting.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - On Hypothesis Transfer Learning of Functional Linear Models [8.557392136621894]
We study transfer learning (TL) for functional linear regression (FLR) under the Reproducing Kernel Hilbert Space (RKHS) framework.
We measure the similarity across tasks using RKHS distance, allowing the type of information being transferred to be tied to the properties of the imposed RKHS.
Two algorithms are proposed: one conducts the transfer when positive sources are known, while the other leverages aggregation to achieve robust transfer without prior information about the sources.
arXiv Detail & Related papers (2022-06-09T04:50:16Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with an improved iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Distributed Second Order Methods with Fast Rates and Compressed Communication [6.069611493148631]
We develop several new communication-efficient second-order methods for distributed optimization.
We prove global sublinear and linear convergence rates, and a fast superlinear rate.
Our results are supported with experimental results on real datasets.
arXiv Detail & Related papers (2021-02-14T14:06:45Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z) - Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization [0.0]
We prove that Nesterov's extrapolation has the strength to make the individual convergence of gradient descent methods optimal for nonsmooth problems.
We give an extension of the derived algorithms to solve regularized learning tasks with nonsmooth losses in stochastic settings.
Our method is applicable as an efficient tool for solving large-scale $\ell_1$-regularized hinge-loss learning problems.
arXiv Detail & Related papers (2020-06-08T03:35:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.