Avoiding Communication in Logistic Regression
- URL: http://arxiv.org/abs/2011.08281v1
- Date: Mon, 16 Nov 2020 21:14:39 GMT
- Title: Avoiding Communication in Logistic Regression
- Authors: Aditya Devarakonda, James Demmel
- Abstract summary: Stochastic gradient descent (SGD) is one of the most widely used optimization methods for solving various machine learning problems.
In a parallel setting, SGD requires interprocess communication at every iteration.
We introduce a new communication-avoiding technique for solving the logistic regression problem using SGD.
- Score: 1.7780157772002312
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stochastic gradient descent (SGD) is one of the most widely used optimization
methods for solving various machine learning problems. SGD solves an
optimization problem by iteratively sampling a few data points from the input
data, computing gradients for the selected data points, and updating the
solution. However, in a parallel setting, SGD requires interprocess
communication at every iteration. We introduce a new communication-avoiding
technique for solving the logistic regression problem using SGD. This technique
re-organizes the SGD computations into a form that communicates every $s$
iterations instead of every iteration, where $s$ is a tuning parameter. We
prove theoretical flops, bandwidth, and latency upper bounds for SGD and its
new communication-avoiding variant. Furthermore, we show experimental results
that illustrate that the new Communication-Avoiding SGD (CA-SGD) method can
achieve speedups of up to $4.97\times$ on a high-performance Infiniband cluster
without altering the convergence behavior or accuracy.
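The abstract only describes the idea at a high level. As a rough illustration (not the authors' CA-SGD derivation), the Python/mpi4py sketch below shows the structural change the paper targets: local mini-batch gradient work is accumulated for $s$ iterations and a single allreduce replaces the per-iteration communication. All names (ca_sgd_logistic, local_X, local_y, etc.) are illustrative assumptions; unlike the paper's exact algebraic re-organization, this naive deferral changes the iterate sequence, so it only conveys where the communication savings come from.
```python
# Minimal sketch of communication-avoiding SGD for logistic regression.
# Assumptions: rows of the data are partitioned across MPI ranks, local_y
# holds 0/1 labels, and this deferred-update loop is a simplification,
# not the paper's exact s-step re-organization.
import numpy as np
from mpi4py import MPI


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def ca_sgd_logistic(local_X, local_y, lr=0.1, s=8, outer_steps=100, batch=32):
    """SGD for logistic regression that communicates once every s iterations."""
    comm = MPI.COMM_WORLD
    n_local, d = local_X.shape
    w = np.zeros(d)
    rng = np.random.default_rng(comm.Get_rank())

    for _ in range(outer_steps):
        accumulated = np.zeros(d)              # local gradient contributions
        for _ in range(s):
            idx = rng.integers(0, n_local, size=batch)
            Xb, yb = local_X[idx], local_y[idx]
            # Gradient of the logistic loss on the local mini-batch.
            accumulated += Xb.T @ (sigmoid(Xb @ w) - yb) / batch
        # One allreduce every s iterations instead of one per iteration.
        global_grad = np.empty(d)
        comm.Allreduce(accumulated, global_grad, op=MPI.SUM)
        w -= lr * global_grad / (s * comm.Get_size())
    return w
```
In the paper's formulation the recurrence is re-organized so that the same iterates are recovered while communicating only once every $s$ steps, which is why convergence behavior and accuracy are unchanged while the number of latency-bound messages drops by roughly a factor of $s$.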
Related papers
- Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $\textbf{MA-OSMA}$ algorithm to transform the discrete submodular problem into a continuous optimization problem.
We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution.
Our algorithms significantly improve the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z) - Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization [2.2596489829928452]
This work generalizes 1D $s$-step SGD and 1D Federated SGD with Averaging (FedAvg) to yield a 2D parallel SGD method (HybridSGD).
We implement all algorithms in C++ and MPI and evaluate their performance on a Cray EX supercomputing system.
arXiv Detail & Related papers (2025-01-13T17:56:39Z) - GDSG: Graph Diffusion-based Solution Generator for Optimization Problems in MEC Networks [109.17835015018532]
We present a Graph Diffusion-based Solution Generation (GDSG) method.
This approach is designed to work with suboptimal datasets while converging to the optimal solution with high probability.
We build GDSG as a multi-task diffusion model utilizing a Graph Neural Network (GNN) to acquire the distribution of high-quality solutions.
arXiv Detail & Related papers (2024-12-11T11:13:43Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst-distribution test accuracy by up to $10\%$.
arXiv Detail & Related papers (2022-08-29T18:01:42Z) - Adaptive Stochastic Gradient Descent for Fast and
Communication-Efficient Distributed Learning [33.590006101071765]
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers.
We show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations.
arXiv Detail & Related papers (2022-08-04T10:57:25Z) - A Communication-efficient Algorithm with Linear Convergence for
Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
arXiv Detail & Related papers (2022-06-02T16:31:16Z) - Adaptive Periodic Averaging: A Practical Approach to Reducing
Communication in Distributed Learning [6.370766463380455]
We show that the optimal averaging period in terms of convergence and communication cost is not a constant, but instead varies over the course of the execution.
We propose a practical algorithm, named ADaptive Periodic parameter averaging SGD (ADPSGD), to achieve a smaller overall variance of model parameters.
arXiv Detail & Related papers (2020-07-13T00:04:55Z) - A Unified Theory of Decentralized SGD with Changing Topology and Local
Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
arXiv Detail & Related papers (2020-03-23T17:49:15Z) - Variance Reduced Local SGD with Lower Communication Complexity [52.44473777232414]
We propose Variance Reduced Local SGD to further reduce the communication complexity.
VRL-SGD achieves a linear iteration speedup with a lower communication complexity $O(T^{\frac{1}{2}} N^{\frac{3}{2}})$ even if workers access non-identical datasets.
arXiv Detail & Related papers (2019-12-30T08:15:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.