Hogwild! over Distributed Local Data Sets with Linearly Increasing
Mini-Batch Sizes
- URL: http://arxiv.org/abs/2010.14763v2
- Date: Sat, 27 Feb 2021 03:53:19 GMT
- Title: Hogwild! over Distributed Local Data Sets with Linearly Increasing
Mini-Batch Sizes
- Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc
Tran-Dinh and Phuong Ha Nguyen
- Abstract summary: Hogwild! implements asynchronous Gradient Descent where multiple threads in parallel access a common repository containing training data.
We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost.
- Score: 26.9902939745173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where
multiple threads in parallel access a common repository containing training
data, perform SGD iterations and update shared state that represents a jointly
learned (global) model. We consider big data analysis where training data is
distributed among local data sets in a heterogeneous way -- and we wish to move
SGD computations to local compute nodes where local data resides. The results
of these local SGD computations are aggregated by a central "aggregator" which
mimics Hogwild!. We show how local compute nodes can start choosing small
mini-batch sizes which increase to larger ones in order to reduce communication
cost (round interaction with the aggregator). We improve state-of-the-art
literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data
for strongly convex problems, where $K$ is the total number of gradient
computations across all local compute nodes. For our scheme, we prove a
\textit{tight} and novel non-trivial convergence analysis for strongly convex
problems for {\em heterogeneous} data which does not use the bounded gradient
assumption as seen in many existing publications. The tightness is a
consequence of our proofs for lower and upper bounds of the convergence rate,
which show a constant factor difference. We show experimental results for plain
convex and non-convex problems for biased (i.e., heterogeneous) and unbiased
local data sets.
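As a rough, illustrative sketch of the scheme described above (not the authors' implementation), the snippet below runs local SGD on heterogeneous per-client data with a mini-batch size that grows linearly across communication rounds. The linear schedule `b0 * (r + 1)`, the step sizes, the least-squares objective, and the plain synchronous averaging step (standing in for the asynchronous, Hogwild!-style aggregator) are all assumptions chosen for illustration.

```python
# Minimal sketch: local SGD with linearly increasing mini-batch sizes,
# aggregated by a central server. Illustrative only; the schedule, step
# sizes, objective, and synchronous averaging are assumptions, not the
# paper's exact Hogwild!-style asynchronous aggregation.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, n_local = 10, 4, 500

# Heterogeneous ("biased") local data sets: each client draws features
# around a different mean, so local objectives differ.
w_star = rng.normal(size=d)
clients = []
for c in range(n_clients):
    X = rng.normal(loc=c, scale=1.0, size=(n_local, d))
    y = X @ w_star + 0.1 * rng.normal(size=n_local)
    clients.append((X, y))

def local_sgd(w, X, y, batch_size, steps, lr):
    """Run a few SGD steps on one client's local least-squares objective."""
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w = w - lr * grad
    return w

w = np.zeros(d)
b0, rounds = 4, 20
for r in range(rounds):
    batch_size = b0 * (r + 1)   # mini-batch size increases linearly per round
    # One interaction with the aggregator per round: larger local mini-batches
    # later in training reduce the number of rounds per gradient computation.
    local_models = [local_sgd(w.copy(), X, y, batch_size, steps=10, lr=0.01 / (r + 1))
                    for X, y in clients]
    w = np.mean(local_models, axis=0)   # central aggregation (simple averaging here)

print("distance to optimum:", np.linalg.norm(w - w_star))
```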
Related papers
- Locally Regularized Sparse Graph by Fast Proximal Gradient Descent [6.882546996728011]
We propose a novel Support Regularized Sparse Graph (SRSG).
Sparse graphs have been shown to be effective in clustering high-dimensional data.
We show that SRSG is superior to other clustering methods.
arXiv Detail & Related papers (2024-09-25T16:57:47Z)
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
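A hypothetical, simplified sketch of an adaptive-$k$ rule may help picture the idea; the paper estimates local curvature via the shape operator, whereas the local-spread heuristic and the function names below are stand-ins invented purely for illustration.

```python
# Hypothetical sketch of an adaptive k-NN classifier: the neighborhood size
# is chosen per query point from a local statistic. This local-spread
# heuristic is NOT the paper's shape-operator curvature estimate; it only
# illustrates the idea of adapting k locally.
import numpy as np

def adaptive_knn_predict(X_train, y_train, X_query, k_min=3, k_max=15):
    preds = []
    for x in X_query:
        dist = np.linalg.norm(X_train - x, axis=1)
        order = np.argsort(dist)
        # The more the nearest distances spread out (a crude proxy for how
        # non-uniform the neighborhood is), the smaller the k we use.
        spread = dist[order[k_max - 1]] / (dist[order[k_min - 1]] + 1e-12)
        k = int(np.clip(k_max / spread, k_min, k_max))
        votes = y_train[order[:k]]
        preds.append(np.bincount(votes).argmax())   # majority vote among k neighbors
    return np.array(preds)

# Toy usage on two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(adaptive_knn_predict(X, y, np.array([[0.0, 0.0], [3.0, 3.0]])))
```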
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
- FedGT: Federated Node Classification with Scalable Graph Transformer [27.50698154862779]
We propose a scalable Federated Graph Transformer (FedGT) in this paper.
FedGT computes clients' similarity based on the aligned global nodes with optimal transport.
arXiv Detail & Related papers (2024-01-26T21:02:36Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus [66.62731854746856]
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information.
In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion-based optimization.
We demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets.
arXiv Detail & Related papers (2022-06-22T02:42:04Z)
- A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
arXiv Detail & Related papers (2022-06-02T16:31:16Z)
- Federated Minimax Optimization: Improved Convergence Analyses and Algorithms [32.062312674333775]
We consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs.
We provide a novel and tighter analysis of the algorithm, which improves the convergence and communication guarantees in the existing literature.
arXiv Detail & Related papers (2022-03-09T16:21:31Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- Faster Convergence of Local SGD for Over-Parameterized Models [1.5504102675587357]
Modern machine learning architectures are often highly expressive.
We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized functions in heterogeneous data setting.
For general convex loss functions, we establish an error bound of $O(K/T)$ in the general setting; for non-convex loss functions, we prove an error bound of $O(K/T)$ as well.
We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize.
arXiv Detail & Related papers (2022-01-30T04:05:56Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
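As a small, single-machine illustration of the sampling schemes being compared (the distributed/local aspects and the paper's synchronized-shuffling modification are omitted), the sketch below contrasts with-replacement mini-batch SGD with Random Reshuffling, which shuffles once per epoch and then draws mini-batches without replacement; the quadratic objective and step size are arbitrary choices.

```python
# Illustration only: with-replacement mini-batch SGD vs Random Reshuffling
# (shuffle once per epoch, then draw mini-batches without replacement) on a
# simple least-squares problem. Not the paper's distributed algorithms.
import numpy as np

rng = np.random.default_rng(1)
n, d = 256, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)          # consistent linear system, no noise

def grad(w, idx):
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)

def with_replacement_sgd(w, epochs, batch, lr):
    for _ in range(epochs * (n // batch)):
        idx = rng.integers(0, n, size=batch)       # sampled with replacement
        w = w - lr * grad(w, idx)
    return w

def random_reshuffling_sgd(w, epochs, batch, lr):
    for _ in range(epochs):
        perm = rng.permutation(n)                  # one shuffle per epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]        # drawn without replacement
            w = w - lr * grad(w, idx)
    return w

w0 = np.zeros(d)
print("with replacement  :", np.linalg.norm(A @ with_replacement_sgd(w0, 20, 16, 0.05) - b))
print("random reshuffling:", np.linalg.norm(A @ random_reshuffling_sgd(w0, 20, 16, 0.05) - b))
```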
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)