Parallel Split Learning with Global Sampling
- URL: http://arxiv.org/abs/2407.15738v2
- Date: Thu, 8 Aug 2024 21:45:57 GMT
- Title: Parallel Split Learning with Global Sampling
- Authors: Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink
- Abstract summary: Parallel split learning has emerged as a promising derivative of split learning well suited for distributed learning on resource-constrained devices.
However, it faces several challenges, including large effective batch sizes, non-independent and identically distributed data, and the straggler effect.
We introduce a new method called uniform global sampling to decouple the effective batch size from the number of clients and reduce the mini-batch deviation.
Our simulations reveal that our proposed methods enhance model accuracy by up to 34.1% in non-independent and identically distributed settings and reduce the training time in the presence of stragglers by up to 62%.
- Score: 9.57839529462706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The expansion of IoT devices and the demands of deep learning have highlighted significant challenges in distributed deep learning systems. Parallel split learning has emerged as a promising derivative of split learning well suited for distributed learning on resource-constrained devices. However, parallel split learning faces several challenges, such as large effective batch sizes, non-independent and identically distributed data, and the straggler effect. We view these issues as a sampling dilemma and propose to address them by orchestrating a mini-batch sampling process on the server side. We introduce a new method called uniform global sampling to decouple the effective batch size from the number of clients and reduce the mini-batch deviation. To address the straggler effect, we introduce a novel method called Latent Dirichlet Sampling, which generalizes uniform global sampling to balance the trade-off between batch deviation and training time. Our simulations reveal that our proposed methods enhance model accuracy by up to 34.1% in non-independent and identically distributed settings and reduce the training time in the presence of stragglers by up to 62%. In particular, Latent Dirichlet Sampling effectively mitigates the straggler effect without compromising model accuracy or adding significant computational overhead compared to uniform global sampling. Our results demonstrate the potential of our methods to mitigate common challenges in parallel split learning.
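To make the server-side sampling idea concrete, the following is a minimal sketch of uniform global sampling as the abstract describes it: the server draws each mini-batch uniformly from the union of all clients' local data, so the effective batch size is a fixed hyperparameter independent of the number of participating clients. All names (`sample_global_batch`, `client_index_pools`) and the exact mechanics are illustrative assumptions, not taken from the paper.

```python
import random

def sample_global_batch(client_index_pools, batch_size, rng):
    """Hypothetical server-side uniform global sampling (UGS) step.

    Draws `batch_size` sample indices uniformly, without replacement,
    from the union of all clients' local datasets, then groups the
    picks per client so each client knows which samples to forward
    through its client-side sub-model.
    """
    # Build a global pool of (client_id, local_index) pairs.
    global_pool = [
        (cid, idx)
        for cid, indices in client_index_pools.items()
        for idx in indices
    ]
    picked = rng.sample(global_pool, batch_size)
    per_client = {}
    for cid, idx in picked:
        per_client.setdefault(cid, []).append(idx)
    return per_client

# Example: 4 clients with 100 local samples each. The effective batch
# size stays 64 regardless of how many clients participate, instead of
# growing with the client count as in naive parallel split learning.
rng = random.Random(0)
pools = {cid: list(range(100)) for cid in range(4)}
batch = sample_global_batch(pools, batch_size=64, rng=rng)
print({cid: len(idxs) for cid, idxs in batch.items()})
```

Latent Dirichlet Sampling is described only as a generalization of this scheme that balances batch deviation against training time. One way to picture such a generalization, again an assumption rather than the paper's actual algorithm, is to draw per-client contribution proportions from a Dirichlet prior whose concentration parameter controls how far each batch may skew away from uniform (e.g., toward faster clients):

```python
import numpy as np

def dirichlet_client_quotas(num_clients, batch_size, alpha, rng):
    """Hypothetical per-batch quota assignment: p ~ Dirichlet(alpha)."""
    p = rng.dirichlet([alpha] * num_clients)   # per-client proportions
    return rng.multinomial(batch_size, p)      # samples requested per client

rng = np.random.default_rng(0)
# Small alpha -> skewed batches (less waiting on stragglers, more deviation);
# large alpha -> near-uniform batches (recovers UGS-like behavior).
print(dirichlet_client_quotas(num_clients=4, batch_size=64, alpha=0.5, rng=rng))
```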
Related papers
- Aioli: A Unified Optimization Framework for Language Model Data Mixing [74.50480703834508]
We show that no existing method consistently outperforms a simple stratified sampling baseline in terms of average test perplexity per group.
We derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions.
arXiv Detail & Related papers (2024-11-08T17:50:24Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Depersonalized Federated Learning: Tackling Statistical Heterogeneity by Alternating Stochastic Gradient Descent [6.394263208820851]
Federated learning (FL) enables devices to train a common machine learning (ML) model for intelligent inference without data sharing.
Raw data held by the various cooperating participants are typically non-identically distributed.
We propose a new FL scheme that tackles this statistical heterogeneity via alternating stochastic gradient descent.
arXiv Detail & Related papers (2022-10-07T10:30:39Z) - Pairwise Learning via Stagewise Training in Proximal Setting [0.0]
We combine adaptive sample size and importance sampling techniques for pairwise learning, with convergence guarantees for nonsmooth convex pairwise loss functions.
We demonstrate that sampling opposite instances at each iteration reduces the variance of the gradient, hence accelerating convergence.
arXiv Detail & Related papers (2022-08-08T11:51:01Z) - Causal Balancing for Domain Generalization [95.97046583437145]
We propose a balanced mini-batch sampling strategy to reduce the domain-specific spurious correlations in observed training distributions.
We provide an identifiability guarantee of the source of spuriousness and show that our proposed approach provably samples from a balanced, spurious-free distribution.
arXiv Detail & Related papers (2022-06-10T17:59:11Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Robust Federated Learning: The Case of Affine Distribution Shifts [41.27887358989414]
We develop a robust federated learning algorithm that achieves satisfactory performance against distribution shifts in users' samples.
We show that an affine distribution shift indeed suffices to significantly decrease the performance of the learnt classifier in a new test user.
arXiv Detail & Related papers (2020-06-16T03:43:59Z) - Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore balance in imbalanced images by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalescing of capsule-GAN is effective at recognizing highly overlapping classes with much fewer parameters compared with the convolutional-GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)