Parallel Split Learning with Global Sampling
- URL: http://arxiv.org/abs/2407.15738v2
- Date: Thu, 8 Aug 2024 21:45:57 GMT
- Title: Parallel Split Learning with Global Sampling
- Authors: Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink
- Abstract summary: Parallel split learning has emerged as a promising derivative of split learning well suited for distributed learning on resource-constrained devices.
However, it faces several challenges, including large effective batch sizes, non-independent and identically distributed data, and the straggler effect.
We introduce a new method called uniform global sampling to decouple the effective batch size from the number of clients and reduce the mini-batch deviation.
Our simulations reveal that our proposed methods enhance model accuracy by up to 34.1% in non-independent and identically distributed settings and reduce the training time in the presence of stragglers by up to 62%.
- Score: 9.57839529462706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The expansion of IoT devices and the demands of deep learning have highlighted significant challenges in distributed deep learning systems. Parallel split learning has emerged as a promising derivative of split learning well suited for distributed learning on resource-constrained devices. However, parallel split learning faces several challenges, such as large effective batch sizes, non-independent and identically distributed data, and the straggler effect. We view these issues as a sampling dilemma and propose to address them by orchestrating a mini-batch sampling process on the server side. We introduce a new method called uniform global sampling to decouple the effective batch size from the number of clients and reduce the mini-batch deviation. To address the straggler effect, we introduce a novel method called Latent Dirichlet Sampling, which generalizes uniform global sampling to balance the trade-off between batch deviation and training time. Our simulations reveal that our proposed methods enhance model accuracy by up to 34.1% in non-independent and identically distributed settings and reduce the training time in the presence of stragglers by up to 62%. In particular, Latent Dirichlet Sampling effectively mitigates the straggler effect without compromising model accuracy or adding significant computational overhead compared to uniform global sampling. Our results demonstrate the potential of our methods to mitigate common challenges in parallel split learning.
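As a rough illustration of how server-side sampling can decouple the effective batch size from the number of clients, the sketch below draws a fixed-size global mini-batch uniformly without replacement from the pooled client data and reports how many samples each client contributes. This is one plausible reading of uniform global sampling based only on the abstract, not the authors' implementation; the function name `sample_local_batch_sizes`, the client sizes, and the batch size of 128 are hypothetical.

```python
import bisect
import itertools
import random
from collections import Counter


def sample_local_batch_sizes(client_sizes, global_batch_size, rng):
    """Return one local batch size per client; the sizes sum to global_batch_size.

    Assumption (not from the paper): the server knows only each client's
    dataset size and treats all client data as one pooled index range.
    """
    total = sum(client_sizes)
    if not 0 <= global_batch_size <= total:
        raise ValueError("global_batch_size must be between 0 and the pooled dataset size")
    # Cumulative boundaries: client k owns pooled positions [bounds[k-1], bounds[k]).
    bounds = list(itertools.accumulate(client_sizes))
    # Uniform draw without replacement over the pooled positions.
    picks = rng.sample(range(total), global_batch_size)
    # Map each drawn position back to the client that owns it.
    counts = Counter(bisect.bisect_right(bounds, p) for p in picks)
    return [counts.get(k, 0) for k in range(len(client_sizes))]


rng = random.Random(0)
client_sizes = [500, 50, 1200, 250]  # hypothetical local dataset sizes
local_sizes = sample_local_batch_sizes(client_sizes, 128, rng)
print(local_sizes, sum(local_sizes))
```

In this sketch the global batch size stays fixed no matter how many clients participate, and clients with more data contribute more samples on average, so each mini-batch resembles a uniform draw from the pooled dataset.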
Related papers
- Modality Alignment Meets Federated Broadcasting [9.752555511824593]
Federated learning (FL) has emerged as a powerful approach to safeguard data privacy by training models across distributed edge devices without centralizing local data.
This paper introduces a novel FL framework leveraging modality alignment, where a text encoder resides on the server, and image encoders operate on local devices.
arXiv Detail & Related papers (2024-11-24T13:30:03Z) - Aioli: A Unified Optimization Framework for Language Model Data Mixing [74.50480703834508]
We show that no existing method consistently outperforms a simple stratified sampling baseline in terms of average test perplexity per group.
We derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions.
arXiv Detail & Related papers (2024-11-08T17:50:24Z) - Boosting Federated Learning with FedEntOpt: Mitigating Label Skew by Entropy-Based Client Selection [13.851391819710367]
Deep learning domains typically require an extensive amount of data for optimal performance.
FedEntOpt is designed to mitigate performance issues caused by label distribution skew.
It exhibits robust and superior performance in scenarios with low participation rates and client dropout.
arXiv Detail & Related papers (2024-11-02T13:31:36Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Leveraging Foundation Models to Improve Lightweight Clients in Federated Learning [16.684749528240587]
Federated Learning (FL) is a distributed training paradigm that enables clients scattered across the world to cooperatively learn a global model without divulging confidential data.
FL faces a significant challenge in the form of heterogeneous data distributions among clients, which leads to a reduction in performance and robustness.
We introduce foundation model distillation to assist in the federated training of lightweight client models and increase their performance under heterogeneous data settings while keeping inference costs low.
arXiv Detail & Related papers (2023-11-14T19:10:56Z) - FedSampling: A Better Sampling Strategy for Federated Learning [81.85411484302952]
Federated learning (FL) is an important technique for learning models from decentralized data in a privacy-preserving way.
Existing FL methods usually uniformly sample clients for local model learning in each round.
We propose a novel data uniform sampling strategy for federated learning (FedSampling).
arXiv Detail & Related papers (2023-06-25T13:38:51Z) - Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z) - When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning [41.51682329500003]
We propose a novel learning rate adaptation mechanism to adjust the server learning rate for the aggregated gradient in each round.
We make theoretical deductions to find a meaningful and robust indicator that is positively related to the optimal server learning rate.
arXiv Detail & Related papers (2023-01-25T03:52:45Z) - Integrating Local Real Data with Global Gradient Prototypes for Classifier Re-Balancing in Federated Long-Tailed Learning [60.41501515192088]
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively.
The data samples usually follow a long-tailed distribution in the real world, and FL on the decentralized and long-tailed data yields a poorly-behaved global model.
In this work, we integrate the local real data with the global gradient prototypes to form the local balanced datasets.
arXiv Detail & Related papers (2023-01-25T03:18:10Z) - Depersonalized Federated Learning: Tackling Statistical Heterogeneity by Alternating Stochastic Gradient Descent [6.394263208820851]
Federated learning (FL) enables devices to train a common machine learning (ML) model for intelligent inference without data sharing.
Raw data held by the various participants are always non-identically distributed.
We propose a new FL method that significantly mitigates statistical heterogeneity through a depersonalization process.
arXiv Detail & Related papers (2022-10-07T10:30:39Z) - Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction [76.26710990597498]
We show that the class-imbalance of the grouped data from randomly selected clients can lead to significant performance degradation.
Based on our key observation, we design an efficient client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS).
In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way.
arXiv Detail & Related papers (2022-09-30T05:42:56Z) - Pairwise Learning via Stagewise Training in Proximal Setting [0.0]
We combine adaptive sample size and importance sampling techniques for pairwise learning, with convergence guarantees for nonsmooth convex pairwise loss functions.
We demonstrate that sampling opposite instances at each iteration reduces the variance of the gradient, hence accelerating convergence.
arXiv Detail & Related papers (2022-08-08T11:51:01Z) - FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning [4.02923738318937]
Uneven distribution of local data across different edge devices (clients) results in slow model training and accuracy reduction in federated learning.
This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew.
We propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor.
arXiv Detail & Related papers (2022-08-04T04:24:16Z) - Causal Balancing for Domain Generalization [95.97046583437145]
We propose a balanced mini-batch sampling strategy to reduce the domain-specific spurious correlations in observed training distributions.
We provide an identifiability guarantee of the source of spuriousness and show that our proposed approach provably samples from a balanced, spurious-free distribution.
arXiv Detail & Related papers (2022-06-10T17:59:11Z) - Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.
We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles.
Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z) - Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z) - Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local-updates with respect to the low-dimensional local parameters for every update of the representation.
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
arXiv Detail & Related papers (2021-02-14T05:36:25Z) - Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z) - Robust Federated Learning: The Case of Affine Distribution Shifts [41.27887358989414]
We develop a robust federated learning algorithm that achieves satisfactory performance against distribution shifts in users' samples.
We show that an affine distribution shift indeed suffices to significantly decrease the performance of the learnt classifier in a new test user.
arXiv Detail & Related papers (2020-06-16T03:43:59Z) - Imbalanced Data Learning by Minority Class Augmentation using Capsule Adversarial Networks [31.073558420480964]
We propose a method to restore the balance in imbalanced images, by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalescing of capsule-GAN is effective at recognizing highly overlapping classes with much fewer parameters compared with the convolutional-GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.