MP-SL: Multihop Parallel Split Learning
- URL: http://arxiv.org/abs/2402.00208v1
- Date: Wed, 31 Jan 2024 22:09:40 GMT
- Title: MP-SL: Multihop Parallel Split Learning
- Authors: Joana Tirana, Spyros Lalis, Dimitris Chatzopoulos
- Abstract summary: Multihop Parallel SL (MP-SL) is a modular and extensible Machine Learning as a Service (MLaaS) framework designed to facilitate the involvement of resource-constrained devices in collaborative and distributed ML model training.
MP-SL supports multihop Parallel SL-based training. This involves splitting the model into multiple parts and utilizing multiple compute nodes in a pipelined manner.
- Score: 2.7716102039510564
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Federated Learning (FL) stands out as a widely adopted protocol facilitating
the training of Machine Learning (ML) models while maintaining decentralized
data. However, challenges arise when dealing with a heterogeneous set of
participating devices, causing delays in the training process, particularly
among devices with limited resources. Moreover, the task of training ML models
with a vast number of parameters demands computing and memory resources beyond
the capabilities of small devices, such as mobile and Internet of Things (IoT)
devices. To address these issues, techniques like Parallel Split Learning (SL)
have been introduced, allowing multiple resource-constrained devices to
actively participate in collaborative training processes with assistance from
resourceful compute nodes. Nonetheless, a drawback of Parallel SL is the
substantial memory allocation required at the compute nodes; for instance,
training VGG-19 with 100 participants requires 80 GB. In this paper, we introduce
Multihop Parallel SL (MP-SL), a modular and extensible ML as a Service (MLaaS)
framework designed to facilitate the involvement of resource-constrained
devices in collaborative and distributed ML model training. Notably, to
alleviate memory demands per compute node, MP-SL supports multihop Parallel
SL-based training. This involves splitting the model into multiple parts and
utilizing multiple compute nodes in a pipelined manner. Extensive
experimentation validates MP-SL's capability to handle system heterogeneity,
demonstrating that the multihop configuration proves more efficient than
horizontally scaled one-hop Parallel SL setups, especially in scenarios
involving more cost-effective compute nodes.
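To make the multihop idea concrete, the following is a minimal sketch of split training in PyTorch, assuming a toy MLP and two compute "hops"; the module names, cut points, and sizes are illustrative and are not the MP-SL API.

```python
# Toy illustration of multihop split training (a sketch, NOT the MP-SL code).
# The model is cut into three parts: the first stays on the data-owning device,
# the remaining two sit on separate compute "hops". Activations flow forward
# hop by hop and gradients flow back in reverse, so no single node needs to
# hold the whole model or every participant's activations.
import torch
import torch.nn as nn

# Hypothetical cut points for a small MLP; a real deployment would cut a CNN
# such as VGG-19 at layer boundaries chosen by the framework.
device_part = nn.Sequential(nn.Linear(32, 64), nn.ReLU())   # on the IoT/mobile device
hop1_part   = nn.Sequential(nn.Linear(64, 64), nn.ReLU())   # compute node 1
hop2_part   = nn.Sequential(nn.Linear(64, 10))              # compute node 2

params = (list(device_part.parameters())
          + list(hop1_part.parameters())
          + list(hop2_part.parameters()))
opt = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 32)                # a micro-batch held by the data owner
y = torch.randint(0, 10, (8,))

# Forward: each hop only ever sees the previous hop's activations ("smashed data").
a0 = device_part(x)
a1 = hop1_part(a0)                    # in the real system this crosses the network
logits = hop2_part(a1)

# Backward: autograd propagates gradients back through the chain of parts; in the
# real system each hop returns the gradient of its input to the previous hop.
loss = loss_fn(logits, y)
opt.zero_grad()
loss.backward()
opt.step()
```

In the actual multihop setup each part lives on a different machine, and several participants' micro-batches are in flight at once, so hop 1 can process one participant's activations while hop 2 works on another's; this pipelining is what keeps the per-node memory footprint bounded.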
Related papers
- Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach [1.297210402524609]
Split Learning partitions models at a designated cut-layer to offload compute-intensive operations to the server.
We present MPSL, a parallel SL approach for computationally efficient fine-tuning of multimodal transformers in a distributed manner.
MPSL employs lightweight client-side tokenizers and a unified modality-agnostic encoder, allowing flexible adaptation to task-specific needs.
arXiv Detail & Related papers (2025-02-10T11:10:41Z) - One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments [43.107261545706415]
Large Language Models (LLMs) have advanced rapidly but face significant memory demands.
Current methods typically require lengthy training to alleviate the performance degradation from quantization loss.
We make an initial attempt to extend the once-for-all framework to large language models.
arXiv Detail & Related papers (2024-05-30T16:05:15Z) - Distributed Inference and Fine-tuning of Large Language Models Over The Internet [91.00270820533272]
Large language models (LLMs) are useful in many NLP tasks and become more capable with size.
These models require high-end hardware, making them inaccessible to most researchers.
We develop fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput.
arXiv Detail & Related papers (2023-12-13T18:52:49Z) - Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment [8.30319294116657]
Large language models (LLMs) such as GPT-3, OPT, and LLaMA have demonstrated remarkable accuracy in a wide range of tasks.
Training these models can incur significant expenses, often requiring tens of thousands of GPUs for months of continuous operation.
We introduce Holmes, a training framework for LLMs that employs thoughtfully crafted data and model parallelism strategies over the heterogeneous NIC environment.
arXiv Detail & Related papers (2023-12-06T15:27:26Z) - In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
arXiv Detail & Related papers (2023-06-22T14:07:54Z) - Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z) - SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient [69.61083127540776]
Deep learning applications benefit from using large models with billions of parameters.
Training these models is notoriously expensive due to the need for specialized HPC clusters.
We consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions.
arXiv Detail & Related papers (2023-01-27T18:55:19Z) - PiPar: Pipeline Parallelism for Collaborative Machine Learning [16.131285496487678]
Collaborative machine learning (CML) techniques have been proposed to train deep learning models across multiple mobile devices and a server.
CML techniques are privacy-preserving because each device shares a locally trained model with the server rather than its raw data.
We identify idling resources on the server and devices, caused by sequential computation and communication, as the principal cause of low resource utilization; a toy pipelining calculation illustrating this effect appears after this list.
arXiv Detail & Related papers (2022-12-01T20:51:47Z) - Multi-Job Intelligent Scheduling with Cross-Device Federated Learning [65.69079337653994]
Federated Learning (FL) enables collaborative global machine learning model training without sharing sensitive raw data.
We propose a novel multi-job FL framework, which enables the training process of multiple jobs in parallel.
We propose a novel intelligent scheduling approach based on multiple scheduling methods, including an original reinforcement learning-based scheduling method and an original Bayesian optimization-based scheduling method.
arXiv Detail & Related papers (2022-11-24T06:17:40Z) - Asynchronous Parallel Incremental Block-Coordinate Descent for Decentralized Machine Learning [55.198301429316125]
Machine learning (ML) is a key technique for big-data-driven modelling and analysis of massive Internet of Things (IoT) based intelligent and ubiquitous computing.
For fast-increasing applications and data amounts, distributed learning is a promising emerging paradigm since it is often impractical or inefficient to share/aggregate data.
This paper studies the problem of training an ML model over decentralized systems, where data are distributed over many user devices.
arXiv Detail & Related papers (2022-02-07T15:04:15Z) - IPLS : A Framework for Decentralized Federated Learning [6.6271520914941435]
We introduce IPLS, a fully decentralized federated learning framework that is partially based on the InterPlanetary File System (IPFS).
IPLS scales with the number of participants, is robust against intermittent connectivity and dynamic participant departures/arrivals, requires minimal resources, and guarantees that the accuracy of the trained model quickly converges to that of a centralized FL framework with an accuracy drop of less than one per thousand.
arXiv Detail & Related papers (2021-01-06T07:44:51Z)
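Several of the entries above (notably PiPar) and MP-SL's own pipelined multihop design rest on the same observation: if the device, the network link, and the compute node work strictly one after another, most of the hardware sits idle. The following back-of-the-envelope Python calculation, with made-up stage times, illustrates the effect; it is not taken from any of the listed papers.

```python
# Toy comparison of sequential vs. pipelined execution of one training round
# (all numbers are invented for illustration).
device_time, link_time, server_time = 4.0, 2.0, 3.0   # seconds per micro-batch
n = 8                                                  # micro-batches per round

# Sequential: every micro-batch waits for the previous one to finish all stages.
sequential = n * (device_time + link_time + server_time)

# Ideal pipeline: after the first micro-batch fills the pipe, a new one completes
# every max(stage time), i.e. the round is limited by the slowest stage.
pipelined = (device_time + link_time + server_time) \
            + (n - 1) * max(device_time, link_time, server_time)

print(f"sequential round: {sequential:.0f}s, ideal pipelined round: {pipelined:.0f}s")
```

With these invented numbers the pipelined round takes 37 s instead of 72 s, because once the pipeline is full the round time is governed only by the slowest stage; this overlap of stages is the same kind of effect MP-SL exploits when pipelining micro-batches across hops.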