Energy-efficient Training of Distributed DNNs in the Mobile-edge-cloud Continuum
- URL: http://arxiv.org/abs/2202.11349v1
- Date: Wed, 23 Feb 2022 08:35:41 GMT
- Title: Energy-efficient Training of Distributed DNNs in the Mobile-edge-cloud Continuum
- Authors: Francesco Malandrino and Carla Fabiana Chiasserini and Giuseppe Di Giacomo
- Abstract summary: We address distributed machine learning in multi-tier networks where a heterogeneous set of nodes cooperate to perform a learning task.
We propose a solution concept, called RightTrain, that achieves energy-efficient ML model training, while fulfilling learning time and quality requirements.
Our performance evaluation shows that RightTrain closely matches the optimum and outperforms the state of the art by over 50%.
- Score: 18.247181241860538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address distributed machine learning in multi-tier (e.g.,
mobile-edge-cloud) networks where a heterogeneous set of nodes cooperate to
perform a learning task. Due to the presence of multiple data sources and
computation-capable nodes, a learning controller (e.g., located in the edge)
has to make decisions about (i) which distributed ML model structure to select,
(ii) which data should be used for the ML model training, and (iii) which
resources should be allocated to it. Since these decisions deeply influence one
another, they should be made jointly. In this paper, we envision a new approach
to distributed learning in multi-tier networks, which aims at maximizing ML
efficiency. To this end, we propose a solution concept, called RightTrain, that
achieves energy-efficient ML model training, while fulfilling learning time and
quality requirements. RightTrain makes high-quality decisions in polynomial
time. Further, our performance evaluation shows that RightTrain closely matches
the optimum and outperforms the state of the art by over 50%.
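
To make the joint nature of these decisions concrete, the sketch below enumerates candidate (model variant, data fraction, execution tier) combinations and keeps the least energy-hungry one that meets the learning-time and quality targets. It is only a brute-force illustration of the decision space under made-up cost and quality estimators; it is not the RightTrain algorithm, whose polynomial-time construction is described in the paper.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical candidate decisions; all values are placeholders, not from the paper.
MODEL_VARIANTS = {"full": 1.0, "pruned": 0.6}             # relative compute cost
DATA_FRACTIONS = [0.5, 0.75, 1.0]                          # share of available data used
NODE_TIERS = {"mobile": 5e9, "edge": 2e10, "cloud": 1e11}  # FLOP/s per tier (made up)

@dataclass
class Decision:
    model: str
    data_fraction: float
    tier: str
    energy_j: float
    time_s: float
    quality: float

def estimate(model, frac, tier):
    """Toy estimators of training time, energy, and reached quality."""
    flops = 1e12 * MODEL_VARIANTS[model] * frac            # total work (made up)
    time_s = flops / NODE_TIERS[tier]
    energy_j = flops * {"mobile": 2e-9, "edge": 1e-9, "cloud": 3e-9}[tier]
    quality = 0.7 + 0.2 * frac + (0.05 if model == "full" else 0.0)
    return time_s, energy_j, quality

def best_decision(max_time_s=100.0, min_quality=0.85):
    """Pick the minimum-energy configuration meeting time/quality targets."""
    best = None
    for model, frac, tier in product(MODEL_VARIANTS, DATA_FRACTIONS, NODE_TIERS):
        t, e, q = estimate(model, frac, tier)
        if t <= max_time_s and q >= min_quality:
            if best is None or e < best.energy_j:
                best = Decision(model, frac, tier, e, t, q)
    return best

print(best_decision())
```

In practice the three decisions are coupled across tiers and data sources, which is why exhaustive search is not viable and the paper proposes a dedicated solution concept instead.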
Related papers
- Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning [5.421492821020181]
Federated Learning (FL) is a distributed machine learning approach that enables devices to collaboratively train models without sharing their local data.
Applying FL to real-world data presents challenges, particularly as most existing FL research focuses on unimodal data.
We propose FlexMod, a novel approach to enhance computational efficiency in MFL by adaptively allocating training resources for each modality encoder.
arXiv Detail & Related papers (2024-08-13T01:14:27Z)
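
As a rough illustration of the kind of modality-wise resource allocation FlexMod describes, the snippet below splits a fixed per-round budget of local training epochs across modality encoders in proportion to an importance score; the modalities, scores, and proportional rule are assumptions for illustration, not the paper's actual scheduler.

```python
def allocate_epochs(importance, total_epochs=12):
    """Split a per-round epoch budget across modality encoders
    proportionally to their (hypothetical) importance scores."""
    total = sum(importance.values())
    # Proportional share, rounded down; leftovers go to the most important modality.
    alloc = {m: int(total_epochs * s / total) for m, s in importance.items()}
    leftover = total_epochs - sum(alloc.values())
    alloc[max(importance, key=importance.get)] += leftover
    return alloc

# Hypothetical importance scores, e.g. derived from per-modality validation loss.
scores = {"image": 0.9, "audio": 0.4, "text": 0.7}
print(allocate_epochs(scores))  # {'image': 6, 'audio': 2, 'text': 4}
```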
- Dependable Distributed Training of Compressed Machine Learning Models [16.403297089086042]
We propose DepL, a framework for dependable learning orchestration.
It makes high-quality, efficient decisions on (i) the data to leverage for learning, (ii) the models to use and when to switch among them, and (iii) the clusters of nodes, and the resources thereof, to exploit.
We prove that DepL has constant competitive ratio and complexity, and show that it outperforms the state-of-the-art by over 27%.
arXiv Detail & Related papers (2024-02-22T07:24:26Z)
- Toward efficient resource utilization at edge nodes in federated learning [0.6990493129893112]
Federated learning enables edge nodes to collaboratively contribute to constructing a global model without sharing their data.
However, computational resource constraints and network communication can become a severe bottleneck for the larger model sizes typical of deep learning applications.
We propose and evaluate an FL strategy inspired by transfer learning in order to reduce resource utilization on devices.
arXiv Detail & Related papers (2023-09-19T07:04:50Z)
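
One common way to realize a transfer-learning-inspired FL strategy like the one summarized above is to keep a pre-trained backbone frozen on the device and train (and communicate) only a small head. The sketch below assumes this layer split and a toy model; neither is taken from the paper.

```python
import torch
from torch import nn

# Hypothetical client model: a frozen pre-trained backbone plus a small trainable head.
backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU())
head = nn.Linear(128, 10)

for p in backbone.parameters():   # freeze the backbone: no gradients are computed for it,
    p.requires_grad = False       # so devices only train and communicate head updates

optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def local_update(batches):
    """One round of on-device training touching only the head parameters."""
    for x, y in batches:
        optimizer.zero_grad()
        loss = loss_fn(head(backbone(x)), y)
        loss.backward()
        optimizer.step()
    # Only the (small) head weights would be sent back to the server.
    return {k: v.detach().clone() for k, v in head.state_dict().items()}

# Toy usage with random tensors standing in for a device's local dataset.
batches = [(torch.randn(8, 64), torch.randint(0, 10, (8,))) for _ in range(3)]
updated_head = local_update(batches)
```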
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
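
The shared-backbone/multiple-prediction-heads structure can be sketched as follows; the layer sizes, number of heads, and the simple output averaging used to ensemble the heads are illustrative assumptions rather than the MEMTL design itself.

```python
import torch
from torch import nn

class MultiHeadModel(nn.Module):
    """A shared backbone feeding several prediction heads whose outputs are ensembled."""
    def __init__(self, in_dim=32, hidden=64, out_dim=4, num_heads=3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, out_dim) for _ in range(num_heads))

    def forward(self, x):
        z = self.backbone(x)                         # shared features
        outputs = [head(z) for head in self.heads]   # one prediction per head
        return torch.stack(outputs).mean(dim=0)      # naive ensemble: average the heads

model = MultiHeadModel()
offloading_scores = model(torch.randn(16, 32))       # e.g. scores over 4 offloading options
print(offloading_scores.shape)                       # torch.Size([16, 4])
```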
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Matching DNN Compression and Cooperative Training with Resources and Data Availability [20.329698347331075]
How much and when an ML model should be compressed, and where its training should be executed, are hard decisions to make.
We model the network system focusing on the training of DNNs, formalize the multi-dimensional problem, and formulate an approximate dynamic programming problem.
We prove that PACT's solutions can get as close to the optimum as desired, at the cost of an increased time complexity.
arXiv Detail & Related papers (2022-12-02T09:52:18Z)
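
A minimal dynamic-programming flavor of such decisions, i.e., choosing per training stage how much to compress and where to run it under a total-time budget, might look like the sketch below. The stages, costs, and state space are invented for illustration and are far simpler than the paper's formulation.

```python
# Hypothetical per-stage options: (label, energy in J, time in s) for each training stage.
STAGES = [
    [("full@cloud", 120, 4), ("full@edge", 80, 9), ("compressed@edge", 50, 6)],
    [("full@cloud", 120, 4), ("compressed@edge", 55, 7), ("compressed@mobile", 35, 12)],
    [("compressed@edge", 60, 7), ("compressed@mobile", 30, 11)],
]
TIME_BUDGET = 22  # seconds, made up

def min_energy_plan(stages, budget):
    """DP over (stage, remaining time): cheapest-energy choice per stage within the budget."""
    # best[t] = (energy, plan) achievable with t seconds still available
    best = {budget: (0.0, [])}
    for options in stages:
        nxt = {}
        for t_left, (energy, plan) in best.items():
            for label, e, t in options:
                if t <= t_left:
                    key = t_left - t
                    cand = (energy + e, plan + [label])
                    if key not in nxt or cand[0] < nxt[key][0]:
                        nxt[key] = cand
        best = nxt
    return min(best.values(), default=None)

print(min_energy_plan(STAGES, TIME_BUDGET))
```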
- Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z)
- UAV-assisted Online Machine Learning over Multi-Tiered Networks: A Hierarchical Nested Personalized Federated Learning Approach [25.936914508952086]
We consider distributed machine learning (ML) through unmanned aerial vehicles (UAVs) for geo-distributed device clusters.
We propose several new techniques, including (i) stratified UAV swarms with leader, worker, and coordinator UAVs, (ii) hierarchical nested personalized federated learning (HN-PFL), and (iii) cooperative UAV resource pooling for distributed ML using the UAVs' local computational capabilities.
arXiv Detail & Related papers (2021-06-29T21:40:28Z)
- Toward Multiple Federated Learning Services Resource Sharing in Mobile Edge Networks [88.15736037284408]
We study a new model of multiple federated learning services at the multi-access edge computing server.
We formulate a joint resource optimization and hyper-learning rate control problem, namely MS-FEDL.
Our simulation results demonstrate the convergence performance of our proposed algorithms.
arXiv Detail & Related papers (2020-11-25T01:29:41Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
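
The decentralized cooperation behind such diffusion-based schemes can be illustrated as follows: each agent takes a local update step and then combines its parameters with its neighbors' through a weighted average. The network, combination weights, and losses below are toy assumptions, and the MAML inner-adaptation step is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decentralized setup: 3 fully connected agents, each with its own quadratic loss.
NEIGHBORS = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}
COMBINE_W = {a: {n: 1.0 / len(ns) for n in ns} for a, ns in NEIGHBORS.items()}
TARGETS = [rng.normal(size=4) for _ in range(3)]     # per-agent optima (made up)

def grad(theta, target):
    return theta - target        # gradient of 0.5 * ||theta - target||^2

theta = [np.zeros(4) for _ in range(3)]
lr = 0.3

for step in range(50):
    # 1) Adapt: each agent takes a local gradient step on its own loss.
    adapted = [theta[a] - lr * grad(theta[a], TARGETS[a]) for a in range(3)]
    # 2) Combine: diffusion step averaging with the neighbors' parameters.
    theta = [sum(COMBINE_W[a][n] * adapted[n] for n in NEIGHBORS[a]) for a in range(3)]

consensus_gap = max(np.linalg.norm(theta[a] - theta[0]) for a in range(3))
print("consensus gap:", consensus_gap)   # the agents end up (numerically) in agreement
```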
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
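
As a hedged illustration of computing the step size from the data rather than fixing it in advance, the snippet below uses an AdaGrad-style rule (scaling by accumulated squared gradients) inside a single worker's loop; this generic adaptive rule is chosen only for illustration and is not the scheme proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy least-squares problem standing in for a worker's local data.
A = rng.normal(size=(200, 10))
b = A @ rng.normal(size=10) + 0.01 * rng.normal(size=200)

theta = np.zeros(10)
accum = np.zeros(10)          # running sum of squared gradients
base_lr, eps = 0.5, 1e-8

for step in range(300):
    idx = rng.integers(0, 200, size=16)             # mini-batch of local samples
    g = A[idx].T @ (A[idx] @ theta - b[idx]) / 16   # stochastic gradient
    accum += g * g
    lr = base_lr / (np.sqrt(accum) + eps)           # data-driven, per-coordinate step size
    theta -= lr * g

print("final loss:", 0.5 * np.mean((A @ theta - b) ** 2))
```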
This list is automatically generated from the titles and abstracts of the papers on this site.