Energy-efficient Training of Distributed DNNs in the Mobile-edge-cloud Continuum
- URL: http://arxiv.org/abs/2202.11349v1
- Date: Wed, 23 Feb 2022 08:35:41 GMT
- Title: Energy-efficient Training of Distributed DNNs in the Mobile-edge-cloud Continuum
- Authors: Francesco Malandrino and Carla Fabiana Chiasserini and Giuseppe Di Giacomo
- Abstract summary: We address distributed machine learning in multi-tier networks where a heterogeneous set of nodes cooperate to perform a learning task.
We propose a solution concept, called RightTrain, that achieves energy-efficient ML model training, while fulfilling learning time and quality requirements.
Our performance evaluation shows that RightTrain closely matches the optimum and outperforms the state of the art by over 50%.
- Score: 18.247181241860538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address distributed machine learning in multi-tier (e.g.,
mobile-edge-cloud) networks where a heterogeneous set of nodes cooperate to
perform a learning task. Due to the presence of multiple data sources and
computation-capable nodes, a learning controller (e.g., located in the edge)
has to make decisions about (i) which distributed ML model structure to select,
(ii) which data should be used for the ML model training, and (iii) which
resources should be allocated to it. Since these decisions deeply influence one
another, they should be made jointly. In this paper, we envision a new approach
to distributed learning in multi-tier networks, which aims at maximizing ML
efficiency. To this end, we propose a solution concept, called RightTrain, that
achieves energy-efficient ML model training, while fulfilling learning time and
quality requirements. RightTrain makes high-quality decisions in polynomial
time. Further, our performance evaluation shows that RightTrain closely matches
the optimum and outperforms the state of the art by over 50%.
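
To make the joint nature of these decisions concrete, the sketch below enumerates candidate (model variant, data fraction, execution tier) combinations and keeps the least energy-hungry one that meets the learning-time and quality targets. It is only a brute-force illustration of the decision space under made-up cost and quality estimators; it is not the RightTrain algorithm, whose polynomial-time construction is described in the paper.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical candidate decisions; all values are placeholders, not from the paper.
MODEL_VARIANTS = {"full": 1.0, "pruned": 0.6}             # relative compute cost
DATA_FRACTIONS = [0.5, 0.75, 1.0]                          # share of available data used
NODE_TIERS = {"mobile": 5e9, "edge": 2e10, "cloud": 1e11}  # FLOP/s per tier (made up)

@dataclass
class Decision:
    model: str
    data_fraction: float
    tier: str
    energy_j: float
    time_s: float
    quality: float

def estimate(model, frac, tier):
    """Toy estimators of training time, energy, and reached quality."""
    flops = 1e12 * MODEL_VARIANTS[model] * frac            # total work (made up)
    time_s = flops / NODE_TIERS[tier]
    energy_j = flops * {"mobile": 2e-9, "edge": 1e-9, "cloud": 3e-9}[tier]
    quality = 0.7 + 0.2 * frac + (0.05 if model == "full" else 0.0)
    return time_s, energy_j, quality

def best_decision(max_time_s=100.0, min_quality=0.85):
    """Pick the minimum-energy configuration meeting time/quality targets."""
    best = None
    for model, frac, tier in product(MODEL_VARIANTS, DATA_FRACTIONS, NODE_TIERS):
        t, e, q = estimate(model, frac, tier)
        if t <= max_time_s and q >= min_quality:
            if best is None or e < best.energy_j:
                best = Decision(model, frac, tier, e, t, q)
    return best

print(best_decision())
```

In practice the three decisions are coupled across tiers and data sources, which is why exhaustive search is not viable and the paper proposes a dedicated solution concept instead.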
Related papers
- Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning [5.421492821020181]
Federated Learning (FL) is a distributed machine learning approach that enables devices to collaboratively train models without sharing their local data.
Applying FL to real-world data presents challenges, particularly as most existing FL research focuses on unimodal data.
We propose FlexMod, a novel approach to enhance computational efficiency in MFL by adaptively allocating training resources for each modality encoder.
arXiv Detail & Related papers (2024-08-13T01:14:27Z)
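
As a rough illustration of the kind of modality-wise resource allocation FlexMod describes, the snippet below splits a fixed per-round budget of local training epochs across modality encoders in proportion to an importance score; the modalities, scores, and proportional rule are assumptions for illustration, not the paper's actual scheduler.

```python
def allocate_epochs(importance, total_epochs=12):
    """Split a per-round epoch budget across modality encoders
    proportionally to their (hypothetical) importance scores."""
    total = sum(importance.values())
    # Proportional share, rounded down; leftovers go to the most important modality.
    alloc = {m: int(total_epochs * s / total) for m, s in importance.items()}
    leftover = total_epochs - sum(alloc.values())
    alloc[max(importance, key=importance.get)] += leftover
    return alloc

# Hypothetical importance scores, e.g. derived from per-modality validation loss.
scores = {"image": 0.9, "audio": 0.4, "text": 0.7}
print(allocate_epochs(scores))  # {'image': 6, 'audio': 2, 'text': 4}
```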
- Dependable Distributed Training of Compressed Machine Learning Models [16.403297089086042]
We propose DepL, a framework for dependable learning orchestration.
It makes high-quality, efficient decisions on (i) the data to leverage for learning, (ii) the models to use and when to switch among them, and (iii) the clusters of nodes, and the resources thereof, to exploit.
We prove that DepL has constant competitive ratio and complexity, and show that it outperforms the state-of-the-art by over 27%.
arXiv Detail & Related papers (2024-02-22T07:24:26Z)
- Toward efficient resource utilization at edge nodes in federated learning [0.6990493129893112]
Federated learning enables edge nodes to collaboratively contribute to constructing a global model without sharing their data.
However, computational resource constraints and network communication can become a severe bottleneck for the larger model sizes typical of deep learning applications.
We propose and evaluate an FL strategy inspired by transfer learning in order to reduce resource utilization on devices.
arXiv Detail & Related papers (2023-09-19T07:04:50Z)
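
One common way to realize a transfer-learning-inspired FL strategy like the one summarized above is to keep a pre-trained backbone frozen on the device and train (and communicate) only a small head. The sketch below assumes this layer split and a toy model; neither is taken from the paper.

```python
import torch
from torch import nn

# Hypothetical client model: a frozen pre-trained backbone plus a small trainable head.
backbone = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128), nn.ReLU())
head = nn.Linear(128, 10)

for p in backbone.parameters():   # freeze the backbone: no gradients are computed for it,
    p.requires_grad = False       # so devices only train and communicate head updates

optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def local_update(batches):
    """One round of on-device training touching only the head parameters."""
    for x, y in batches:
        optimizer.zero_grad()
        loss = loss_fn(head(backbone(x)), y)
        loss.backward()
        optimizer.step()
    # Only the (small) head weights would be sent back to the server.
    return {k: v.detach().clone() for k, v in head.state_dict().items()}

# Toy usage with random tensors standing in for a device's local dataset.
batches = [(torch.randn(8, 64), torch.randint(0, 10, (8,))) for _ in range(3)]
updated_head = local_update(batches)
```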
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
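
The shared-backbone/multiple-prediction-heads structure can be sketched as follows; the layer sizes, number of heads, and the simple output averaging used to ensemble the heads are illustrative assumptions rather than the MEMTL design itself.

```python
import torch
from torch import nn

class MultiHeadModel(nn.Module):
    """A shared backbone feeding several prediction heads whose outputs are ensembled."""
    def __init__(self, in_dim=32, hidden=64, out_dim=4, num_heads=3):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, out_dim) for _ in range(num_heads))

    def forward(self, x):
        z = self.backbone(x)                         # shared features
        outputs = [head(z) for head in self.heads]   # one prediction per head
        return torch.stack(outputs).mean(dim=0)      # naive ensemble: average the heads

model = MultiHeadModel()
offloading_scores = model(torch.randn(16, 32))       # e.g. scores over 4 offloading options
print(offloading_scores.shape)                       # torch.Size([16, 4])
```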
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Matching DNN Compression and Cooperative Training with Resources and Data Availability [20.329698347331075]
How much and when an ML model should be compressed, and where its training should be executed, are hard decisions to make.
We model the network system focusing on the training of DNNs, formalize the multi-dimensional problem, and formulate an approximate dynamic programming problem.
We prove that PACT's solutions can get as close to the optimum as desired, at the cost of an increased time complexity.
arXiv Detail & Related papers (2022-12-02T09:52:18Z)
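
A minimal dynamic-programming flavor of such decisions, i.e., choosing per training stage how much to compress and where to run it under a total-time budget, might look like the sketch below. The stages, costs, and state space are invented for illustration and are far simpler than the paper's formulation.

```python
# Hypothetical per-stage options: (label, energy in J, time in s) for each training stage.
STAGES = [
    [("full@cloud", 120, 4), ("full@edge", 80, 9), ("compressed@edge", 50, 6)],
    [("full@cloud", 120, 4), ("compressed@edge", 55, 7), ("compressed@mobile", 35, 12)],
    [("compressed@edge", 60, 7), ("compressed@mobile", 30, 11)],
]
TIME_BUDGET = 22  # seconds, made up

def min_energy_plan(stages, budget):
    """DP over (stage, remaining time): cheapest-energy choice per stage within the budget."""
    # best[t] = (energy, plan) achievable with t seconds still available
    best = {budget: (0.0, [])}
    for options in stages:
        nxt = {}
        for t_left, (energy, plan) in best.items():
            for label, e, t in options:
                if t <= t_left:
                    key = t_left - t
                    cand = (energy + e, plan + [label])
                    if key not in nxt or cand[0] < nxt[key][0]:
                        nxt[key] = cand
        best = nxt
    return min(best.values(), default=None)

print(min_energy_plan(STAGES, TIME_BUDGET))
```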
- Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z)
- UAV-assisted Online Machine Learning over Multi-Tiered Networks: A Hierarchical Nested Personalized Federated Learning Approach [25.936914508952086]
We consider distributed machine learning (ML) through unmanned aerial vehicles (UAVs) for geo-distributed device clusters.
We propose several new techniques, including (i) stratified UAV swarms with leader, worker, and coordinator UAVs, (ii) hierarchical nested personalized federated learning (HN-PFL), and (iii) cooperative UAV resource pooling for distributed ML using the UAVs' local computational capabilities.
arXiv Detail & Related papers (2021-06-29T21:40:28Z)
- Toward Multiple Federated Learning Services Resource Sharing in Mobile Edge Networks [88.15736037284408]
We study a new model of multiple federated learning services at the multi-access edge computing server.
We formulate a joint resource optimization and hyper-learning rate control problem, namely MS-FEDL.
Our simulation results demonstrate the convergence performance of our proposed algorithms.
arXiv Detail & Related papers (2020-11-25T01:29:41Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion-based MAML (Dif-MAML).
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
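
The decentralized cooperation behind such diffusion-based schemes can be illustrated as follows: each agent takes a local update step and then combines its parameters with its neighbors' through a weighted average. The network, combination weights, and losses below are toy assumptions, and the MAML inner-adaptation step is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy decentralized setup: 3 fully connected agents, each with its own quadratic loss.
NEIGHBORS = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}
COMBINE_W = {a: {n: 1.0 / len(ns) for n in ns} for a, ns in NEIGHBORS.items()}
TARGETS = [rng.normal(size=4) for _ in range(3)]     # per-agent optima (made up)

def grad(theta, target):
    return theta - target        # gradient of 0.5 * ||theta - target||^2

theta = [np.zeros(4) for _ in range(3)]
lr = 0.3

for step in range(50):
    # 1) Adapt: each agent takes a local gradient step on its own loss.
    adapted = [theta[a] - lr * grad(theta[a], TARGETS[a]) for a in range(3)]
    # 2) Combine: diffusion step averaging with the neighbors' parameters.
    theta = [sum(COMBINE_W[a][n] * adapted[n] for n in NEIGHBORS[a]) for a in range(3)]

consensus_gap = max(np.linalg.norm(theta[a] - theta[0]) for a in range(3))
print("consensus gap:", consensus_gap)   # the agents end up (numerically) in agreement
```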
- Adaptive Serverless Learning [114.36410688552579]
We propose a novel adaptive decentralized training approach, which can compute the learning rate from data dynamically.
Our theoretical results reveal that the proposed algorithm can achieve linear speedup with respect to the number of workers.
To reduce the communication overhead, we further propose a communication-efficient adaptive decentralized training approach.
arXiv Detail & Related papers (2020-08-24T13:23:02Z)
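
As a hedged illustration of computing the step size from the data rather than fixing it in advance, the snippet below uses an AdaGrad-style rule (scaling by accumulated squared gradients) inside a single worker's loop; this generic adaptive rule is chosen only for illustration and is not the scheme proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy least-squares problem standing in for a worker's local data.
A = rng.normal(size=(200, 10))
b = A @ rng.normal(size=10) + 0.01 * rng.normal(size=200)

theta = np.zeros(10)
accum = np.zeros(10)          # running sum of squared gradients
base_lr, eps = 0.5, 1e-8

for step in range(300):
    idx = rng.integers(0, 200, size=16)             # mini-batch of local samples
    g = A[idx].T @ (A[idx] @ theta - b[idx]) / 16   # stochastic gradient
    accum += g * g
    lr = base_lr / (np.sqrt(accum) + eps)           # data-driven, per-coordinate step size
    theta -= lr * g

print("final loss:", 0.5 * np.mean((A @ theta - b) ** 2))
```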
This list is automatically generated from the titles and abstracts of the papers on this site.