DISTREAL: Distributed Resource-Aware Learning in Heterogeneous Systems
- URL: http://arxiv.org/abs/2112.08761v1
- Date: Thu, 16 Dec 2021 10:15:31 GMT
- Title: DISTREAL: Distributed Resource-Aware Learning in Heterogeneous Systems
- Authors: Martin Rapp, Ramin Khalili, Kilian Pfeiffer, Jörg Henkel
- Abstract summary: We study the problem of distributed training of neural networks (NNs) on devices with heterogeneous, limited, and time-varying availability of computational resources.
We present an adaptive, resource-aware, on-device learning mechanism, DISTREAL, which is able to fully and efficiently utilize the available resources.
- Score: 2.1506382989223782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of distributed training of neural networks (NNs) on
devices with heterogeneous, limited, and time-varying availability of
computational resources. We present an adaptive, resource-aware, on-device
learning mechanism, DISTREAL, which is able to fully and efficiently utilize
the available resources on devices in a distributed manner, increasing the
convergence speed. This is achieved with a dropout mechanism that dynamically
adjusts the computational complexity of training an NN by randomly dropping
filters of convolutional layers of the model. Our main contribution is the
introduction of a design space exploration (DSE) technique, which finds
Pareto-optimal per-layer dropout vectors with respect to resource requirements
and convergence speed of the training. Applying this technique, each device is
able to dynamically select the dropout vector that fits its available resources
without requiring any assistance from the server. We implement our solution in
a federated learning (FL) system, where the availability of computational
resources varies both between devices and over time, and show through extensive
evaluation that we are able to significantly increase the convergence speed
over the state of the art without compromising on the final accuracy.
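The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` fields, the cost and speed numbers, and the function names are all hypothetical, and the abstract does not say how resource cost or convergence speed is measured. The sketch only shows the general idea of filtering candidate per-layer dropout vectors down to a Pareto front and letting a device pick, on its own, the best vector that fits its current budget.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Candidate(NamedTuple):
    # All fields are hypothetical; the abstract does not define these metrics.
    dropout: Tuple[float, ...]  # per-layer filter dropout rates
    cost: float                 # relative resource requirement (lower is better)
    speed: float                # convergence-speed proxy (higher is better)


def pareto_front(cands: Sequence[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    front = []
    for c in cands:
        dominated = any(
            o.cost <= c.cost and o.speed >= c.speed
            and (o.cost < c.cost or o.speed > c.speed)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.cost)


def select_for_budget(front: Sequence[Candidate], budget: float) -> Optional[Candidate]:
    """Pick the fastest-converging candidate whose cost fits the budget."""
    feasible = [c for c in front if c.cost <= budget]
    if not feasible:
        return None  # no candidate fits; the device would have to skip or wait
    return max(feasible, key=lambda c: c.speed)


# Made-up candidates for a two-layer model.
candidates = [
    Candidate((0.0, 0.0), 1.00, 1.00),
    Candidate((0.5, 0.0), 0.70, 0.80),
    Candidate((0.0, 0.5), 0.75, 0.70),  # dominated by (0.5, 0.0)
    Candidate((0.5, 0.5), 0.45, 0.55),
    Candidate((0.8, 0.8), 0.20, 0.25),
]
front = pareto_front(candidates)
choice = select_for_budget(front, budget=0.5)
print(choice.dropout if choice else "no feasible dropout vector")
```

Because the front can be computed once ahead of time, the per-device lookup needs only the device's current budget, which is consistent with the abstract's claim that no assistance from the server is required at selection time.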
Related papers
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- Enabling Resource-efficient AIoT System with Cross-level Optimization: A survey [20.360136850102833]
This survey aims to provide a broader optimization space for more free resource-performance tradeoffs.
By consolidating problems and techniques scattered over diverse levels, we aim to help readers understand their connections and stimulate further discussions.
arXiv Detail & Related papers (2023-09-27T08:04:24Z)
- Time-sensitive Learning for Heterogeneous Federated Edge Intelligence [52.83633954857744]
We investigate real-time machine learning in a federated edge intelligence (FEI) system.
FEI systems exhibit heterogeneous communication and computational resource distribution.
We propose a time-sensitive federated learning (TS-FL) framework to minimize the overall run-time for collaboratively training a shared ML model.
arXiv Detail & Related papers (2023-01-26T08:13:22Z)
- Multi-Resource Allocation for On-Device Distributed Federated Learning Systems [79.02994855744848]
This work poses a distributed multi-resource allocation scheme for minimizing the weighted sum of latency and energy consumption in the on-device distributed federated learning (FL) system.
Each mobile device in the system engages in the model training process within the specified area and allocates its computation and communication resources for deriving and uploading parameters, respectively.
arXiv Detail & Related papers (2022-11-01T14:16:05Z)
- Computational Intelligence and Deep Learning for Next-Generation Edge-Enabled Industrial IoT [51.68933585002123]
We investigate how to deploy computational intelligence and deep learning (DL) in edge-enabled industrial IoT networks.
In this paper, we propose a novel multi-exit-based federated edge learning (ME-FEEL) framework.
In particular, the proposed ME-FEEL can achieve an accuracy gain of up to 32.7% in industrial IoT networks with severely limited resources.
arXiv Detail & Related papers (2021-10-28T08:14:57Z)
- LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time [57.52251547365967]
We propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models.
We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity.
Our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
arXiv Detail & Related papers (2021-10-08T17:03:34Z)
- A Heuristically Assisted Deep Reinforcement Learning Approach for Network Slice Placement [0.7885276250519428]
We introduce a hybrid placement solution based on Deep Reinforcement Learning (DRL) and a dedicated optimization based on the Power of Two Choices principle.
The proposed Heuristically-Assisted DRL (HA-DRL) accelerates the learning process and improves resource usage compared with other state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-14T10:04:17Z)
- Fast-Convergent Federated Learning [82.32029953209542]
Federated learning is a promising solution for distributing machine learning tasks through modern networks of mobile devices.
We propose a fast-convergent federated learning algorithm, called FOLB, which performs intelligent sampling of devices in each round of model training.
arXiv Detail & Related papers (2020-07-26T14:37:51Z)
- Distributed Learning on Heterogeneous Resource-Constrained Devices [3.6187468775839373]
We consider a distributed system, consisting of a heterogeneous set of devices, ranging from low-end to high-end.
We propose the first approach that enables distributed learning in such a heterogeneous system.
Applying our approach, each device employs a neural network (NN) with a topology that fits its capabilities; however, part of these NNs share the same topology, so that their parameters can be jointly learned.
arXiv Detail & Related papers (2020-06-09T16:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.