Related papers: Hierarchical Training of Deep Neural Networks Using Early Exiting

Hierarchical Training of Deep Neural Networks Using Early Exiting

URL: http://arxiv.org/abs/2303.02384v4
Date: Mon, 20 May 2024 20:18:42 GMT
Title: Hierarchical Training of Deep Neural Networks Using Early Exiting
Authors: Yamin Sepehri, Pedram Pad, Ahmet Caner Yüzügüler, Pascal Frossard, L. Andrea Dunbar,
Abstract summary: Deep neural networks provide state-of-the-art accuracy for vision tasks but they require significant resources for training. Deep neural networks are trained on cloud servers far from the edge devices that acquire the data. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in a divided architecture between edge and cloud workers.
Score: 42.186536611404165
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep neural networks provide state-of-the-art accuracy for vision tasks but they require significant resources for training. Thus, they are trained on cloud servers far from the edge devices that acquire the data. This issue increases communication cost, runtime and privacy concerns. In this study, a novel hierarchical training method for deep neural networks is proposed that uses early exits in a divided architecture between edge and cloud workers to reduce the communication cost, training runtime and privacy concerns. The method proposes a brand-new use case for early exits to separate the backward pass of neural networks between the edge and the cloud during the training phase. We address the issues of most available methods that due to the sequential nature of the training phase, cannot train the levels of hierarchy simultaneously or they do it with the cost of compromising privacy. In contrast, our method can use both edge and cloud workers simultaneously, does not share the raw input data with the cloud and does not require communication during the backward pass. Several simulations and on-device experiments for different neural network architectures demonstrate the effectiveness of this method. It is shown that the proposed method reduces the training runtime for VGG-16 and ResNet-18 architectures by 29% and 61% in CIFAR-10 classification and by 25% and 81% in Tiny ImageNet classification when the communication with the cloud is done over a low bit rate channel. This gain in the runtime is achieved whilst the accuracy drop is negligible. This method is advantageous for online learning of high-accuracy deep neural networks on sensor-holding low-resource devices such as mobile phones or robots as a part of an edge-cloud system, making them more flexible in facing new tasks and classes of data.

Related papers

PriPHiT: Privacy-Preserving Hierarchical Training of Deep Neural Networks [44.0097014096626]
We propose a method to perform the training phase of a deep learning model on both an edge device and a cloud server. The proposed privacy-preserving method uses adversarial early exits to suppress the sensitive content at the edge and transmits the task-relevant information to the cloud.
arXiv Detail & Related papers (2024-08-09T14:33:34Z)
DCLP: Neural Architecture Predictor with Curriculum Contrastive Learning [5.2319020651074215]
We propose a Curricumum-guided Contrastive Learning framework for neural Predictor (DCLP) Our method simplifies the contrastive task by designing a novel curriculum to enhance the stability of unlabeled training data distribution. We experimentally demonstrate that DCLP has high accuracy and efficiency compared with existing predictors.
arXiv Detail & Related papers (2023-02-25T08:16:21Z)
Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable. In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols. Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z)
Neural Maximum A Posteriori Estimation on Unpaired Data for Motion Deblurring [87.97330195531029]
We propose a Neural Maximum A Posteriori (NeurMAP) estimation framework for training neural networks to recover blind motion information and sharp content from unpaired data. The proposed NeurMAP is an approach to existing deblurring neural networks, and is the first framework that enables training image deblurring networks on unpaired datasets.
arXiv Detail & Related papers (2022-04-26T08:09:47Z)
Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice requires expensive computational costs in model training for performance prediction. We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training. Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better [88.28293442298015]
Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices. We develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) FedDST is a dynamic process that extracts and trains sparse sub-networks from the target full network.
arXiv Detail & Related papers (2021-12-18T02:26:38Z)
An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have known a wide success in various application domains. Deep neural networks usually involve a large number of parameters, which correspond to the weights of the network. The pruning methods notably attempt to reduce the size of the parameter set, by identifying and removing the irrelevant weights.
arXiv Detail & Related papers (2021-12-15T16:02:15Z)
CLAN: Continuous Learning using Asynchronous Neuroevolution on Commodity Edge Devices [3.812706195714961]
We build a prototype distributed system of Raspberry Pis communicating via WiFi running NeuroEvolutionary (NE) learning and inference. We evaluate the performance of such a collaborative system and detail the compute/communication characteristics of different arrangements of the system.
arXiv Detail & Related papers (2020-08-27T01:49:21Z)
A Hybrid Method for Training Convolutional Neural Networks [3.172761915061083]
We propose a hybrid method that uses both backpropagation and evolutionary strategies to train Convolutional Neural Networks. We show that the proposed hybrid method is capable of improving upon regular training in the task of image classification.
arXiv Detail & Related papers (2020-04-15T17:52:48Z)
HierTrain: Fast Hierarchical Edge AI Learning with Hybrid Parallelism in Mobile-Edge-Cloud Computing [36.40138484917463]
We propose HierTrain, a hierarchical edge AI learning framework, which efficiently deploys the DNN training task over the hierarchical MECC architecture. We show that HierTrain can achieve up to 6.9x speedup compared to the cloud-based hierarchical training approach.
arXiv Detail & Related papers (2020-03-22T12:40:06Z)
Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources. Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize. We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.