Efficient Post-Training Augmentation for Adaptive Inference in
Heterogeneous and Distributed IoT Environments
- URL: http://arxiv.org/abs/2403.07957v1
- Date: Tue, 12 Mar 2024 08:27:53 GMT
- Title: Efficient Post-Training Augmentation for Adaptive Inference in
Heterogeneous and Distributed IoT Environments
- Authors: Max Sponner and Lorenzo Servadei and Bernd Waschneck and Robert Wille
and Akash Kumar
- Abstract summary: Early Exit Neural Networks (EENNs) present a solution to enhance the efficiency of neural network deployments.
We propose an automated augmentation flow that focuses on converting an existing model into an EENN.
Our framework constructs the EENN architecture, maps its subgraphs to the hardware targets, and configures its decision mechanism.
- Score: 4.343246899774834
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Early Exit Neural Networks (EENNs) present a solution to enhance the
efficiency of neural network deployments. However, creating EENNs is
challenging and requires specialized domain knowledge, due to the large amount
of additional design choices. To address this issue, we propose an automated
augmentation flow that focuses on converting an existing model into an EENN. It
performs all required design decisions for the deployment to heterogeneous or
distributed hardware targets: Our framework constructs the EENN architecture,
maps its subgraphs to the hardware targets, and configures its decision
mechanism. To the best of our knowledge, it is the first framework that is able
to perform all of these steps.
We evaluated our approach on a collection of Internet-of-Things and standard
image classification use cases. For a speech command detection task, our
solution was able to reduce the mean operations per inference by 59.67%. For an
ECG classification task, it was able to terminate all samples early, reducing
the mean inference energy by 74.9% and computations by 78.3%. On CIFAR-10, our
solution was able to achieve up to a 58.75% reduction in computations.
The search on a ResNet-152 base model for CIFAR-10 took less than nine hours
on a laptop CPU. Our proposed approach enables the creation of EENNs optimized
for IoT environments and can reduce the inference cost of Deep Learning
applications on embedded and fog platforms, while also significantly reducing
the search cost - making it more accessible for scientists and engineers in
industry and research. The low search cost improves the accessibility of EENNs,
with the potential to improve the efficiency of neural networks in a wide range
of practical applications.
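
The paper itself defines the augmentation flow; as background for readers unfamiliar with early exits, the short PyTorch sketch below only illustrates the general EENN pattern such a framework automates: an existing backbone split into stages, a lightweight exit classifier attached to an early stage, and a confidence-threshold decision mechanism that terminates inference when the early prediction is confident enough. All module names, layer sizes, and the 0.9 threshold are illustrative assumptions, not the paper's actual architecture or configuration.

    # Minimal, illustrative Early Exit Neural Network (EENN) sketch.
    # Assumptions: PyTorch, a two-stage backbone, one early exit, and a
    # softmax-confidence threshold of 0.9 as the decision mechanism.
    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        def __init__(self, stage1, stage2, exit_head, final_head, threshold=0.9):
            super().__init__()
            self.stage1, self.stage2 = stage1, stage2
            self.exit_head, self.final_head = exit_head, final_head
            self.threshold = threshold  # confidence required to terminate early

        def forward(self, x):
            h = self.stage1(x)
            early_logits = self.exit_head(h)
            # Decision mechanism: stop if the early classifier is confident enough.
            if torch.softmax(early_logits, dim=-1).max() >= self.threshold:
                return early_logits                      # cheap path, later stages skipped
            return self.final_head(self.stage2(h))       # fall through to the full model

    # Toy usage with stand-in modules; early-exit decisions are made per sample.
    model = EarlyExitNet(
        stage1=nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU()),
        stage2=nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
        exit_head=nn.Linear(64, 10),
        final_head=nn.Linear(64, 10),
    )
    logits = model(torch.randn(1, 3, 32, 32))

In a distributed deployment of the kind the abstract describes, the first stage and its exit head could run on the embedded device while the remaining stages are offloaded to a fog node, so a confident early exit would also avoid the communication cost; deciding this split, along with the exit placement and thresholds, is the kind of design decision the proposed framework automates.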
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks [6.279164022876874]
Early Exit Neural Networks (EENNs) endow a standard Deep Neural Network (DNN) with Early Exit Classifiers (EECs).
This work presents Neural Architecture Search for Hardware Constrained Early Exit Neural Networks (NACHOS).
NACHOS is the first NAS framework for the design of optimal EENNs satisfying constraints on the accuracy and the number of Multiply and Accumulate (MAC) operations performed by the EENNs at inference time.
arXiv Detail & Related papers (2024-01-24T09:48:12Z)
- Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization [1.0235078178220354]
We propose an automated framework to compress Deep Neural Networks (DNNs) in a hardware-aware manner by jointly employing pruning and quantization.
Our framework achieves a 39% average energy reduction with a 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2023-12-23T18:50:13Z)
- DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks [6.628409795264665]
We present the next-generation neural architecture design for computationally efficient neural architecture distillation - DONNAv2.
DONNAv2 reduces the computational cost of DONNA by 10x for the larger datasets.
To improve the quality of the NAS search space, DONNAv2 leverages a block knowledge distillation filter to remove blocks with high inference costs.
arXiv Detail & Related papers (2023-09-26T04:48:50Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Optimising complexity of CNN models for resource constrained devices: QRS detection case study [1.6822770693792823]
We propose a shallow CNN model that offers a satisfactory level of performance in combination with post-processing.
In an IoMT application context, using QRS detection and R-peak localisation from ECG signals as a case study, the complexities of the CNN models and of the post-processing were varied.
To the best of our knowledge, this approach of finding a deployable configuration by incrementally increasing the CNN model complexity and leveraging the strength of post-processing is the first of its kind.
arXiv Detail & Related papers (2023-01-23T00:22:37Z)
- FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective search method for the automatic design of models.
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most trending Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that addresses this problem by nesting several neural networks at the cell level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)