Efficient Post-Training Augmentation for Adaptive Inference in
Heterogeneous and Distributed IoT Environments
- URL: http://arxiv.org/abs/2403.07957v1
- Date: Tue, 12 Mar 2024 08:27:53 GMT
- Title: Efficient Post-Training Augmentation for Adaptive Inference in
Heterogeneous and Distributed IoT Environments
- Authors: Max Sponner and Lorenzo Servadei and Bernd Waschneck and Robert Wille
and Akash Kumar
- Abstract summary: Early Exit Neural Networks (EENNs) present a solution to enhance the efficiency of neural network deployments.
We propose an automated augmentation flow that focuses on converting an existing model into an EENN.
Our framework constructs the EENN architecture, maps its subgraphs to the hardware targets, and configures its decision mechanism.
- Score: 4.343246899774834
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Early Exit Neural Networks (EENNs) present a solution to enhance the
efficiency of neural network deployments. However, creating EENNs is
challenging and requires specialized domain knowledge, due to the large amount
of additional design choices. To address this issue, we propose an automated
augmentation flow that focuses on converting an existing model into an EENN. It
performs all required design decisions for the deployment to heterogeneous or
distributed hardware targets: Our framework constructs the EENN architecture,
maps its subgraphs to the hardware targets, and configures its decision
mechanism. To the best of our knowledge, it is the first framework that is able
to perform all of these steps.
We evaluated our approach on a collection of Internet-of-Things and standard
image classification use cases. For a speech command detection task, our
solution was able to reduce the mean operations per inference by 59.67%. For an
ECG classification task, it was able to terminate all samples early, reducing
the mean inference energy by 74.9% and computations by 78.3%. On CIFAR-10, our
solution was able to achieve up to a 58.75% reduction in computations.
The search on a ResNet-152 base model for CIFAR-10 took less than nine hours
on a laptop CPU. Our proposed approach enables the creation of EENNs optimized
for IoT environments and can reduce the inference cost of Deep Learning
applications on embedded and fog platforms, while also significantly reducing
the search cost - making it more accessible for scientists and engineers in
industry and research. The low search cost improves the accessibility of EENNs,
with the potential to improve the efficiency of neural networks in a wide range
of practical applications.
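
The paper itself defines the augmentation flow; as background for readers unfamiliar with early exits, the short PyTorch sketch below only illustrates the general EENN pattern such a framework automates: an existing backbone split into stages, a lightweight exit classifier attached to an early stage, and a confidence-threshold decision mechanism that terminates inference when the early prediction is confident enough. All module names, layer sizes, and the 0.9 threshold are illustrative assumptions, not the paper's actual architecture or configuration.

    # Minimal, illustrative Early Exit Neural Network (EENN) sketch.
    # Assumptions: PyTorch, a two-stage backbone, one early exit, and a
    # softmax-confidence threshold of 0.9 as the decision mechanism.
    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        def __init__(self, stage1, stage2, exit_head, final_head, threshold=0.9):
            super().__init__()
            self.stage1, self.stage2 = stage1, stage2
            self.exit_head, self.final_head = exit_head, final_head
            self.threshold = threshold  # confidence required to terminate early

        def forward(self, x):
            h = self.stage1(x)
            early_logits = self.exit_head(h)
            # Decision mechanism: stop if the early classifier is confident enough.
            if torch.softmax(early_logits, dim=-1).max() >= self.threshold:
                return early_logits                      # cheap path, later stages skipped
            return self.final_head(self.stage2(h))       # fall through to the full model

    # Toy usage with stand-in modules; early-exit decisions are made per sample.
    model = EarlyExitNet(
        stage1=nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU()),
        stage2=nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
        exit_head=nn.Linear(64, 10),
        final_head=nn.Linear(64, 10),
    )
    logits = model(torch.randn(1, 3, 32, 32))

In a distributed deployment of the kind the abstract describes, the first stage and its exit head could run on the embedded device while the remaining stages are offloaded to a fog node, so a confident early exit would also avoid the communication cost; deciding this split, along with the exit placement and thresholds, is the kind of design decision the proposed framework automates.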
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- NACHOS: Neural Architecture Search for Hardware Constrained Early Exit Neural Networks [6.279164022876874]
Early Exit Neural Networks (EENNs) endow a standard Deep Neural Network (DNN) with Early Exit Classifiers (EECs).
This work presents Neural Architecture Search for Hardware Constrained Early Exit Neural Networks (NACHOS).
NACHOS is the first NAS framework for the design of optimal EENNs satisfying constraints on the accuracy and the number of Multiply and Accumulate (MAC) operations performed by the EENNs at inference time.
arXiv Detail & Related papers (2024-01-24T09:48:12Z)
- Hardware-Aware DNN Compression via Diverse Pruning and Mixed-Precision Quantization [1.0235078178220354]
We propose an automated framework to compress Deep Neural Networks (DNNs) in a hardware-aware manner by jointly employing pruning and quantization.
Our framework achieves a 39% average energy reduction with a 1.7% average accuracy loss and significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2023-12-23T18:50:13Z)
- DONNAv2 -- Lightweight Neural Architecture Search for Vision tasks [6.628409795264665]
We present the next-generation neural architecture design for computationally efficient neural architecture distillation - DONNAv2.
DONNAv2 reduces the computational cost of DONNA by 10x for the larger datasets.
To improve the quality of the NAS search space, DONNAv2 leverages a block knowledge distillation filter to remove blocks with high inference costs.
arXiv Detail & Related papers (2023-09-26T04:48:50Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Optimising complexity of CNN models for resource constrained devices: QRS detection case study [1.6822770693792823]
We propose a shallow CNN model that offers a satisfactory level of performance in combination with post-processing.
In an IoMT application context, using QRS detection and R-peak localisation from ECG signals as a case study, the complexities of the CNN models and of the post-processing were varied.
To the best of our knowledge, this approach of finding a deployable configuration by incrementally increasing the CNN model complexity and leveraging the strength of post-processing is the first of its kind.
arXiv Detail & Related papers (2023-01-23T00:22:37Z)
- FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective search method for the automatic design of models.
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
- D-DARTS: Distributed Differentiable Architecture Search [75.12821786565318]
Differentiable ARchiTecture Search (DARTS) is one of the most trending Neural Architecture Search (NAS) methods.
We propose D-DARTS, a novel solution that addresses this problem by nesting several neural networks at the cell level.
arXiv Detail & Related papers (2021-08-20T09:07:01Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach in order to obtain a reduced search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)