A Converting Autoencoder Toward Low-latency and Energy-efficient DNN
Inference at the Edge
- URL: http://arxiv.org/abs/2403.07036v1
- Date: Mon, 11 Mar 2024 08:13:42 GMT
- Title: A Converting Autoencoder Toward Low-latency and Energy-efficient DNN
Inference at the Edge
- Authors: Hasanul Mahmud, Peng Kang, Kevin Desai, Palden Lama, Sushil Prasad
- Abstract summary: We present CBNet, a low-latency and energy-efficient deep neural network (DNN) inference framework tailored for edge devices.
It utilizes a "converting" autoencoder to efficiently transform hard images into easy ones.
CBNet achieves up to 4.8x speedup in inference latency and 79% reduction in energy usage compared to competing techniques.
- Score: 4.11949030493552
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reducing inference time and energy usage while maintaining prediction
accuracy has become a significant concern for deep neural network (DNN)
inference on resource-constrained edge devices. To address this problem, we
propose a novel approach based on a "converting" autoencoder and lightweight
DNNs. This improves upon recent work such as early-exiting frameworks and DNN
partitioning. Early-exiting frameworks spend different amounts of computation
power for different input data depending upon their complexity. However, they
can be inefficient in real-world scenarios that deal with many hard image
samples. On the other hand, DNN partitioning algorithms that utilize the
computation power of both the cloud and edge devices can be affected by network
delays and intermittent connections between the cloud and the edge. We present
CBNet, a low-latency and energy-efficient DNN inference framework tailored for
edge devices. It utilizes a "converting" autoencoder to efficiently transform
hard images into easy ones, which are subsequently processed by a lightweight
DNN for inference. To the best of our knowledge, such an autoencoder has not been
proposed earlier. Our experimental results using three popular
image-classification datasets on a Raspberry Pi 4, a Google Cloud instance, and
an instance with Nvidia Tesla K80 GPU show that CBNet achieves up to 4.8x
speedup in inference latency and 79% reduction in energy usage compared to
competing techniques while maintaining similar or higher accuracy.
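
The two-stage flow described in the abstract (a "converting" autoencoder followed by a lightweight DNN) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration: the class names, the 28x28 input size, the hidden width of 32, the 10 output classes, and the random untrained weights are hypothetical and are not taken from the paper, which does not specify its architectures in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Return randomly initialised (untrained) weights and a zero bias."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


class ConvertingAutoencoder:
    """Hypothetical stand-in: maps a flattened image to a same-sized image.

    In the abstract's framing, a trained version would output an "easy"
    image for a "hard" input. Here the weights are random placeholders.
    """

    def __init__(self, dim, hidden):
        self.enc_w, self.enc_b = dense(dim, hidden)
        self.dec_w, self.dec_b = dense(hidden, dim)

    def __call__(self, x):
        h = np.maximum(x @ self.enc_w + self.enc_b, 0.0)  # ReLU encoder
        z = h @ self.dec_w + self.dec_b
        return 1.0 / (1.0 + np.exp(-z))  # sigmoid keeps pixels in (0, 1)


class LightweightClassifier:
    """Hypothetical single-layer classifier standing in for a small DNN."""

    def __init__(self, dim, n_classes):
        self.w, self.b = dense(dim, n_classes)

    def __call__(self, x):
        return x @ self.w + self.b  # raw class scores (logits)


def predict(images, converter, classifier):
    """Convert every image first, then classify the converted image."""
    flat = images.reshape(len(images), -1)
    converted = converter(flat)
    return classifier(converted).argmax(axis=1)


images = rng.random((4, 28, 28))  # four fake grayscale images
converter = ConvertingAutoencoder(28 * 28, 32)
classifier = LightweightClassifier(28 * 28, 10)
preds = predict(images, converter, classifier)
```

Note that, unlike an early-exiting network, this sketch sends every input through the same fixed-cost path; whether and how the actual framework treats easy inputs differently is not stated in the abstract.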
Related papers
- MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via
Automating Deep Neural Network Porting for Mobile Deployment [54.77943671991863]
MatchNAS is a novel scheme for porting Deep Neural Networks to mobile devices.
We optimise a large network family using both labelled and unlabelled data.
We then automatically search for tailored networks for different hardware platforms.
arXiv Detail & Related papers (2024-02-21T04:43:12Z)
- I-SplitEE: Image classification in Split Computing DNNs with Early Exits [5.402030962296633]
The large size of Deep Neural Networks (DNNs) hinders their deployment on resource-constrained devices such as edge, mobile, and IoT platforms.
Our work presents an innovative unified approach merging early exits and split computing.
I-SplitEE is an online unsupervised algorithm ideal for scenarios lacking ground truths and with sequential data.
arXiv Detail & Related papers (2024-01-19T07:44:32Z)
- Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at end-device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos [16.644938608211202]
Convolutional neural network inference on video data requires powerful hardware for real-time processing.
We present a sparse convolutional neural network framework that enables sparse frame-by-frame updates.
We are the first to significantly outperform the dense reference, cuDNN, in practical settings, achieving speedups of up to 7x with only marginal differences in accuracy.
arXiv Detail & Related papers (2022-03-08T10:54:00Z)
- Weightless Neural Networks for Efficient Edge Inference [1.7882696915798877]
Weightless Neural Networks (WNNs) are a class of machine learning models that use table lookups to perform inference.
We propose a novel WNN architecture, BTHOWeN, with key algorithmic and architectural improvements over prior work.
BTHOWeN targets the large and growing edge computing sector by providing superior latency and energy efficiency.
arXiv Detail & Related papers (2022-03-03T01:46:05Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point, exit point, and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- EffCNet: An Efficient CondenseNet for Image Classification on NXP
BlueBox [0.0]
Edge devices offer limited processing power due to their inexpensive hardware, and limited cooling and computational resources.
We propose a novel deep convolutional neural network architecture called EffCNet for edge devices.
arXiv Detail & Related papers (2021-11-28T21:32:31Z)
- Early-exit deep neural networks for distorted images: providing an
efficient edge offloading [69.43216268165402]
Edge offloading for deep neural networks (DNNs) can be adaptive to the input's complexity.
We introduce expert side branches trained on a particular distortion type to improve against image distortion.
This approach increases the estimated accuracy on the edge, improving the offloading decisions.
arXiv Detail & Related papers (2021-08-20T19:52:55Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Dynamic DNN Decomposition for Lossless Synergistic Inference [0.9549013615433989]
Deep neural networks (DNNs) sustain high performance in today's data processing applications.
We propose D3, a dynamic DNN decomposition system for synergistic inference without precision loss.
D3 outperforms state-of-the-art counterparts by up to 3.4 times in end-to-end DNN inference time and reduces backbone network communication overhead by up to 3.68 times.
arXiv Detail & Related papers (2021-01-15T03:18:53Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.