Related papers: A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge

URL: http://arxiv.org/abs/2403.07036v1
Date: Mon, 11 Mar 2024 08:13:42 GMT
Title: A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge
Authors: Hasanul Mahmud, Peng Kang, Kevin Desai, Palden Lama, Sushil Prasad
Abstract summary: We present CBNet, a low-latency and energy-efficient deep neural network (DNN) inference framework tailored for edge devices. It utilizes a "converting" autoencoder to efficiently transform hard images into easy ones. CBNet achieves up to 4.8x speedup in inference latency and 79% reduction in energy usage compared to competing techniques.
Score: 4.11949030493552
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reducing inference time and energy usage while maintaining prediction accuracy has become a significant concern for deep neural network (DNN) inference on resource-constrained edge devices. To address this problem, we propose a novel approach based on a "converting" autoencoder and lightweight DNNs. This improves upon recent work such as early-exiting frameworks and DNN partitioning. Early-exiting frameworks spend different amounts of computation power on different input data depending upon their complexity. However, they can be inefficient in real-world scenarios that deal with many hard image samples. On the other hand, DNN partitioning algorithms that utilize the computation power of both the cloud and edge devices can be affected by network delays and intermittent connections between the cloud and the edge. We present CBNet, a low-latency and energy-efficient DNN inference framework tailored for edge devices. It utilizes a "converting" autoencoder to efficiently transform hard images into easy ones, which are subsequently processed by a lightweight DNN for inference. To the best of our knowledge, such an autoencoder has not been proposed earlier. Our experimental results using three popular image-classification datasets on a Raspberry Pi 4, a Google Cloud instance, and an instance with an Nvidia Tesla K80 GPU show that CBNet achieves up to 4.8x speedup in inference latency and 79% reduction in energy usage compared to competing techniques while maintaining similar or higher accuracy.

Related papers

Rapid Salient Object Detection with Difference Convolutional Neural Networks [49.838283141381716]
This paper addresses the challenge of deploying salient object detection (SOD) on resource-constrained devices with real-time performance. We propose an efficient network design that combines traditional wisdom on SOD and the representation power of modern CNNs.
arXiv Detail & Related papers (2025-07-01T20:41:05Z)
MatchNAS: Optimizing Edge AI in Sparse-Label Data Contexts via Automating Deep Neural Network Porting for Mobile Deployment [54.77943671991863]
MatchNAS is a novel scheme for porting Deep Neural Networks to mobile devices. We optimise a large network family using both labelled and unlabelled data. We then automatically search for tailored networks for different hardware platforms.
arXiv Detail & Related papers (2024-02-21T04:43:12Z)
I-SplitEE: Image classification in Split Computing DNNs with Early Exits [5.402030962296633]
The large size of Deep Neural Networks (DNNs) hinders deploying them on resource-constrained devices like edge, mobile, and IoT platforms. Our work presents an innovative unified approach merging early exits and split computing. I-SplitEE is an online unsupervised algorithm ideal for scenarios lacking ground truths and with sequential data.
arXiv Detail & Related papers (2024-01-19T07:44:32Z)
Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems. We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device. Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z)
DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos [16.644938608211202]
Convolutional neural network inference on video data requires powerful hardware for real-time processing. We present a sparse convolutional neural network framework that enables sparse frame-by-frame updates. We are the first to significantly outperform the dense reference, cuDNN, in practical settings, achieving speedups of up to 7x with only marginal differences in accuracy.
arXiv Detail & Related papers (2022-03-08T10:54:00Z)
Weightless Neural Networks for Efficient Edge Inference [1.7882696915798877]
Weightless Neural Networks (WNNs) are a class of machine learning models that use table lookups to perform inference. We propose a novel WNN architecture, BTHOWeN, with key algorithmic and architectural improvements over prior work. BTHOWeN targets the large and growing edge computing sector by providing superior latency and energy efficiency.
arXiv Detail & Related papers (2022-03-03T01:46:05Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations. Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
EffCNet: An Efficient CondenseNet for Image Classification on NXP BlueBox [0.0]
Edge devices offer limited processing power due to their inexpensive hardware, and limited cooling and computational resources. We propose a novel deep convolutional neural network architecture called EffCNet for edge devices.
arXiv Detail & Related papers (2021-11-28T21:32:31Z)
Early-exit deep neural networks for distorted images: providing an efficient edge offloading [69.43216268165402]
Edge offloading for deep neural networks (DNNs) can be adaptive to the input's complexity. We introduce expert side branches trained on a particular distortion type to improve against image distortion. This approach increases the estimated accuracy on the edge, improving the offloading decisions.
arXiv Detail & Related papers (2021-08-20T19:52:55Z)
Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks. We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
Dynamic DNN Decomposition for Lossless Synergistic Inference [0.9549013615433989]
Deep neural networks (DNNs) sustain high performance in today's data processing applications. We propose D3, a dynamic DNN decomposition system for synergistic inference without precision loss. D3 outperforms the state-of-the-art counterparts by up to 3.4 times in end-to-end DNN inference time and reduces backbone network communication overhead by up to 3.68 times.
arXiv Detail & Related papers (2021-01-15T03:18:53Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.