BottleFit: Learning Compressed Representations in Deep Neural Networks
for Effective and Efficient Split Computing
- URL: http://arxiv.org/abs/2201.02693v1
- Date: Fri, 7 Jan 2022 22:08:07 GMT
- Title: BottleFit: Learning Compressed Representations in Deep Neural Networks
for Effective and Efficient Split Computing
- Authors: Yoshitomo Matsubara, Davide Callegaro, Sameer Singh, Marco Levorato,
Francesco Restuccia
- Abstract summary: We propose a new framework called BottleFit, which includes a novel training strategy to achieve high accuracy even with strong compression rates.
BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on the ImageNet dataset.
We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading.
- Score: 48.11023234245863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although mission-critical applications require the use of deep neural
networks (DNNs), their continuous execution at mobile devices results in a
significant increase in energy consumption. While edge offloading can decrease
energy consumption, erratic patterns in channel quality, network and edge
server load can lead to severe disruption of the system's key operations. An
alternative approach, called split computing, generates compressed
representations within the model (called "bottlenecks"), to reduce bandwidth
usage and energy consumption. Prior work has proposed approaches that introduce
additional layers, to the detriment of energy consumption and latency. For this
reason, we propose a new framework called BottleFit, which, in addition to
targeted DNN architecture modifications, includes a novel training strategy to
achieve high accuracy even with strong compression rates. We apply BottleFit on
cutting-edge DNN models in image classification, and show that BottleFit
achieves 77.1% data compression with up to 0.6% accuracy loss on the ImageNet
dataset, while state-of-the-art approaches such as SPINN lose up to 6% in accuracy. We
experimentally measure the power consumption and latency of an image
classification application running on an NVIDIA Jetson Nano board (GPU-based)
and a Raspberry Pi board (GPU-less). We show that BottleFit decreases power
consumption and latency respectively by up to 49% and 89% with respect to
(w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading. We also
compare BottleFit with state-of-the-art autoencoder-based approaches, and show
that (i) BottleFit reduces power consumption and execution time respectively by
up to 54% and 44% on the Jetson and 40% and 62% on the Raspberry Pi; (ii) the size
of the head model executed on the mobile device is 83 times smaller. The code
repository will be published for full reproducibility of the results.
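To make the split-computing idea above concrete, the sketch below builds a toy classifier with a narrow "bottleneck" inserted partway through the network: the head runs on the mobile device, only the compressed tensor crosses the wireless link, and the tail finishes inference on the edge server. This is a minimal PyTorch sketch with assumed layer sizes, not the actual BottleFit architecture or its training strategy.
```python
# Minimal split-computing sketch (assumed sizes, not the BottleFit architecture):
# a "head" ending in a narrow bottleneck runs on the mobile device; only the
# compressed tensor is sent to the edge server, where the "tail" finishes inference.
import torch
import torch.nn as nn

class Head(nn.Module):
    """Mobile-side layers; the last conv squeezes channels to form the bottleneck."""
    def __init__(self, bottleneck_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, bottleneck_channels, kernel_size=1),  # compressed representation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)

class Tail(nn.Module):
    """Edge-side layers; expands the bottleneck and produces class scores."""
    def __init__(self, bottleneck_channels: int = 3, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(bottleneck_channels, 64, kernel_size=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_classes),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.features(z)

if __name__ == "__main__":
    head, tail = Head(), Tail()
    image = torch.randn(1, 3, 224, 224)   # input captured on the mobile device
    z = head(image)                        # bottleneck tensor: the only data transmitted
    print("transmitted elements:", z.numel(), "vs. raw input:", image.numel())
    logits = tail(z)                       # inference completed on the edge server
    print("output shape:", tuple(logits.shape))
```
In BottleFit the bottleneck replaces part of an existing model and is trained with the paper's dedicated strategy to limit accuracy loss; the sketch only shows where the split and the compressed representation sit.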
Related papers
- A Converting Autoencoder Toward Low-latency and Energy-efficient DNN
Inference at the Edge [4.11949030493552]
We present CBNet, a low-latency and energy-efficient deep neural network (DNN) inference framework tailored for edge devices.
It utilizes a "converting" autoencoder to efficiently transform hard images into easy ones.
CBNet achieves up to 4.8x speedup in inference latency and 79% reduction in energy usage compared to competing techniques.
arXiv Detail & Related papers (2024-03-11T08:13:42Z) - Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with
Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images.
When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z) - Toward Compact Deep Neural Networks via Energy-Aware Pruning [2.578242050187029]
We propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using the nuclear norm (NN); a minimal sketch of one such importance score appears after this list.
We achieve competitive results with 40.4/49.8% of FLOPs and 45.9/52.9% of parameter reduction, with 94.13/94.61% Top-1 accuracy, for ResNet-56/110 on CIFAR-10.
arXiv Detail & Related papers (2021-03-19T15:33:16Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - AdderNet and its Minimalist Hardware Design for Energy-Efficient
Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
The whole AdderNet can practically achieve a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z) - Sound Event Detection with Binary Neural Networks on Tightly
Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-print binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
arXiv Detail & Related papers (2021-01-12T12:38:23Z) - Efficient CNN-LSTM based Image Captioning using Neural Network
Compression [0.0]
We present an unconventional end-to-end compression pipeline for a CNN-LSTM based image captioning model.
We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size.
arXiv Detail & Related papers (2020-12-17T16:25:09Z) - PENNI: Pruned Kernel Sharing for Efficient CNN Inference [41.050335599000036]
State-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks.
Their high computation demand and massive number of parameters make it difficult to deploy these SOTA CNNs onto resource-constrained devices.
We propose PENNI, a CNN model compression framework that is able to achieve model compactness and hardware efficiency simultaneously.
arXiv Detail & Related papers (2020-05-14T16:57:41Z)