BottleFit: Learning Compressed Representations in Deep Neural Networks
for Effective and Efficient Split Computing
- URL: http://arxiv.org/abs/2201.02693v1
- Date: Fri, 7 Jan 2022 22:08:07 GMT
- Title: BottleFit: Learning Compressed Representations in Deep Neural Networks
for Effective and Efficient Split Computing
- Authors: Yoshitomo Matsubara, Davide Callegaro, Sameer Singh, Marco Levorato,
Francesco Restuccia
- Abstract summary: We propose a new framework called BottleFit, which includes a novel training strategy to achieve high accuracy even with strong compression rates.
BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on the ImageNet dataset.
We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading.
- Score: 48.11023234245863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although mission-critical applications require the use of deep neural
networks (DNNs), their continuous execution at mobile devices results in a
significant increase in energy consumption. While edge offloading can decrease
energy consumption, erratic patterns in channel quality, network and edge
server load can lead to severe disruption of the system's key operations. An
alternative approach, called split computing, generates compressed
representations within the model (called "bottlenecks"), to reduce bandwidth
usage and energy consumption. Prior work has proposed approaches that introduce
additional layers, to the detriment of energy consumption and latency. For this
reason, we propose a new framework called BottleFit, which, in addition to
targeted DNN architecture modifications, includes a novel training strategy to
achieve high accuracy even with strong compression rates. We apply BottleFit on
cutting-edge DNN models in image classification, and show that BottleFit
achieves 77.1% data compression with up to 0.6% accuracy loss on the ImageNet
dataset, while state-of-the-art approaches such as SPINN lose up to 6% in accuracy. We
experimentally measure the power consumption and latency of an image
classification application running on an NVIDIA Jetson Nano board (GPU-based)
and a Raspberry Pi board (GPU-less). We show that BottleFit decreases power
consumption and latency respectively by up to 49% and 89% with respect to
(w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading. We also
compare BottleFit with state-of-the-art autoencoder-based approaches, and show
that (i) BottleFit reduces power consumption and execution time respectively by
up to 54% and 44% on the Jetson and 40% and 62% on the Raspberry Pi; (ii) the size
of the head model executed on the mobile device is 83 times smaller. The code
repository will be published for full reproducibility of the results.
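To make the split-computing idea above concrete, the sketch below builds a toy classifier with a narrow "bottleneck" inserted partway through the network: the head runs on the mobile device, only the compressed tensor crosses the wireless link, and the tail finishes inference on the edge server. This is a minimal PyTorch sketch with assumed layer sizes, not the actual BottleFit architecture or its training strategy.
```python
# Minimal split-computing sketch (assumed sizes, not the BottleFit architecture):
# a "head" ending in a narrow bottleneck runs on the mobile device; only the
# compressed tensor is sent to the edge server, where the "tail" finishes inference.
import torch
import torch.nn as nn

class Head(nn.Module):
    """Mobile-side layers; the last conv squeezes channels to form the bottleneck."""
    def __init__(self, bottleneck_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, bottleneck_channels, kernel_size=1),  # compressed representation
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x)

class Tail(nn.Module):
    """Edge-side layers; expands the bottleneck and produces class scores."""
    def __init__(self, bottleneck_channels: int = 3, num_classes: int = 1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(bottleneck_channels, 64, kernel_size=1), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, num_classes),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.features(z)

if __name__ == "__main__":
    head, tail = Head(), Tail()
    image = torch.randn(1, 3, 224, 224)   # input captured on the mobile device
    z = head(image)                        # bottleneck tensor: the only data transmitted
    print("transmitted elements:", z.numel(), "vs. raw input:", image.numel())
    logits = tail(z)                       # inference completed on the edge server
    print("output shape:", tuple(logits.shape))
```
In BottleFit the bottleneck replaces part of an existing model and is trained with the paper's dedicated strategy to limit accuracy loss; the sketch only shows where the split and the compressed representation sit.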
Related papers
- A Converting Autoencoder Toward Low-latency and Energy-efficient DNN
Inference at the Edge [4.11949030493552]
We present CBNet, a low-latency and energy-efficient deep neural network (DNN) inference framework tailored for edge devices.
It utilizes a "converting" autoencoder to efficiently transform hard images into easy ones.
CBNet achieves up to 4.8x speedup in inference latency and 79% reduction in energy usage compared to competing techniques.
arXiv Detail & Related papers (2024-03-11T08:13:42Z) - Attention-based Feature Compression for CNN Inference Offloading in Edge
Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z) - Pushing the Limits of Asynchronous Graph-based Object Detection with
Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z) - Post-training deep neural network pruning via layer-wise calibration [70.65691136625514]
We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images.
When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting.
arXiv Detail & Related papers (2021-04-30T14:20:51Z) - Toward Compact Deep Neural Networks via Energy-Aware Pruning [2.578242050187029]
We propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using the nuclear norm (NN); a minimal sketch of one such importance score appears after this list.
We achieve competitive results with 40.4/49.8% of FLOPs and 45.9/52.9% of parameter reduction, with 94.13/94.61% Top-1 accuracy, for ResNet-56/110 on CIFAR-10.
arXiv Detail & Related papers (2021-03-19T15:33:16Z) - FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often occupy a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z) - AdderNet and its Minimalist Hardware Design for Energy-Efficient
Artificial Intelligence [111.09105910265154]
We present a novel minimalist hardware architecture using the adder convolutional neural network (AdderNet).
The whole AdderNet can practically achieve a 16% enhancement in speed.
We conclude that AdderNet is able to surpass all the other competitors.
arXiv Detail & Related papers (2021-01-25T11:31:52Z) - Sound Event Detection with Binary Neural Networks on Tightly
Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-print binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
arXiv Detail & Related papers (2021-01-12T12:38:23Z) - Efficient CNN-LSTM based Image Captioning using Neural Network
Compression [0.0]
We present an unconventional end-to-end compression pipeline for a CNN-LSTM based image captioning model.
We then examine the effects of different compression architectures on the model and design a compression architecture that achieves a 73.1% reduction in model size.
arXiv Detail & Related papers (2020-12-17T16:25:09Z) - PENNI: Pruned Kernel Sharing for Efficient CNN Inference [41.050335599000036]
State-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks.
Their high computation demand and massive number of parameters make it difficult to deploy these SOTA CNNs onto resource-constrained devices.
We propose PENNI, a CNN model compression framework that is able to achieve model compactness and hardware efficiency simultaneously.
arXiv Detail & Related papers (2020-05-14T16:57:41Z)