Related papers: FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing

FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing

URL: http://arxiv.org/abs/2302.10681v4
Date: Sat, 23 Mar 2024 15:51:29 GMT
Title: FrankenSplit: Efficient Neural Feature Compression with Shallow Variational Bottleneck Injection for Mobile Edge Computing
Authors: Alireza Furutanpey, Philipp Raith, Schahram Dustdar,
Abstract summary: We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an asymmetric environment. Our method achieves 60% lower than a state-of-the-art SC method without decreasing accuracy and is up 16x faster than offloading with existing standards.
Score: 5.815300670677979
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rise of mobile AI accelerators allows latency-sensitive applications to execute lightweight Deep Neural Networks (DNNs) on the client side. However, critical applications require powerful models that edge devices cannot host and must therefore offload requests, where the high-dimensional data will compete for limited bandwidth. This work proposes shifting away from focusing on executing shallow layers of partitioned DNNs. Instead, it advocates concentrating the local resources on variational compression optimized for machine interpretability. We introduce a novel framework for resource-conscious compression models and extensively evaluate our method in an environment reflecting the asymmetric resource distribution between edge devices and servers. Our method achieves 60% lower bitrate than a state-of-the-art SC method without decreasing accuracy and is up to 16x faster than offloading with existing codec standards.

Related papers

Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks [5.186026342830856]
Split computing distributes deep neural network inference between resource-constrained edge devices and cloud servers.<n>We propose a novel lightweight compression framework that leverages Range Asymmetric Numeral Systems (rANS) encoding with asymmetric integer quantization and sparse tensor representation to reduce transmission overhead dramatically.
arXiv Detail & Related papers (2025-11-11T12:33:59Z)
Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges on resource-constrained devices.<n>We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding.<n>Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
arXiv Detail & Related papers (2025-05-24T15:52:49Z)
Progressive Neural Compression for Adaptive Image Offloading under Timing Constraints [9.903309560890317]
It is important to develop an adaptive approach that maximizes the inference performance of machine learning applications under timing constraints. In this paper, we use image classification as our target application and propose progressive neural compression (PNC) as an efficient solution to this problem. We demonstrate the benefits of PNC over state-of-the-art neural compression approaches and traditional compression methods on a testbed.
arXiv Detail & Related papers (2023-10-08T22:58:31Z)
Slimmable Encoders for Flexible Split DNNs in Bandwidth and Resource Constrained IoT Systems [12.427821850039448]
We propose a novel split computing approach based on slimmable ensemble encoders. The key advantage of our design is the ability to adapt computational load and transmitted data size in real-time with minimal overhead and time. Our model outperforms existing solutions in terms of compression efficacy and execution time, especially in the context of weak mobile devices.
arXiv Detail & Related papers (2023-06-22T06:33:12Z)
Adaptive Federated Pruning in Hierarchical Wireless Networks [69.6417645730093]
Federated Learning (FL) is a privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets. In this paper, we introduce model pruning for HFL in wireless networks to reduce the neural network scale. We show that our proposed HFL with model pruning achieves similar learning accuracy compared with the HFL without model pruning and reduces about 50 percent communication cost.
arXiv Detail & Related papers (2023-05-15T22:04:49Z)
A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads. We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off. Our approach is remarkably lightweight, both during training and inference, highly effective and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Lightweight compression of neural network feature tensors for collaborative intelligence [32.03465747357384]
In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device. This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer.
arXiv Detail & Related papers (2021-05-12T23:41:35Z)
Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
partitioned edge learning (PARTEL) implements parameter-server training, a well known distributed learning method, in wireless network. We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique technique (ALF) ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework [56.57225686288006]
Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices. Previous pruning methods mainly focus on reducing the model size and/or improving performance without considering the privacy of user data. We propose a privacy-preserving-oriented pruning and mobile acceleration framework that does not require the private training dataset.
arXiv Detail & Related papers (2020-03-13T23:52:03Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.