SpikeBottleNet: Spike-Driven Feature Compression Architecture for Edge-Cloud Co-Inference
- URL: http://arxiv.org/abs/2410.08673v2
- Date: Thu, 7 Nov 2024 14:49:28 GMT
- Title: SpikeBottleNet: Spike-Driven Feature Compression Architecture for Edge-Cloud Co-Inference
- Authors: Maruf Hassan, Steven Davy
- Abstract summary: We propose SpikeBottleNet, a novel architecture for edge-cloud co-inference systems.
SpikeBottleNet integrates a spiking neuron model to significantly reduce energy consumption on edge devices.
- Score: 0.86325068644655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Edge-cloud co-inference enables efficient deep neural network (DNN) deployment by splitting the architecture between an edge device and a cloud server, which is crucial for resource-constrained edge devices. This approach requires balancing on-device computation and communication costs, often achieved through compressed intermediate feature transmission. Conventional DNN architectures require continuous data processing and floating-point activations, leading to considerable energy consumption and increased feature sizes, thus raising transmission costs. This challenge motivates exploring binary, event-driven activations using spiking neural networks (SNNs), known for their extreme energy efficiency. In this research, we propose SpikeBottleNet, a novel architecture for edge-cloud co-inference systems that integrates a spiking neuron model to significantly reduce energy consumption on edge devices. A key innovation of our study is an intermediate feature compression technique tailored for SNNs for efficient feature transmission. This technique leverages a split computing approach to strategically place encoder-decoder bottleneck units within complex deep architectures like ResNet and MobileNet. Experimental results demonstrate that SpikeBottleNet achieves up to 256x bit compression in the final convolutional layer of ResNet, with minimal accuracy loss (0.16%). Additionally, our approach enhances edge device energy efficiency by up to 144x compared to the baseline BottleNet, making it ideal for resource-limited edge devices.
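To make the idea concrete, here is a minimal PyTorch-style sketch of a spiking bottleneck at the split point, assuming a single-step integrate-and-fire neuron and a 1x1 convolutional channel reduction; the class and function names are illustrative, not the authors' implementation (a real SNN would unroll over multiple timesteps and train with surrogate gradients):

```python
import numpy as np
import torch
import torch.nn as nn

class IFNeuron(nn.Module):
    """Simplified single-step integrate-and-fire neuron: emits a binary
    spike wherever the input drive reaches the firing threshold."""
    def __init__(self, threshold: float = 1.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x >= self.threshold).float()

class SpikeBottleneckEncoder(nn.Module):
    """Hypothetical edge-side bottleneck: a 1x1 conv shrinks the channel
    dimension, then the spiking nonlinearity binarizes the feature map."""
    def __init__(self, in_ch: int, bottleneck_ch: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, bottleneck_ch, kernel_size=1)
        self.spike = IFNeuron()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spike(self.reduce(x))

def pack_spikes(spikes: torch.Tensor) -> bytes:
    """Bit-pack the binary spike map for transmission: 8 spikes per byte,
    a 32x saving over float32 before any channel reduction is counted."""
    bits = spikes.detach().to(torch.uint8).cpu().numpy().reshape(-1)
    return np.packbits(bits).tobytes()

# Example: a 512-channel ResNet feature map squeezed to 64 binary channels.
encoder = SpikeBottleneckEncoder(in_ch=512, bottleneck_ch=64)
feature_map = torch.randn(1, 512, 7, 7)
payload = pack_spikes(encoder(feature_map))
print(len(payload), "bytes to transmit")  # 64*7*7/8 = 392 bytes
```

Under these assumptions the 256x figure decomposes as 8x from channel reduction times 32x from replacing float32 activations with 1-bit spikes; the paper's actual configuration may decompose differently.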
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the burden of transmitting large volumes of visual data to the cloud by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Growing Efficient Accurate and Robust Neural Networks on the Edge [0.9208007322096533]
Current solutions rely on the Cloud to train and compress models before deploying to the Edge.
This incurs high energy and latency costs in transmitting locally acquired field data to the Cloud while also raising privacy concerns.
We propose GEARnn to grow and train robust networks entirely on the Edge device.
arXiv Detail & Related papers (2024-10-10T08:01:42Z) - DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference [12.095934624748686]
We propose DVFO, a novel DVFS-enabled edge-cloud collaborative inference framework.
It automatically co-optimizes the CPU, GPU and memory frequencies of edge devices, and the feature maps to be offloaded to cloud servers.
It significantly reduces energy consumption, by 33% on average, compared to state-of-the-art schemes.
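As a rough illustration of the joint decision space DVFO searches (clock frequencies plus what to offload), here is a toy exhaustive search; the energy models and all constants are invented for the sketch, and DVFO itself learns this policy rather than enumerating it:

```python
from itertools import product

# Hypothetical stand-in for DVFO's learned policy: score a few discrete
# frequency levels and offloading splits against a toy energy model.
FREQS_GHZ = [0.8, 1.4, 2.0]   # candidate edge clock levels
SPLITS = [0, 4, 8, 12]        # number of layers to run on-device

def edge_energy_j(freq_ghz: float, layers_on_edge: int) -> float:
    # Dynamic power scales roughly with f^3; work scales with layer count.
    return 0.05 * freq_ghz ** 3 * layers_on_edge

def transmit_energy_j(split: int, bandwidth_mbps: float) -> float:
    # Deeper splits usually mean smaller feature maps to send (toy model).
    feature_mbits = 8.0 / (1 + split)
    return 2.0 * feature_mbits / bandwidth_mbps

def best_config(bandwidth_mbps: float) -> tuple[float, int]:
    return min(
        product(FREQS_GHZ, SPLITS),
        key=lambda c: edge_energy_j(*c) + transmit_energy_j(c[1], bandwidth_mbps),
    )

freq, split = best_config(bandwidth_mbps=20.0)
print(f"run {split} layers on-device at {freq} GHz, offload the rest")
```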
arXiv Detail & Related papers (2023-06-02T07:00:42Z) - Attention-based Feature Compression for CNN Inference Offloading in Edge Computing [93.67044879636093]
This paper studies the computational offloading of CNN inference in device-edge co-inference systems.
We propose a novel autoencoder-based CNN architecture (AECNN) for effective feature extraction at the end device.
Experiments show that AECNN can compress the intermediate data by more than 256x with only about 4% accuracy loss.
arXiv Detail & Related papers (2022-11-24T18:10:01Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
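For intuition about the random-feature trick, the NNGP kernel of a one-hidden-layer ReLU network can be approximated (up to scaling) by a Monte Carlo feature map, as in the NumPy sketch below; this shows the generic mechanism and an illustrative ridge-regression use, not RFAD's actual construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_feature_map(in_dim: int, n_features: int = 256):
    """Monte Carlo features for the one-hidden-layer ReLU NNGP kernel
    K(x, x') = E_w[relu(w.x) relu(w.x')], w ~ N(0, I);
    phi(X) @ phi(X').T converges to K(X, X') as n_features grows."""
    W = rng.standard_normal((in_dim, n_features))
    return lambda X: np.maximum(X @ W, 0.0) / np.sqrt(n_features)

# Ridge regression in feature space costs O(n m^2 + m^3) versus O(n^3) for
# exact kernel regression, a large saving when m << n.
phi = make_feature_map(in_dim=32)
X_train, y_train = rng.standard_normal((2048, 32)), rng.standard_normal(2048)
F = phi(X_train)                                  # (2048, 256) feature matrix
w = np.linalg.solve(F.T @ F + 1e-3 * np.eye(256), F.T @ y_train)
y_pred = phi(rng.standard_normal((8, 32))) @ w    # predictions for new inputs
```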
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Dynamic Split Computing for Efficient Deep Edge Intelligence [78.4233915447056]
We introduce dynamic split computing, where the optimal split location is dynamically selected based on the state of the communication channel.
We show that dynamic split computing achieves faster inference in edge computing environments where the data rate and server load vary over time.
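A minimal sketch of this policy, with made-up latency and feature-size profiles (the paper's profiling and decision rule will differ): recompute the best split whenever the measured uplink rate or server load changes.

```python
# Toy dynamic split selection; all profiles below are invented.
EDGE_MS    = [2, 10, 25, 45, 70, 100]       # cumulative on-device compute up to split i
FEATURE_KB = [1024, 512, 256, 128, 64, 16]  # intermediate feature size at split i
CLOUD_MS   = [30, 26, 22, 18, 12, 6]        # remaining cloud compute after split i

def pick_split(uplink_kbps: float, server_load: float) -> int:
    """Return the split index minimizing end-to-end latency under the
    current channel state and server load."""
    def total_ms(i: int) -> float:
        tx_ms = FEATURE_KB[i] * 8.0 / uplink_kbps * 1000.0
        return EDGE_MS[i] + tx_ms + CLOUD_MS[i] * (1.0 + server_load)
    return min(range(len(EDGE_MS)), key=total_ms)

print(pick_split(uplink_kbps=1_000_000, server_load=0.2))  # fast link: split early (1)
print(pick_split(uplink_kbps=2_000, server_load=0.2))      # slow link: split late (5)
```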
arXiv Detail & Related papers (2022-05-23T12:35:18Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
Based on a latency- and accuracy-aware reward design, the resulting policy adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and can support 5G URLLC.
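As an illustration of what a latency- and accuracy-aware reward could look like, here is a hypothetical scalarization (not the paper's actual formula) that a SAC-style agent would maximize when choosing exit points and compression bits:

```python
def reward(accuracy: float, latency_ms: float,
           latency_budget_ms: float = 100.0, beta: float = 0.5) -> float:
    """Hypothetical reward: favor accuracy, penalize latency beyond budget."""
    overshoot = max(0.0, latency_ms - latency_budget_ms) / latency_budget_ms
    return accuracy - beta * overshoot
```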
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - EffCNet: An Efficient CondenseNet for Image Classification on NXP BlueBox [0.0]
Edge devices offer limited processing power due to their inexpensive hardware and constrained cooling and computational resources.
We propose a novel deep convolutional neural network architecture called EffCNet for edge devices.
arXiv Detail & Related papers (2021-11-28T21:32:31Z) - Latency-Memory Optimized Splitting of Convolution Neural Networks for Resource Constrained Edge Devices [1.6873748786804317]
We argue that running CNNs split between an edge device and the cloud amounts to solving a resource-constrained optimization problem.
Experiments on real-world edge devices show that LMOS ensures feasible execution of different CNN models at the edge.
arXiv Detail & Related papers (2021-07-19T19:39:56Z) - Energy-Efficient Model Compression and Splitting for Collaborative Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by combining model compression with a time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and CO2 emissions compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z) - CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices [39.09319776243573]
CoEdge is a distributed Deep Neural Network (DNN) computing system that orchestrates cooperative inference over heterogeneous edge devices.
CoEdge saves energy while maintaining comparable inference latency, achieving 25.5%-66.9% energy reduction for four widely adopted CNN models.
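A toy version of capability-proportional workload partitioning in the spirit of CoEdge (the real system also accounts for memory and communication constraints); the device throughputs here are made up:

```python
# Slice the input height across heterogeneous devices in proportion to
# measured throughput, so partitions finish at roughly the same time.
def partition_rows(total_rows: int, throughputs_gflops: list[float]) -> list[int]:
    total = sum(throughputs_gflops)
    rows = [int(total_rows * t / total) for t in throughputs_gflops]
    rows[-1] += total_rows - sum(rows)  # hand the rounding remainder to the last device
    return rows

# Three edge devices sharing a 224-row input image.
print(partition_rows(224, [8.0, 2.0, 4.0]))  # -> [128, 32, 64]
```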
arXiv Detail & Related papers (2020-12-06T13:15:52Z)