Joint Device-Edge Inference over Wireless Links with Pruning
- URL: http://arxiv.org/abs/2003.02027v2
- Date: Tue, 20 Oct 2020 10:32:01 GMT
- Title: Joint Device-Edge Inference over Wireless Links with Pruning
- Authors: Mikolaj Jankowski, Deniz Gunduz, Krystian Mikolajczyk
- Abstract summary: We propose a joint feature compression and transmission scheme for efficient inference at the wireless network edge.
This is the first work that combines DeepJSCC with network pruning, and applies it to image classification over the wireless edge.
- Score: 20.45405359815043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a joint feature compression and transmission scheme for efficient
inference at the wireless network edge. Our goal is to enable efficient and
reliable inference at the edge server assuming limited computational resources
at the edge device. Previous work focused mainly on feature compression,
ignoring the computational cost of channel coding. We incorporate the recently
proposed deep joint source-channel coding (DeepJSCC) scheme, and combine it
with novel filter pruning strategies aimed at reducing the redundant complexity
from neural networks. We evaluate our approach on a classification task, and
show improved results in both end-to-end reliability and workload reduction at
the edge device. This is the first work that combines DeepJSCC with network
pruning, and applies it to image classification over the wireless edge.
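As a rough illustration of the filter-pruning ingredient described above, the sketch below ranks convolutional filters by their L1 norm and keeps only the strongest fraction. This is a common magnitude-based baseline, not necessarily the exact strategy of the paper; the filter values and pruning ratio are made up.

```python
# Hedged sketch: magnitude-based filter pruning. Filters with small L1 norm
# are assumed redundant and removed to cut workload on the edge device.

def filter_l1_norms(filters):
    """Each filter is a nested list of weights; importance = sum of |w|."""
    def l1(x):
        if isinstance(x, list):
            return sum(l1(v) for v in x)
        return abs(x)
    return [l1(f) for f in filters]

def prune_filters(filters, keep_ratio):
    """Keep the `keep_ratio` fraction of filters with the largest L1 norm."""
    norms = filter_l1_norms(filters)
    n_keep = max(1, int(len(filters) * keep_ratio))
    # Rank filters by importance, then restore original ordering of survivors.
    ranked = sorted(range(len(filters)), key=lambda i: -norms[i])
    keep = sorted(ranked[:n_keep])
    return [filters[i] for i in keep], keep

# Example: four 2x2 "filters"; prune half of them.
filters = [[[0.1, -0.1], [0.0, 0.2]],
           [[1.0, -2.0], [0.5, 1.5]],
           [[0.3, 0.3], [-0.3, 0.3]],
           [[2.0, 2.0], [-2.0, 2.0]]]
pruned, kept = prune_filters(filters, 0.5)
print(kept)  # indices of the surviving filters
```

In a real device-edge pipeline, the pruned layers would run on the device before the DeepJSCC encoder maps the resulting features to channel symbols.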
Related papers
- SpikeBottleNet: Spike-Driven Feature Compression Architecture for Edge-Cloud Co-Inference [0.86325068644655]
We propose SpikeBottleNet, a novel architecture for edge-cloud co-inference systems.
SpikeBottleNet integrates a spiking neuron model to significantly reduce energy consumption on edge devices.
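The spiking neuron models such architectures build on emit binary events rather than dense activations, which is where the energy savings come from. Below is a generic leaky integrate-and-fire (LIF) neuron sketch; the leak and threshold values are illustrative assumptions, not SpikeBottleNet's parameters.

```python
# Hedged sketch of a leaky integrate-and-fire (LIF) neuron: the membrane
# potential leaks each step, integrates the input, and emits a binary spike
# (then resets) whenever it crosses the threshold.

def lif_neuron(inputs, leak=0.9, threshold=1.0):
    """Return the binary spike train produced by an input current sequence."""
    v = 0.0
    spikes = []
    for i in inputs:
        v = leak * v + i          # leaky membrane integration
        if v >= threshold:
            spikes.append(1)      # spike event
            v = 0.0               # reset after the spike
        else:
            spikes.append(0)
    return spikes

print(lif_neuron([0.5, 0.5, 0.5, 0.0, 1.2]))  # -> [0, 0, 1, 0, 1]
```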
arXiv Detail & Related papers (2024-10-11T09:59:21Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
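RFAD's speedup comes from replacing exact kernel computations with random features. As a stand-in illustration of that general idea (the classic random-Fourier-feature approximation of a Gaussian kernel, not the NNGP kernel used by RFAD), consider:

```python
import math
import random

# Hedged sketch: random Fourier features approximate the Gaussian kernel
# k(x, y) = exp(-||x - y||^2 / 2) by an inner product of finite feature maps
# z(x) . z(y), where z(x)_i = sqrt(2/D) * cos(w_i . x + b_i), w_i ~ N(0, I).

def random_features(dim, n_features, rng):
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    def z(x):
        scale = math.sqrt(2.0 / n_features)
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(row, x)) + bi)
                for row, bi in zip(w, b)]
    return z

rng = random.Random(0)
z = random_features(2, 4000, rng)
x, y = [0.3, -0.5], [0.1, 0.2]
approx = sum(a * b_ for a, b_ in zip(z(x), z(y)))
exact = math.exp(-sum((a - b_) ** 2 for a, b_ in zip(x, y)) / 2.0)
print(abs(approx - exact))  # small approximation error
```

The feature count trades accuracy for cost: the approximation error shrinks roughly as one over the square root of the number of random features.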
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Neural Network Compression by Joint Sparsity Promotion and Redundancy Reduction [4.9613162734482215]
This paper presents a novel training scheme based on composite constraints that prune redundant filters and minimize their effect on overall network learning via sparsity promotion.
Our tests on several pixel-wise segmentation benchmarks show that the number of neurons and the memory footprint of networks in the test phase are significantly reduced without affecting performance.
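Sparsity-promoting training schemes of this kind typically pair an l1-style penalty with a proximal update that drives small weights exactly to zero. The soft-thresholding step below is a generic sketch of that mechanism, not the paper's composite constraints.

```python
# Hedged sketch: the soft-thresholding (proximal) operator used to promote
# sparsity. Weights within lam of zero are zeroed out; the rest shrink by lam.

def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; exact zeros create sparsity."""
    out = []
    for w in weights:
        if w > lam:
            out.append(w - lam)
        elif w < -lam:
            out.append(w + lam)
        else:
            out.append(0.0)
    return out

print(soft_threshold([0.05, -0.3, 1.2, -0.01], 0.1))
```

Applied at the filter level (thresholding filter norms rather than individual weights), the same operator zeroes out whole filters, which is what permits structural pruning afterward.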
arXiv Detail & Related papers (2022-10-14T01:34:49Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that it can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Image Superresolution using Scale-Recurrent Dense Network [30.75380029218373]
Recent advances in the design of convolutional neural networks (CNNs) have yielded significant improvements in the performance of image super-resolution (SR).
We propose a scale-recurrent SR architecture built upon units containing a series of dense connections within a residual block (Residual Dense Blocks, RDBs).
Our scale recurrent design delivers competitive performance for higher scale factors while being parametrically more efficient as compared to current state-of-the-art approaches.
arXiv Detail & Related papers (2022-01-28T09:18:43Z)
- Group Fisher Pruning for Practical Network Compression [58.25776612812883]
We present a general channel pruning approach that can be applied to various complicated structures.
We derive a unified metric based on Fisher information to evaluate the importance of a single channel and coupled channels.
Our method can be used to prune any structures including those with coupled channels.
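A common form of the Fisher-information importance score approximates a channel's importance by the squared inner product of its activations and gradients, accumulated over samples. The sketch below illustrates that score in isolation; the data layout and names are assumptions, not the paper's implementation.

```python
# Hedged sketch: Fisher-style channel importance. For each channel c,
# importance[c] += (sum over positions of activation * gradient)^2 per sample.

def fisher_channel_importance(activations, gradients):
    """activations/gradients: [sample][channel][position] nested lists."""
    n_channels = len(activations[0])
    importance = [0.0] * n_channels
    for act, grad in zip(activations, gradients):
        for c in range(n_channels):
            # Inner product of this channel's activation and gradient maps.
            s = sum(a * g for a, g in zip(act[c], grad[c]))
            importance[c] += s * s
    return importance

# One sample, two channels, two spatial positions each.
acts = [[[1.0, 2.0], [0.5, 0.5]]]
grads = [[[0.1, 0.1], [1.0, -1.0]]]
print(fisher_channel_importance(acts, grads))
```

Channels whose accumulated score stays near zero barely affect the loss and are candidates for removal; coupled channels would share one aggregated score.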
arXiv Detail & Related papers (2021-08-02T08:21:44Z)
- Joint Coding and Scheduling Optimization for Distributed Learning over Wireless Edge Networks [21.422040036286536]
This article addresses these problems by leveraging recent advances in coded computing and the deep dueling neural network architecture.
By introducing coded structures/redundancy, a distributed learning task can be completed without waiting for straggling nodes.
Simulations show that the proposed framework reduces the average learning delay in wireless edge computing up to 66% compared with other DL approaches.
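The straggler benefit of coded computing can be seen with a toy delay model: an uncoded task must wait for all n workers, while an MDS-style (n, k) coded task finishes once the fastest k responses arrive. The delays below are made-up numbers for illustration.

```python
# Hedged sketch: why coded redundancy removes straggler delay.

def uncoded_delay(worker_delays):
    """Uncoded: the task waits for the slowest of the n workers."""
    return max(worker_delays)

def coded_delay(worker_delays, k):
    """MDS-style coding: any k of the n coded results suffice to decode."""
    return sorted(worker_delays)[k - 1]

delays = [1.0, 1.2, 0.9, 7.5]    # one straggling worker
print(uncoded_delay(delays))      # 7.5
print(coded_delay(delays, 3))     # 1.2
```

The scheduling side of such frameworks then decides how much redundancy to inject, trading extra computation per worker against the delay saved.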
arXiv Detail & Related papers (2021-03-07T08:57:09Z)
- Neural Compression and Filtering for Edge-assisted Real-time Object Detection in Challenged Networks [8.291242737118482]
We focus on edge computing supporting remote object detection by means of Deep Neural Networks (DNNs).
We develop a framework to reduce the amount of data transmitted over the wireless link.
The proposed technique represents an effective intermediate option between local and edge computing within a certain parameter region.
arXiv Detail & Related papers (2020-07-31T03:11:46Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
- A "Network Pruning Network" Approach to Deep Model Compression [62.68120664998911]
We present a filter pruning approach for deep model compression using a multitask network.
Our approach is based on learning a pruner network to prune a pre-trained target network.
The compressed model produced by our approach is generic and does not need any special hardware/software support.
arXiv Detail & Related papers (2020-01-15T20:38:23Z)
- Discrimination-aware Network Pruning for Deep Model Compression [79.44318503847136]
Existing pruning methods either train from scratch with sparsity constraints or minimize the reconstruction error between the feature maps of the pre-trained models and the compressed ones.
We propose a simple-yet-effective method called discrimination-aware channel pruning (DCP) to choose the channels that actually contribute to the discriminative power.
Experiments on both image classification and face recognition demonstrate the effectiveness of our methods.
arXiv Detail & Related papers (2020-01-04T07:07:41Z)
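As a loose illustration of "discrimination-aware" channel selection, one can score each channel by a Fisher-style discriminant ratio (between-class separation over within-class spread) and keep the most discriminative channels. This is an illustrative proxy, not DCP's actual additional-loss construction.

```python
# Hedged sketch: rank channels by how well they separate two classes.

def discriminant_scores(features_a, features_b):
    """features_a/b: per-class lists of per-sample channel-value vectors."""
    n_channels = len(features_a[0])
    scores = []
    for c in range(n_channels):
        xa = [f[c] for f in features_a]
        xb = [f[c] for f in features_b]
        ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
        va = sum((x - ma) ** 2 for x in xa) / len(xa)
        vb = sum((x - mb) ** 2 for x in xb) / len(xb)
        # Between-class separation over within-class spread.
        scores.append((ma - mb) ** 2 / (va + vb + 1e-8))
    return scores

# Channel 0 separates the classes; channel 1 carries no class information.
class_a = [[1.0, 0.0], [1.1, 1.0]]
class_b = [[-1.0, 0.1], [-0.9, 0.9]]
print(discriminant_scores(class_a, class_b))
```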
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and accepts no responsibility for any consequences of its use.