Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications
- URL: http://arxiv.org/abs/2003.02449v1
- Date: Thu, 5 Mar 2020 06:20:09 GMT
- Title: Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications
- Authors: Chinthaka Gamanayake, Lahiru Jayasinghe, Benny Ng, Chau Yuen
- Abstract summary: A novel greedy approach called cluster pruning has been proposed, which provides a structured way of removing filters in a CNN.
A low-cost IoT hardware setup consisting of an Intel Movidius NCS is proposed to deploy an edge-AI application using our proposed pruning methodology.
- Score: 13.197955183748796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even though Convolutional Neural Networks (CNNs) have shown superior
results in the field of computer vision, it is still challenging to implement
computer vision algorithms in real time at the edge, especially on a low-cost
IoT device, due to the high memory consumption and computational complexity of
a CNN. Network compression methodologies such as weight pruning, filter
pruning, and quantization are used to overcome this problem. Even though filter
pruning has shown better performance than other techniques, the irregular
number of filters pruned across different layers of a CNN might not comply with
the majority of neural computing hardware architectures. In this paper, a novel
greedy approach called cluster pruning is proposed, which provides a structured
way of removing filters from a CNN by considering both the importance of
filters and the underlying hardware architecture. The proposed methodology is
compared with the conventional filter pruning algorithm on the Pascal-VOC open
dataset and on the Head-Counting dataset, our own dataset developed to detect
and count people entering a room. We benchmark our proposed method on three
hardware architectures, namely CPU, GPU, and the Intel Movidius Neural Compute
Stick (NCS), using the popular SSD-MobileNet and SSD-SqueezeNet neural network
architectures for edge-AI vision applications. Results demonstrate that our
method outperforms the conventional filter pruning methodology on both datasets
and on all of the above hardware architectures. Furthermore, a low-cost IoT
hardware setup consisting of an Intel Movidius NCS is proposed to deploy an
edge-AI application using our proposed pruning methodology.
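The abstract's idea can be sketched roughly as follows: score each filter by an importance measure, group filters into fixed-size clusters matched to the target hardware, and greedily remove the least important whole clusters so the pruned filter count per layer stays hardware-friendly. This is an illustrative reconstruction, not the paper's exact algorithm; the L1-norm importance score, the `cluster_size` parameter, and the function name are assumptions.

```python
import numpy as np

def cluster_prune_layer(weights, cluster_size, n_prune_clusters):
    """Greedy cluster-pruning sketch for one conv layer.

    weights: array of shape (out_filters, in_channels, kH, kW).
    cluster_size: filters per cluster, e.g. the SIMD/vector width of the
        target hardware (hypothetical parameter).
    n_prune_clusters: how many whole clusters to remove.
    Returns the indices of the filters to keep.
    """
    n_filters = weights.shape[0]
    # Importance of each filter: L1 norm of its kernel (a common proxy;
    # the paper's actual importance measure may differ).
    importance = np.abs(weights).reshape(n_filters, -1).sum(axis=1)
    # Group filters into fixed-size clusters so the number of filters
    # removed per layer is always a multiple of the hardware width.
    order = np.argsort(importance)  # least important first
    n_clusters = n_filters // cluster_size
    clusters = order[: n_clusters * cluster_size].reshape(n_clusters, cluster_size)
    # Greedily drop the clusters with the lowest total importance.
    cluster_scores = importance[clusters].sum(axis=1)
    drop = clusters[np.argsort(cluster_scores)[:n_prune_clusters]].ravel()
    return np.setdiff1d(np.arange(n_filters), drop)
```

For example, pruning 2 clusters of size 4 from a 16-filter layer always removes exactly 8 filters, which is what makes the result "structured" from the hardware's point of view.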
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise, and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Complexity-Driven CNN Compression for Resource-constrained Edge AI [1.6114012813668934]
We propose a novel and computationally efficient pruning pipeline by exploiting the inherent layer-level complexities of CNNs.
We define three modes of pruning, namely parameter-aware (PA), FLOPs-aware (FA), and memory-aware (MA), to introduce versatile compression of CNNs.
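The three modes named in this summary correspond to three standard per-layer cost measures. A minimal sketch of how such measures might be computed for one convolutional layer; the formulas below are the textbook ones, not necessarily the paper's exact definitions, and the function name is hypothetical:

```python
def conv_layer_complexity(c_in, c_out, k, h_out, w_out):
    """Per-layer cost measures behind the PA/FA/MA pruning modes.

    c_in, c_out: input/output channel counts; k: square kernel size;
    h_out, w_out: spatial size of the output feature map.
    Returns (params, flops, activation_memory_bytes).
    """
    params = c_out * c_in * k * k       # parameter-aware (PA): weight count
    flops = 2 * params * h_out * w_out  # FLOPs-aware (FA): 2 ops per MAC
    mem = c_out * h_out * w_out * 4     # memory-aware (MA): fp32 output map
    return params, flops, mem
```

Ranking layers by one of these three numbers then gives three different pruning orders, which is the versatility the summary refers to.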
arXiv Detail & Related papers (2022-08-26T16:01:23Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete settings (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency- and accuracy-aware reward design, such a computation can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- EffCNet: An Efficient CondenseNet for Image Classification on NXP BlueBox [0.0]
Edge devices offer limited processing power due to their inexpensive hardware, and limited cooling and computational resources.
We propose a novel deep convolutional neural network architecture called EffCNet for edge devices.
arXiv Detail & Related papers (2021-11-28T21:32:31Z)
- Hardware Architecture of Embedded Inference Accelerator and Analysis of Algorithms for Depthwise and Large-Kernel Convolutions [27.141754658998323]
The proposed architecture can support filter kernels with different sizes with high flexibility.
For image classification, the accuracy is increased by 1% by simply replacing $3\times 3$ filters with $5\times 5$ filters in depthwise convolutions.
arXiv Detail & Related papers (2021-04-29T05:45:16Z)
- Multi-Task Network Pruning and Embedded Optimization for Real-time Deployment in ADAS [0.0]
Camera-based Deep Learning algorithms are increasingly needed for perception in Automated Driving systems.
Constraints from the automotive industry challenge the deployment of CNNs by imposing embedded systems with limited computational resources.
We propose an approach to embed a multi-task CNN network under such conditions on a commercial prototype platform.
arXiv Detail & Related papers (2021-01-19T19:29:38Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
Use of convolutional neural networks (CNN) is the standard approach to image recognition despite the fact they can be too computationally demanding.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
- Lightweight Residual Densely Connected Convolutional Neural Network [18.310331378001397]
The lightweight residual densely connected blocks are proposed to guarantee the deep supervision, efficient gradient flow, and feature reuse abilities of convolutional neural networks.
The proposed method decreases the cost of training and inference processes without using any special hardware-software equipment.
arXiv Detail & Related papers (2020-01-02T17:15:32Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.