DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware
Efficiency of Compact Neural Networks
- URL: http://arxiv.org/abs/2206.00843v1
- Date: Thu, 2 Jun 2022 02:32:47 GMT
- Title: DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware
Efficiency of Compact Neural Networks
- Authors: Yonggan Fu and Haichuan Yang and Jiayi Yuan and Meng Li and Cheng Wan
and Raghuraman Krishnamoorthi and Vikas Chandra and Yingyan Lin
- Abstract summary: We propose a framework dubbed DepthShrinker to develop hardware-friendly compact networks.
Our framework delivers hardware-friendly compact networks that outperform both state-of-the-art efficient DNNs and compression techniques.
- Score: 29.46621102184345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Efficient deep neural network (DNN) models equipped with compact operators
(e.g., depthwise convolutions) have shown great potential in reducing DNNs'
theoretical complexity (e.g., the total number of weights/operations) while
maintaining a decent model accuracy. However, existing efficient DNNs are still
limited in fulfilling their promise in boosting real-hardware efficiency, due
to their commonly adopted compact operators' low hardware utilization. In this
work, we open up a new compression paradigm for developing real-hardware
efficient DNNs, leading to boosted hardware efficiency while maintaining model
accuracy. Interestingly, we observe that while some DNN layers' activation
functions help DNNs' training optimization and achievable accuracy, they can be
properly removed after training without compromising the model accuracy.
Inspired by this observation, we propose a framework dubbed DepthShrinker,
which develops hardware-friendly compact networks via shrinking the basic
building blocks of existing efficient DNNs that feature irregular computation
patterns into dense ones with much improved hardware utilization and thus
real-hardware efficiency. Excitingly, our DepthShrinker framework delivers
hardware-friendly compact networks that outperform both state-of-the-art
efficient DNNs and compression techniques, e.g., a 3.06% higher accuracy and
1.53× throughput on Tesla V100 over SOTA channel-wise pruning method
MetaPruning. Our codes are available at:
https://github.com/RICE-EIC/DepthShrinker.
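The core mechanism is easy to state concretely: two affine layers with no nonlinearity in between compose into a single dense layer. Below is a minimal PyTorch sketch of that fusion identity (our illustration, not the authors' code; DepthShrinker merges the conv layers of inverted residual blocks, while toy linear layers are used here for brevity):

```python
# Sketch of the fusion identity DepthShrinker builds on: with the
# intermediate activation removed, fc2(fc1(x)) equals one dense layer.
import torch
import torch.nn as nn

torch.manual_seed(0)
fc1 = nn.Linear(64, 128)
fc2 = nn.Linear(128, 32)

# Compose the two affine maps into a single layer:
# y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
fused = nn.Linear(64, 32)
with torch.no_grad():
    fused.weight.copy_(fc2.weight @ fc1.weight)
    fused.bias.copy_(fc2.weight @ fc1.bias + fc2.bias)

x = torch.randn(8, 64)
assert torch.allclose(fc2(fc1(x)), fused(x), atol=1e-5)
```

The fused layer performs one dense matrix multiply, exactly the kind of regular, high-utilization computation GPUs execute efficiently.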
Related papers
- Resource Constrained Model Compression via Minimax Optimization for
Spiking Neural Networks [11.19282454437627]
Spiking Neural Networks (SNNs) are event-driven and highly energy-efficient.
However, they are difficult to deploy directly on resource-limited edge devices.
We propose an improved end-to-end Minimax optimization method for this sparse learning problem.
arXiv Detail & Related papers (2023-08-09T02:50:15Z)
- Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
Existing BNNs neglect the intrinsic bilinear relationship between real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z)
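For context on that coupling, here is a generic XNOR-Net-style scaled binarization in PyTorch, where each weight is approximated by a bilinear product of a scale factor and a sign tensor; this is standard background, not RBONN's recurrent optimization:

```python
# Scaled binarization: w ≈ alpha * sign(w). For a fixed sign pattern,
# alpha = mean(|w|) per output channel minimizes the L2 approximation error.
import torch

def binarize(w: torch.Tensor) -> torch.Tensor:
    alpha = w.abs().mean(dim=tuple(range(1, w.dim())), keepdim=True)
    return alpha * torch.sign(w)

w = torch.randn(16, 3, 3, 3)           # a conv weight tensor
print((w - binarize(w)).pow(2).mean()) # binarization error RBONN reduces further
```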
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
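A toy illustration of the kernel-space idea: store each 3x3 binary kernel as an index into a small codebook of sign patterns, so storage drops below one bit per weight. The random codebook and nearest-codeword rule below are our simplifications; SNNs learn the codebook:

```python
# Encode 3x3 binary kernels as 5-bit indices into a 32-entry codebook
# (versus 9 bits for an unconstrained binary kernel).
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.choice([-1, 1], size=(32, 9))      # 32 candidate sign patterns

def encode(kernel_signs: np.ndarray) -> int:
    # Max inner product == min Hamming distance for {-1,+1} vectors.
    return int(np.argmax(codebook @ kernel_signs))

k = np.sign(rng.standard_normal(9))               # a binary 3x3 kernel, flattened
idx = encode(k)
print(idx, codebook[idx])
```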
- Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression [12.37129078618206]
Deep spiking neural networks (SNNs) have emerged as a potential alternative to traditional deep learning frameworks.
Most SNN training frameworks yield large inference latency which translates to increased spike activity and reduced energy efficiency.
This paper presents a non-iterative SNN training technique that achieves ultra-high compression with reduced spiking activity.
arXiv Detail & Related papers (2021-07-16T18:23:36Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and
Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
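To see why such a decomposition exists, consider a hedged NumPy sketch: weights quantized to the 2-bit grid {-3, -1, +1, +3} split exactly into two {-1, +1} branches with fixed weights 2 and 1. The branch count and weighting here are illustrative, not necessarily the paper's exact construction:

```python
# Exact two-branch binary decomposition of a {-3,-1,+1,+3}-quantized tensor:
# q = 2*b1 + b0 with b0, b1 in {-1,+1}.
import numpy as np

rng = np.random.default_rng(0)
q = rng.choice([-3, -1, 1, 3], size=(4, 4))

b1 = np.sign(q)             # coarse branch (weight 2)
b0 = np.sign(q - 2 * b1)    # residual branch (weight 1)
assert np.array_equal(2 * b1 + b0, q)
```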
- ShiftAddNet: A Hardware-Inspired Deep Network [87.18216601210763]
ShiftAddNet is an energy-efficient multiplication-less deep neural network.
It leads to both energy-efficient inference and training, without compromising expressive capacity.
ShiftAddNet aggressively reduces the hardware-quantified energy cost of DNN training and inference by over 80%, while offering comparable or better accuracies.
arXiv Detail & Related papers (2020-10-24T05:09:14Z)
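The primitive behind multiplication-less networks is that multiplying by a signed power of two is just a bit shift. A toy round-to-nearest-power-of-two quantizer follows (our illustration; ShiftAddNet's trainable shift and add layers are more elaborate):

```python
# Snap weights to signed powers of two so each multiply becomes a shift.
import numpy as np

def round_pow2(w: np.ndarray, k_min: int = -8, k_max: int = 0) -> np.ndarray:
    # Nearest exponent in log2 space, clipped to a hardware-friendly range.
    k = np.clip(np.round(np.log2(np.abs(w) + 1e-12)), k_min, k_max)
    return np.sign(w) * 2.0 ** k

w = 0.1 * np.random.default_rng(0).standard_normal((4, 4))
print(round_pow2(w))
```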
- FTBNN: Rethinking Non-linearity for 1-bit CNNs and Going Beyond [23.5996182207431]
We show that the binarized convolution process exhibits increasing linearity towards the target of minimizing the binarization error, which in turn hampers BNNs' discriminative ability.
We re-investigate and tune proper non-linear modules to fix that contradiction, leading to a strong baseline which achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-10-19T08:11:48Z)
- An Integrated Approach to Produce Robust Models with High Efficiency [9.476463361600828]
Quantization and structure simplification are promising ways to adapt Deep Neural Networks (DNNs) to mobile devices.
In this work, we try to obtain both features by applying a convergent relaxation quantization algorithm, Binary-Relax (BR), to a robust adversarially trained model, ResNets Ensemble.
We design a trade-off loss function that helps DNNs preserve their natural accuracy and improve the channel sparsity.
arXiv Detail & Related papers (2020-08-31T00:44:59Z)
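As one plausible form of such a trade-off loss (our sketch; the paper's exact objective may differ), cross-entropy can be combined with a group-lasso penalty over output channels, a standard channel-sparsity surrogate:

```python
# Cross-entropy plus a group-lasso term that pushes whole output channels
# of each conv weight toward zero.
import torch
import torch.nn.functional as F

def tradeoff_loss(logits, targets, conv_weights, lam=1e-4):
    ce = F.cross_entropy(logits, targets)
    group = sum(w.flatten(1).norm(dim=1).sum() for w in conv_weights)
    return ce + lam * group

logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
convs = [torch.randn(16, 3, 3, 3)]
print(tradeoff_loss(logits, targets, convs))
```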
- PERMDNN: Efficient Compressed DNN Architecture with Permuted Diagonal Matrices [35.90103072918056]
Deep neural networks (DNNs) have emerged as the most important and popular artificial intelligence (AI) technique.
The growth of model size poses a key energy efficiency challenge for the underlying computing platform.
This paper proposes PermDNN, a novel approach to generate and execute hardware-friendly structured sparse DNN models.
arXiv Detail & Related papers (2020-04-23T02:26:40Z)
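The structured-sparse building block here is a permuted diagonal matrix, one with exactly one nonzero per row and per column, which stores in O(p) space and indexes regularly in hardware. A toy NumPy construction (illustrative only, not the paper's hardware format):

```python
# Build a p x p permuted diagonal matrix: a diagonal under a permutation.
import numpy as np

def permuted_diagonal(p: int, rng: np.random.Generator) -> np.ndarray:
    perm = rng.permutation(p)                 # one nonzero column per row
    m = np.zeros((p, p))
    m[np.arange(p), perm] = rng.standard_normal(p)
    return m

print(permuted_diagonal(4, np.random.default_rng(0)))
```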
- An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices [58.62801151916888]
We introduce a new sparsity dimension, pattern-based sparsity, which comprises pattern and connectivity sparsity and is both highly accurate and hardware-friendly.
Our approach to the new pattern-based sparsity naturally fits into compiler optimization for highly efficient DNN execution on mobile platforms.
arXiv Detail & Related papers (2020-01-20T16:17:36Z)
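A toy version of pattern-based kernel pruning (the four masks below are invented for illustration; this paper and the following PatDNN entry design and select their patterns differently): each 3x3 kernel keeps only the entries of the library mask that preserves the most magnitude.

```python
# Prune each 3x3 kernel to the best-fitting 4-entry pattern from a library.
import numpy as np

patterns = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=np.float32)

def apply_best_pattern(kernel: np.ndarray) -> np.ndarray:
    # Keep the pattern that preserves the most kernel magnitude.
    scores = [(np.abs(kernel) * p).sum() for p in patterns]
    return kernel * patterns[int(np.argmax(scores))]

k = np.random.default_rng(0).standard_normal((3, 3)).astype(np.float32)
print(apply_best_pattern(k))
```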
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.