Reduce Computational Complexity for Convolutional Layers by Skipping Zeros
- URL: http://arxiv.org/abs/2306.15951v3
- Date: Sun, 5 Nov 2023 12:51:53 GMT
- Title: Reduce Computational Complexity for Convolutional Layers by Skipping Zeros
- Authors: Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Qi Wang
- Abstract summary: We propose an efficient algorithm for convolutional neural networks.
The C-K-S algorithm is accompanied by efficient GPU implementations.
Experiments show that C-K-S offers good performance in terms of speed and convergence.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional neural networks necessitate good algorithms to reduce
complexity, and sufficient utilization of parallel processors for acceleration.
Within convolutional layers, there are three types of operators: convolution
used in forward propagation, deconvolution and dilated-convolution utilized in
backward propagation. During the execution of these operators, zeros are
typically added to tensors, leading to redundant calculations and unnecessary
strain on hardware. To circumvent these inefficiencies, we propose the C-K-S
algorithm, accompanied by efficient GPU implementations. C-K-S trims filters to
exclude zero-padding. For deconvolution and dilated-convolution, C-K-S
transforms sparse tensors into dense tensors, and standardizes the local
computational rules to simplify the hardware control. The experimental results
demonstrate that C-K-S offers good performance in terms of speed and
convergence, surpassing the capabilities of PyTorch and cuDNN in certain
scenarios.
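The zero-skipping idea can be made concrete with a small 1-D sketch in pure Python. It is an illustration under stated assumptions, not the C-K-S algorithm or its GPU implementation. All function names (`conv_skip`, `deconv_skip`, `wgrad_skip` and their `*_naive` counterparts) are hypothetical, tensors are plain lists, and "deconvolution" and "dilated-convolution" are assumed to mean the input-gradient and weight-gradient computations of a strided, zero-padded convolution. Each `*_naive` function materializes the zeros (padding, or the s-1 zeros inserted between output-gradient elements) and multiplies by them. Each `*_skip` function trims the filter or indexes the sparse tensor densely so that only nonzero products are formed. Both return a multiply-add count.

```python
# Illustrative 1-D sketch of zero skipping in convolutional layers.
# Hypothetical names; NOT the C-K-S algorithm or its GPU kernels.
# x: input, w: filter, dy: output gradient, s: stride, p: zero-padding.

def out_len(n, k, s, p):
    """Output length of a 1-D convolution (input n, kernel k, stride s, padding p)."""
    return (n + 2 * p - k) // s + 1

def at(seq, j):
    """Read seq[j], treating any out-of-range index as an implicit zero."""
    return seq[j] if 0 <= j < len(seq) else 0.0

def dilate(dy, s):
    """Insert s-1 zeros between neighbouring elements: the sparse tensor."""
    d = [0.0] * ((len(dy) - 1) * s + 1)
    for o, g in enumerate(dy):
        d[o * s] = g
    return d

def conv_naive(x, w, s, p):
    """Forward: zero-pad x explicitly, multiply every tap. Returns (y, multiply-adds)."""
    xp = [0.0] * p + list(x) + [0.0] * p
    m, K = out_len(len(x), len(w), s, p), len(w)
    y = [sum(xp[o * s + k] * w[k] for k in range(K)) for o in range(m)]
    return y, m * K

def conv_skip(x, w, s, p):
    """Forward: trim the filter to the taps that land on real (unpadded) input."""
    n, K = len(x), len(w)
    y, macs = [], 0
    for o in range(out_len(n, K, s, p)):
        start = o * s - p  # input index under tap 0 (negative inside the padding)
        k_lo, k_hi = max(0, -start), min(K, n - start)
        y.append(sum(x[start + k] * w[k] for k in range(k_lo, k_hi)))
        macs += max(0, k_hi - k_lo)
    return y, macs

def deconv_naive(dy, w, n, s, p):
    """Input gradient: correlate the zero-inserted dy with all K taps of w."""
    d, K = dilate(dy, s), len(w)
    dx = [sum(at(d, i + p - k) * w[k] for k in range(K)) for i in range(n)]
    return dx, n * K

def deconv_skip(dy, w, n, s, p):
    """Input gradient on the dense dy: only taps with k % s == (i + p) % s can
    meet a nonzero of the dilated tensor, so the loop strides over exactly those."""
    K, m = len(w), len(dy)
    dx, macs = [], 0
    for i in range(n):
        acc = 0.0
        for k in range((i + p) % s, K, s):
            o = (i + p - k) // s  # exact division by construction
            if 0 <= o < m:
                acc += dy[o] * w[k]
                macs += 1
        dx.append(acc)
    return dx, macs

def wgrad_naive(x, dy, K, s, p):
    """Weight gradient ("dilated convolution"): padded x against zero-inserted dy."""
    d = dilate(dy, s)
    xp = [0.0] * p + list(x) + [0.0] * p
    dw = [sum(d[j] * xp[j + k] for j in range(len(d))) for k in range(K)]
    return dw, K * len(d)

def wgrad_skip(x, dy, K, s, p):
    """Weight gradient on the dense dy, skipping taps that fall in the padding."""
    n = len(x)
    dw, macs = [0.0] * K, 0
    for k in range(K):
        for o, g in enumerate(dy):
            idx = o * s + k - p
            if 0 <= idx < n:
                dw[k] += g * x[idx]
                macs += 1
    return dw, macs

if __name__ == "__main__":
    x, w, dy, s, p = [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, -1.0, 2.0], [1.0, 2.0, -1.0], 2, 1
    for name, (a, ca), (b, cb) in [
        ("conv", conv_naive(x, w, s, p), conv_skip(x, w, s, p)),
        ("deconv", deconv_naive(dy, w, len(x), s, p), deconv_skip(dy, w, len(x), s, p)),
        ("wgrad", wgrad_naive(x, dy, len(w), s, p), wgrad_skip(x, dy, len(w), s, p)),
    ]:
        assert a == b, name
        print(f"{name}: {ca} multiply-adds with zeros, {cb} without")
```

Besides matching the zero-filled versions, the three results satisfy the adjoint identity ⟨conv(x, w), dy⟩ = ⟨x, dx⟩ = ⟨w, dw⟩, which is a quick sanity check for hand-written backward operators. The sketch only counts arithmetic. It says nothing about the GPU-side control simplification that the abstract describes.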
Related papers
- SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic
Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models.
These algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization.
We propose SFC, a new algebra transform for fast convolution by extending the Discrete Fourier Transform with symbolic computing.
We show that our new algorithms can further improve the efficiency of quantized models while maintaining accuracy, surpassing both the quantization-alone method and existing works on fast convolution quantization.
(arXiv, 2024-07-03)
- On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee
This paper focuses on two model compression techniques: low-rank approximation and weight pruning.
It proposes a holistic framework for model compression from the novel perspective of nonconvex optimization.
(arXiv, 2023-03-13)
- Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data
We present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics.
These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training.
We demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks.
(arXiv, 2023-03-01)
- Convolutional unitary or orthogonal recurrent neural networks
We show that in the specific case of convolutional RNNs, we can define a convolutional exponential.
We explicitly derive FFT-based algorithms to compute the kernels and their derivatives.
(arXiv, 2023-02-14)
- Efficient Dataset Distillation Using Random Feature Approximation
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP (Kernel Inducing Points) and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
(arXiv, 2022-10-21)
- Low-complexity Approximate Convolutional Neural Networks
We present an approach for minimizing the computational complexity of trained Convolutional Neural Networks (ConvNets).
The idea is to approximate all elements of a given ConvNet with efficient approximations capable of extreme reductions in computational complexity.
Such low-complexity structures pave the way for low-power, efficient hardware designs.
(arXiv, 2022-07-29)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, this approach adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
(arXiv, 2022-01-09)
- Content-Aware Convolutional Neural Networks
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
(arXiv, 2021-06-30)
- AQD: Towards Accurate Fully-Quantized Object Detection
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
(arXiv, 2020-07-14)
- DO-Conv: Depthwise Over-parameterized Convolutional Layer
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
(arXiv, 2020-06-22)
- XSepConv: Extremely Separated Convolution
We propose a novel extremely separated convolutional block (XSepConv).
It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels.
XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
(arXiv, 2020-02-27)
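The cost argument behind spatially separable convolutions (as in the XSepConv summary above) can be checked with a small sketch. It is a generic illustration of spatial separability, not the XSepConv block itself, which according to its summary fuses such convolutions into a depthwise convolution. The function names are hypothetical. A rank-1 k x k kernel, the outer product of a column vector and a row vector, gives the same "valid" output whether it is applied directly (k*k multiply-adds per output) or as a k x 1 pass followed by a 1 x k pass (roughly 2k per output).

```python
# Illustrative only: generic spatial separability, not the XSepConv block.
# A rank-1 k x k kernel K[a][b] = col[a] * row[b] can be applied as a
# k x 1 pass followed by a 1 x k pass (hypothetical helper names).

def conv2d_valid(img, ker):
    """Direct 2-D 'valid' cross-correlation. Returns (output, multiply-adds)."""
    H, W, kh, kw = len(img), len(img[0]), len(ker), len(ker[0])
    oh, ow = H - kh + 1, W - kw + 1
    out = [[sum(img[r + a][c + b] * ker[a][b] for a in range(kh) for b in range(kw))
            for c in range(ow)] for r in range(oh)]
    return out, oh * ow * kh * kw

def conv2d_separable(img, col, row):
    """Column pass (k x 1) then row pass (1 x k). Returns (output, multiply-adds)."""
    H, W, kh, kw = len(img), len(img[0]), len(col), len(row)
    oh, ow = H - kh + 1, W - kw + 1
    # Vertical pass over every column of the image.
    tmp = [[sum(img[r + a][c] * col[a] for a in range(kh)) for c in range(W)]
           for r in range(oh)]
    # Horizontal pass over the intermediate result.
    out = [[sum(tmp[r][c + b] * row[b] for b in range(kw)) for c in range(ow)]
           for r in range(oh)]
    return out, oh * W * kh + oh * ow * kw

if __name__ == "__main__":
    img = [[(3 * r + 5 * c) % 7 for c in range(8)] for r in range(8)]
    col, row = [1, 2, 1], [1, 0, -1]  # their outer product is the Sobel x-kernel
    ker = [[a * b for b in row] for a in col]
    direct, c_direct = conv2d_valid(img, ker)
    separable, c_sep = conv2d_separable(img, col, row)
    assert direct == separable
    print(f"direct: {c_direct} multiply-adds, separable: {c_sep}")
```

For a 3 x 3 kernel the saving is modest; it grows with kernel size, since the direct cost scales with k squared and the separable cost roughly with 2k.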