Related papers: Reduce Computational Complexity for Convolutional Layers by Skipping Zeros

Reduce Computational Complexity for Convolutional Layers by Skipping Zeros

URL: http://arxiv.org/abs/2306.15951v3
Date: Sun, 5 Nov 2023 12:51:53 GMT
Title: Reduce Computational Complexity for Convolutional Layers by Skipping Zeros
Authors: Zhiyi Zhang, Pengfei Zhang, Zhuopin Xu, Qi Wang
Abstract summary: We propose an efficient algorithm for convolutional neural networks. The C-K-S algorithm is accompanied by efficient GPU implementations. Experiments show that C-K-S offers good performance in terms of speed and convergence.
Score: 10.742743533768843
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Convolutional neural networks necessitate good algorithms to reduce complexity, and sufficient utilization of parallel processors for acceleration. Within convolutional layers, there are three types of operators: convolution used in forward propagation, deconvolution and dilated-convolution utilized in backward propagation. During the execution of these operators, zeros are typically added to tensors, leading to redundant calculations and unnecessary strain on hardware. To circumvent these inefficiencies, we propose the C-K-S algorithm, accompanied by efficient GPU implementations. C-K-S trims filters to exclude zero-padding. For deconvolution and dilated-convolution, C-K-S transforms sparse tensors into dense tensors, and standardizes the local computational rules to simplify the hardware control. The experimental results demonstrate that C-K-S offers good performance in terms of speed and convergence, surpassing the capabilities of PyTorch and cuDNN in certain scenarios.

Related papers

Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations [63.945006006152035]
tensor decomposition networks (TDNs) achieve competitive performance with dramatic speedup in computations.<n>We evaluate TDNs on PubChemQCR, a newly curated molecular relaxation dataset containing 105 million DFT-calculated snapshots.
arXiv Detail & Related papers (2025-07-01T18:46:27Z)
TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores [9.744829716477627]
This paper proposes TC-GS, an algorithm-independent universal module that expands Core (TCU) applicability for 3DGS.<n>The key innovation lies in mapping alpha to matrix multiplication, fully utilizing otherwise idle TCUs in existing 3DGS implementations.
arXiv Detail & Related papers (2025-05-30T16:58:18Z)
Flexible Coded Distributed Convolution Computing for Enhanced Fault Tolerance and Numerical Stability in Distributed CNNs [26.347141131107172]
This paper introduces the Flexible Coded Distributed Convolution Computing framework. It enhances fault tolerance and numerical stability in distributed CNNs. Empirical results validate the framework's effectiveness in computational efficiency, fault tolerance, and scalability.
arXiv Detail & Related papers (2024-11-03T14:05:29Z)
Kolmogorov-Arnold Transformer [72.88137795439407]
We introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces layers with Kolmogorov-Arnold Network (KAN) layers. We identify three key challenges: (C1) Base function, (C2) Inefficiency, and (C3) Weight. With these designs, KAT outperforms traditional-based transformers.
arXiv Detail & Related papers (2024-09-16T17:54:51Z)
Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods [2.8645507575980074]
We simplify convolutions by viewing them as tensor networks (TNs) TNs allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum. Our TN implementation accelerates KFAC variant up to 4.5x while removing the standard implementation's memory overhead, and enables new hardware-efficient dropouts for approximate backpropagation.
arXiv Detail & Related papers (2023-07-05T13:19:41Z)
On Model Compression for Neural Networks: Framework, Algorithm, and Convergence Guarantee [21.818773423324235]
This paper focuses on two model compression techniques: low-rank approximation and weight approximation. In this paper, a holistic framework is proposed for model compression from a novel perspective of non optimization.
arXiv Detail & Related papers (2023-03-13T02:14:42Z)
Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical data [2.207533492015563]
We present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics. These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training. We demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks.
arXiv Detail & Related papers (2023-03-01T09:27:08Z)
Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel. Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU. Our new method, termed an RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
Low-complexity Approximate Convolutional Neural Networks [1.7368964547487395]
We present an approach for minimizing the computational complexity of trained Convolutional Neural Networks (ConvNet) The idea is to approximate all elements of a given ConvNet with efficient approximations capable of extreme reductions in computational complexity. Such low-complexity structures pave the way for low-power, efficient hardware designs.
arXiv Detail & Related papers (2022-07-29T21:59:29Z)
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices. We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the emphexit point, emphexit point, and emphcompressing bits by soft policy iterations. Based on the latency and accuracy aware reward design, such an computation can well adapt to the complex environment like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers. We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel. We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z)
XSepConv: Extremely Separated Convolution [60.90871656244126]
We propose a novel extremely separated convolutional block (XSepConv) It fuses spatially separable convolutions into depthwise convolution to reduce both the computational cost and parameter size of large kernels. XSepConv is designed to be an efficient alternative to vanilla depthwise convolution with large kernel sizes.
arXiv Detail & Related papers (2020-02-27T11:46:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.