A Simple Sparse Matrix Vector Multiplication Approach to Padded Convolution
- URL: http://arxiv.org/abs/2411.19419v1
- Date: Fri, 29 Nov 2024 00:14:24 GMT
- Title: A Simple Sparse Matrix Vector Multiplication Approach to Padded Convolution
- Authors: Zan Chaudhry
- Abstract summary: We introduce an algorithm for efficiently representing convolution with zero-padding and stride as a sparse transformation matrix.
We provide an explicit expression for the number of non-zero multiplications in convolutions with stride and padding, offering insight into the potential for leveraging sparsity in convolution operations.
- Score: 0.0
- License:
- Abstract: We introduce an algorithm for efficiently representing convolution with zero-padding and stride as a sparse transformation matrix, applied to a vectorized input through sparse matrix-vector multiplication (SpMV). We provide a theoretical contribution with an explicit expression for the number of non-zero multiplications in convolutions with stride and padding, offering insight into the potential for leveraging sparsity in convolution operations. A proof-of-concept implementation is presented in Python, demonstrating the performance of our method on both CPU and GPU architectures. This work contributes to the broader exploration of sparse matrix techniques in convolutional algorithms, with a particular focus on leveraging matrix multiplications for parallelization. Our findings lay the groundwork for future advancements in exploiting sparsity to improve the efficiency of convolution operations in fields such as machine learning and signal processing.
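Below is a minimal, self-contained sketch of the idea described in the abstract, not the authors' released proof-of-concept: a zero-padded, strided 2-D cross-correlation is encoded as a SciPy sparse matrix and applied to the vectorized input via SpMV. The single-channel setting and the function names (e.g. conv_as_spmv_matrix) are illustrative assumptions.

```python
# Sketch: padded, strided 2-D convolution (cross-correlation) as sparse matrix-vector multiplication.
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix

def conv_as_spmv_matrix(in_shape, kernel, stride=1, pad=0):
    """Build a sparse matrix T such that T @ x.ravel() equals the padded, strided convolution of x."""
    H, W = in_shape
    kH, kW = kernel.shape
    H_out = (H + 2 * pad - kH) // stride + 1
    W_out = (W + 2 * pad - kW) // stride + 1
    T = lil_matrix((H_out * W_out, H * W))
    for i in range(H_out):
        for j in range(W_out):
            row = i * W_out + j
            for u in range(kH):
                for v in range(kW):
                    r = i * stride + u - pad      # input row hit by this kernel tap
                    c = j * stride + v - pad      # input column hit by this kernel tap
                    if 0 <= r < H and 0 <= c < W: # taps landing in the zero padding contribute nothing
                        T[row, r * W + c] += kernel[u, v]
    return csr_matrix(T), (H_out, W_out)

def conv_naive(x, kernel, stride=1, pad=0):
    """Reference: direct zero-padded, strided cross-correlation."""
    xp = np.pad(x, pad)
    kH, kW = kernel.shape
    H_out = (xp.shape[0] - kH) // stride + 1
    W_out = (xp.shape[1] - kW) // stride + 1
    out = np.empty((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            out[i, j] = np.sum(xp[i*stride:i*stride+kH, j*stride:j*stride+kW] * kernel)
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((7, 7))
    k = rng.standard_normal((3, 3))
    T, out_shape = conv_as_spmv_matrix(x.shape, k, stride=2, pad=1)
    y_spmv = (T @ x.ravel()).reshape(out_shape)   # sparse matrix-vector product
    y_ref = conv_naive(x, k, stride=2, pad=1)
    print(np.allclose(y_spmv, y_ref))             # True
    print(T.nnz, out_shape[0] * out_shape[1] * k.size)
```

In this sketch, T.nnz gives an empirical count of the non-zero multiplications performed by the SpMV (for a kernel with no zero taps), the quantity for which the abstract reports an explicit expression; whenever the padding is non-zero it is strictly smaller than the dense count H_out * W_out * kH * kW.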
Related papers
- Improving Algorithmic Efficiency using Cryptography [11.496343300483904]
We show how to use cryptography to improve the time complexity of solving computational problems.
We show that under standard cryptographic assumptions, we can design algorithms that are faster than existing ones.
arXiv Detail & Related papers (2025-02-18T17:08:59Z) - SMM-Conv: Scalar Matrix Multiplication with Zero Packing for Accelerated Convolution [4.14360329494344]
We present a novel approach for accelerating convolutions during inference for CPU-based architectures.
Our experiments with commonly used network architectures demonstrate a significant speedup compared to existing indirect methods.
arXiv Detail & Related papers (2024-11-23T21:43:38Z) - MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers [43.39466934693055]
We present MemoryFormer, a novel transformer architecture which significantly reduces the computational complexity (FLOPs) from a new perspective.
This is made possible by utilizing an alternative method for feature transformation to replace the linear projection of fully-connected layers.
We conduct extensive experiments on various benchmarks to demonstrate the effectiveness of the proposed model.
arXiv Detail & Related papers (2024-11-20T02:41:53Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the model's covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Optimized Sparse Matrix Operations for Reverse Mode Automatic Differentiation [3.72826300260966]
We present an implementation of a CSR-based sparse matrix wrapper for PyTorch with acceleration for basic matrix operations, as well as automatic differentiability.
We also present several applications of the resulting sparse kernels to optimization problems, demonstrating ease of implementation and performance measurements versus their dense counterparts.
arXiv Detail & Related papers (2022-12-10T00:46:51Z) - Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use a Matrix Taylor Polynomial (MTP), and the other is to use Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
arXiv Detail & Related papers (2022-01-29T10:00:35Z) - Fast Differentiable Matrix Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root.
For the forward propagation, one method is to use a Matrix Taylor Polynomial (MTP).
The other method is to use Matrix Padé Approximants (MPA).
arXiv Detail & Related papers (2022-01-21T12:18:06Z) - Efficient GPU implementation of randomized SVD and its applications [17.71779625877989]
Matrix decompositions are ubiquitous in machine learning, including applications in dimensionality reduction, data compression, and deep learning algorithms.
Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time.
We leverage efficient processing operations that can be run in parallel on modern Graphical Processing Units (GPUs) to reduce the computational burden of computing matrix decompositions.
arXiv Detail & Related papers (2021-10-05T07:42:41Z) - Robust 1-bit Compressive Sensing with Partial Gaussian Circulant Matrices and Generative Priors [54.936314353063494]
We provide recovery guarantees for a correlation-based optimization algorithm for robust 1-bit compressive sensing.
We make use of a practical iterative algorithm, and perform numerical experiments on image datasets to corroborate our results.
arXiv Detail & Related papers (2021-08-08T05:28:06Z) - Multi-View Spectral Clustering with High-Order Optimal Neighborhood Laplacian Matrix [57.11971786407279]
Multi-view spectral clustering can effectively reveal the intrinsic cluster structure among data.
This paper proposes a multi-view spectral clustering algorithm that learns a high-order optimal neighborhood Laplacian matrix.
Our proposed algorithm generates the optimal Laplacian matrix by searching the neighborhood of the linear combination of both the first-order and high-order base Laplacian matrices.
arXiv Detail & Related papers (2020-08-31T12:28:40Z) - Kernel methods through the roof: handling billions of points efficiently [94.31450736250918]
Kernel methods provide an elegant and principled approach to nonparametric learning, but so far they could hardly be used in large-scale problems.
Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections.
Here, we push these efforts further to develop and test a solver that takes full advantage of GPU hardware.
arXiv Detail & Related papers (2020-06-18T08:16:25Z)