Fast Differentiable Matrix Square Root and Inverse Square Root
- URL: http://arxiv.org/abs/2201.12543v1
- Date: Sat, 29 Jan 2022 10:00:35 GMT
- Title: Fast Differentiable Matrix Square Root and Inverse Square Root
- Authors: Yue Song, Nicu Sebe, Wei Wang
- Abstract summary: We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method is to use a Matrix Taylor Polynomial (MTP), and the other is to use Matrix Padé Approximants (MPA).
A series of numerical tests show that both methods yield considerable speed-up compared with the SVD or the NS iteration.
- Score: 65.67315418971688
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Computing the matrix square root and its inverse in a differentiable manner
is important in a variety of computer vision tasks. Previous methods either
adopt the Singular Value Decomposition (SVD) to explicitly factorize the matrix
or use the Newton-Schulz iteration (NS iteration) to derive the approximate
solution. However, neither method is computationally efficient enough in
either the forward pass or the backward pass. In this paper, we propose two
more efficient variants to compute the differentiable matrix square root and
the inverse square root. For the forward propagation, one method is to use
Matrix Taylor Polynomial (MTP), and the other method is to use Matrix Padé
Approximants (MPA). The backward gradient is computed by iteratively solving
the continuous-time Lyapunov equation using the matrix sign function. A series
of numerical tests show that both methods yield considerable speed-up compared
with the SVD or the NS iteration. Moreover, we validate the effectiveness of
our methods in several real-world applications, including de-correlated batch
normalization, second-order vision transformer, global covariance pooling for
large-scale and fine-grained recognition, attentive covariance pooling for
video recognition, and neural style transfer. The experimental results
demonstrate that our methods can also achieve competitive and even slightly
better performance. The PyTorch implementation is available at
https://github.com/KingJamesSong/FastDifferentiableMatSqrt.
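To make the forward pass concrete, here is a minimal PyTorch sketch of the MTP idea: normalize A by its Frobenius norm, expand (I - Z)^{1/2} as a truncated Taylor series in Z = I - A/||A||_F, and rescale. The function name, the truncation degree, and the Frobenius-norm normalization are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def mtp_sqrt(A: torch.Tensor, degree: int = 8) -> torch.Tensor:
    """Truncated Matrix Taylor Polynomial sketch of A^{1/2}.

    Assumes A is symmetric positive definite. Uses
    A^{1/2} = ||A||_F^{1/2} (I - Z)^{1/2}, Z = I - A / ||A||_F,
    with the scalar series (1 - z)^{1/2} = sum_k c_k z^k.
    """
    norm = torch.linalg.matrix_norm(A, ord="fro")
    I = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
    Z = I - A / norm
    S, P, coef = I.clone(), I.clone(), 1.0
    for k in range(1, degree + 1):
        coef = coef * (k - 1.5) / k   # c_k = binom(1/2,k)(-1)^k via recurrence, c_0 = 1
        P = P @ Z                     # Z^k, built incrementally
        S = S + coef * P              # accumulate the truncated series
    return norm.sqrt() * S
```

For the backward pass, if B = A^{1/2}, differentiating B^2 = A shows the gradient X = dL/dA satisfies the continuous-time Lyapunov equation B X + X B = dL/dB. The sketch below solves that equation with Roberts' matrix-sign method using the plain Newton iteration Y <- (Y + Y^{-1})/2; note the paper itself uses an inverse-free, Newton-Schulz-style sign iteration, so this is only a reference implementation of the underlying identity.

```python
def lyapunov_sign_solve(B: torch.Tensor, C: torch.Tensor,
                        iters: int = 20) -> torch.Tensor:
    """Solve B X + X B = C via the matrix sign function.

    Roberts' method: sign([[B, -C], [0, -B]]) = [[I, -2X], [0, -I]],
    valid when B has eigenvalues in the open right half-plane
    (e.g. B = A^{1/2} for SPD A). Illustrative sketch only.
    """
    n = B.shape[-1]
    M = torch.zeros(2 * n, 2 * n, dtype=B.dtype, device=B.device)
    M[:n, :n] = B
    M[:n, n:] = -C
    M[n:, n:] = -B
    for _ in range(iters):                  # Newton iteration for sign(M)
        M = 0.5 * (M + torch.linalg.inv(M))
    return -0.5 * M[:n, n:]                 # read X off the (1,2) block
```

As a sanity check, mtp_sqrt(A) @ mtp_sqrt(A) should approach A as the degree grows (for well-conditioned A), and X = lyapunov_sign_solve(B, C) should satisfy B @ X + X @ B ≈ C.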
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow for inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful technique for large-scale multimedia retrieval.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) approach to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing methods are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z) - Fast Differentiable Matrix Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root.
For the forward propagation, one method is to use a Matrix Taylor Polynomial (MTP); the other is to use Matrix Padé Approximants (MPA).
arXiv Detail & Related papers (2022-01-21T12:18:06Z) - Sublinear Time Approximation of Text Similarity Matrices [50.73398637380375]
We introduce a generalization of the popular Nyström method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
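For orientation, the classical Nyström estimator that this work generalizes is sketched below. The uniform landmark sampling, the landmark count s, and the function name are illustrative assumptions, and forming the estimator from a fully materialized K as written is not sublinear (avoiding that is precisely the paper's point).

```python
import torch

def nystrom_approx(K: torch.Tensor, s: int) -> torch.Tensor:
    """Classical (PSD) Nyström sketch: K ≈ C W⁺ Cᵀ.

    Illustrative only: assumes K is symmetric PSD, whereas the
    paper targets indefinite similarity matrices.
    """
    n = K.shape[0]
    idx = torch.randperm(n)[:s]      # uniform landmark sampling
    C = K[:, idx]                    # n x s sampled columns
    W = K[idx][:, idx]               # s x s core block
    return C @ torch.linalg.pinv(W) @ C.T   # rank-s approximation
```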
arXiv Detail & Related papers (2021-12-17T17:04:34Z)
- Sparse Factorization of Large Square Matrices [10.94053598642913]
In this paper, we propose to approximate a large square matrix with a product of sparse full-rank matrices.
In the approximation, our method needs only $N(\log N)^2$ non-zero numbers for an $N \times N$ full matrix.
We show that our method gives a better approximation when the approximated matrix is sparse and high-rank.
arXiv Detail & Related papers (2021-09-16T18:42:21Z)
- Fast and Accurate Pseudoinverse with Sparse Matrix Reordering and Incremental Approach [4.710916891482697]
The pseudoinverse is a generalization of the matrix inverse that has been extensively utilized in machine learning.
FastPI is a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices.
We show that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy.
arXiv Detail & Related papers (2020-11-09T07:47:10Z)
- Fast Coherent Point Drift [4.369046007546103]
Coherent point drift (CPD) is a classical method for nonrigid point set registration.
By introducing a simple corresponding constraint, we develop a fast implementation of CPD.
Experimental results on 3D point cloud data show that our method can significantly reduce the burden of the registration process.
arXiv Detail & Related papers (2020-06-11T09:35:23Z)
- A Block Coordinate Descent-based Projected Gradient Algorithm for Orthogonal Non-negative Matrix Factorization [0.0]
This article utilizes the projected gradient (PG) method for the orthogonal non-negative matrix factorization (ONMF) problem.
We penalise the orthonormality constraints and apply the PG method via a block coordinate descent approach.
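As a rough illustration of such a penalized scheme (not the article's exact algorithm), one can minimize 0.5 * ||X - WH||_F^2 + lam * ||W^T W - I||_F^2 by alternating projected gradient steps on the W and H blocks. The step size, penalty weight, and iteration count below are illustrative assumptions.

```python
import torch

def onmf_pg(X: torch.Tensor, r: int, lam: float = 1.0,
            lr: float = 1e-3, iters: int = 500):
    """Sketch of penalized ONMF via block coordinate projected gradient.

    Objective: 0.5 * ||X - W H||_F^2 + lam * ||W^T W - I||_F^2,
    with W, H >= 0 enforced by projection (clamping at zero).
    """
    m, n = X.shape
    W = torch.rand(m, r)
    H = torch.rand(r, n)
    I = torch.eye(r)
    for _ in range(iters):
        # W-block: gradient of the fit term plus the orthonormality penalty
        gW = (W @ H - X) @ H.T + 4.0 * lam * W @ (W.T @ W - I)
        W = (W - lr * gW).clamp_min(0.0)   # project onto W >= 0
        # H-block: plain projected gradient on the fit term
        gH = W.T @ (W @ H - X)
        H = (H - lr * gH).clamp_min(0.0)   # project onto H >= 0
    return W, H
```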
arXiv Detail & Related papers (2020-03-23T13:24:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.