Robust Differentiable SVD
- URL: http://arxiv.org/abs/2104.03821v1
- Date: Thu, 8 Apr 2021 15:04:15 GMT
- Title: Robust Differentiable SVD
- Authors: Wei Wang, Zheng Dang, Yinlin Hu, Pascal Fua and Mathieu Salzmann
- Abstract summary: Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms.
Instability arises in the presence of eigenvalues that are close to each other.
We show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying on an iterative process.
- Score: 117.35644933471401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Eigendecomposition of symmetric matrices is at the heart of many computer
vision algorithms. However, the derivatives of the eigenvectors tend to be
numerically unstable, whether using the SVD to compute them analytically or
using the Power Iteration (PI) method to approximate them. This instability
arises in the presence of eigenvalues that are close to each other. This makes
integrating eigendecomposition into deep networks difficult and often results
in poor convergence, particularly when dealing with large matrices.
While this can be mitigated by partitioning the data into small arbitrary
groups, doing so has no theoretical basis and makes it impossible to exploit
the full power of eigendecomposition. In previous work, we mitigated this using
SVD during the forward pass and PI to compute the gradients during the backward
pass. However, the iterative deflation procedure required to compute multiple
eigenvectors using PI tends to accumulate errors and yield inaccurate
gradients. Here, we show that the Taylor expansion of the SVD gradient is
theoretically equivalent to the gradient obtained using PI, without relying on
an iterative process in practice, and thus yields more accurate gradients. We
demonstrate the benefits of this increased accuracy for image classification
and style transfer.
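For concreteness, below is a minimal PyTorch sketch of this idea, assuming a symmetric positive semi-definite input (e.g., a covariance matrix): the forward pass uses a standard eigendecomposition, while the backward pass replaces the unstable 1/(λi − λj) coefficients of the analytical gradient with a truncated Taylor (geometric) series that stays bounded when eigenvalues nearly coincide. The class name, the truncation order K, and the PSD assumption are illustrative choices, not the authors' reference implementation.

```python
import torch

class TaylorEigh(torch.autograd.Function):
    """Symmetric eigendecomposition with a Taylor-expanded gradient (sketch).

    Forward: standard eigendecomposition of a symmetric PSD matrix A.
    Backward: the exact coefficients 1/(lambda_i - lambda_j), which blow up when
    eigenvalues are close, are replaced by the truncated geometric series
    (1/lambda_max) * sum_k (lambda_min/lambda_max)^k, which stays bounded.
    The truncation order K is an assumed hyper-parameter.
    """
    K = 9

    @staticmethod
    def forward(ctx, A):
        lam, U = torch.linalg.eigh(A)          # ascending eigenvalues
        ctx.save_for_backward(lam, U)
        return lam, U

    @staticmethod
    def backward(ctx, grad_lam, grad_U):
        lam, U = ctx.saved_tensors
        if grad_lam is None:
            grad_lam = torch.zeros_like(lam)
        if grad_U is None:
            grad_U = torch.zeros_like(U)
        lam_i = lam.unsqueeze(-1)              # row index value lambda_i
        lam_j = lam.unsqueeze(-2)              # column index value lambda_j
        big = torch.maximum(lam_i, lam_j).clamp_min(1e-12)
        ratio = torch.minimum(lam_i, lam_j) / big
        # Truncated series approximation of 1/|lambda_i - lambda_j|.
        series = sum(ratio ** k for k in range(TaylorEigh.K + 1)) / big
        # Exact coefficient matrix is F_ij = 1/(lambda_j - lambda_i); keep its sign,
        # use the bounded series for the magnitude (zero on the diagonal).
        F = torch.sign(lam_j - lam_i) * series
        inner = F * (U.transpose(-1, -2) @ grad_U)
        grad_A = U @ (torch.diag_embed(grad_lam) + inner) @ U.transpose(-1, -2)
        return 0.5 * (grad_A + grad_A.transpose(-1, -2))   # symmetrize

# Usage: gradients stay finite even when eigenvalues nearly coincide.
A = torch.eye(4) + 1e-6 * torch.randn(4, 4)
A = (0.5 * (A + A.T)).requires_grad_(True)
lam, U = TaylorEigh.apply(A)
(lam.sum() + U.abs().sum()).backward()
print(torch.isfinite(A.grad).all())
```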
Related papers
- Contraction rates for conjugate gradient and Lanczos approximate posteriors in Gaussian process regression [0.0]
We analyze a class of recently proposed approximation algorithms from the field of probabilistic numerics.
We combine results from the numerical analysis literature with state-of-the-art concentration results for spectra of kernel matrices to obtain minimax contraction rates.
arXiv Detail & Related papers (2024-06-18T14:50:42Z)
- Quantum state tomography via non-convex Riemannian gradient descent [13.100184125419691]
In this work, we derive a quantum state tomography scheme that improves the dependence on the condition number $\kappa$.
The improvement comes from the application of non-convex Riemannian gradient descent (RGD) algorithms.
Our theoretical results of extremely fast convergence and nearly optimal error bounds are corroborated by numerical results.
arXiv Detail & Related papers (2022-10-10T14:19:23Z)
- Nonparametric Factor Trajectory Learning for Dynamic Tensor Decomposition [20.55025648415664]
We propose NONparametric FActor Trajectory learning for dynamic tensor decomposition (NONFAT).
We use a second-level GP to sample the entry values and to capture the temporal relationship between the entities.
We have shown the advantage of our method in several real-world applications.
arXiv Detail & Related papers (2022-07-06T05:33:00Z)
- Softmax-free Linear Transformers [90.83157268265654]
Vision transformers (ViTs) have pushed the state-of-the-art for visual perception tasks.
Existing linear-complexity approximations of softmax attention are either theoretically flawed or empirically ineffective for visual recognition.
We propose a family of Softmax-Free Transformers (SOFT).
arXiv Detail & Related papers (2022-07-05T03:08:27Z)
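SOFT's Gaussian-kernel construction is not spelled out in this summary, so the sketch below shows only generic softmax-free (linearized) attention with a simple positive feature map, to illustrate why removing the softmax makes the cost linear in sequence length; it is not the SOFT method itself.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Softmax-free (linearized) attention sketch: with a positive feature map
    phi, attention ~ phi(Q) (phi(K)^T V), which costs O(n d^2) rather than the
    O(n^2 d) of softmax attention. phi(x) = elu(x) + 1 is a generic choice,
    not SOFT's Gaussian-kernel construction."""
    phi_q = F.elu(q) + 1.0                        # (batch, n, d), positive features
    phi_k = F.elu(k) + 1.0
    kv = phi_k.transpose(-1, -2) @ v              # (batch, d, d), aggregated once
    z = phi_q @ phi_k.sum(dim=-2).unsqueeze(-1)   # (batch, n, 1), row normalizer
    return (phi_q @ kv) / (z + eps)

q = k = v = torch.randn(2, 128, 64)               # two 128-token sequences
print(linear_attention(q, k, v).shape)            # torch.Size([2, 128, 64])
```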
- Alternating Mahalanobis Distance Minimization for Stable and Accurate CP Decomposition [4.847980206213335]
We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from what is used in previous work.
We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank.
We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor with ground metric dependent on the other factors.
arXiv Detail & Related papers (2022-04-14T19:56:36Z)
- ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure [26.24282086797512]
Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model of the loss to train, compress, or explain deep networks.
We present ViViT, a curvature model that leverages the GGN's low-rank structure without further approximations.
arXiv Detail & Related papers (2021-06-04T17:37:47Z)
- Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling? [59.820507600960745]
We propose a new GCP meta-layer that uses SVD in the forward pass and Padé approximants in backward propagation to compute the gradients.
The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performance on both large-scale and fine-grained datasets.
arXiv Detail & Related papers (2021-05-06T08:03:45Z)
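For context, here is a bare-bones sketch of what a global covariance pooling (GCP) meta-layer computes: the channel covariance of a CNN feature map, normalized by its matrix square root obtained from an eigendecomposition in the forward pass. The Padé-approximant backward pass proposed in the paper is not reproduced; relying on plain autograd here is a simplification, and all shapes are illustrative.

```python
import torch

def global_covariance_pooling(feat, eps=1e-5):
    """Global covariance pooling sketch.
    feat: (batch, C, H, W) CNN features. Returns the matrix square root of the
    C x C channel covariance, computed via an eigendecomposition in the forward
    pass; the paper's Pade-approximant backward pass is NOT reproduced here
    (plain autograd through eigh is a simplification)."""
    b, c, h, w = feat.shape
    x = feat.reshape(b, c, h * w)
    x = x - x.mean(dim=-1, keepdim=True)
    cov = x @ x.transpose(-1, -2) / (h * w - 1)         # (b, C, C) covariance
    cov = cov + eps * torch.eye(c, device=feat.device)  # keep it well conditioned
    lam, U = torch.linalg.eigh(cov)
    return U @ torch.diag_embed(lam.clamp_min(0).sqrt()) @ U.transpose(-1, -2)

feat = torch.randn(2, 32, 8, 8, requires_grad=True)
pooled = global_covariance_pooling(feat)                # (2, 32, 32)
pooled.sum().backward()                                 # gradients via autograd
print(pooled.shape)
```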
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- Error Estimation for Sketched SVD via the Bootstrap [60.67199274260768]
This paper develops a fully data-driven bootstrap method that numerically estimates the actual error of sketched singular vectors/values.
The method is computationally inexpensive, because it operates only on sketched objects, and it requires no passes over the full matrix being factored.
arXiv Detail & Related papers (2020-03-10T19:14:08Z)
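A small NumPy sketch of the bootstrap idea: compute singular values from a row sketch of the matrix, then resample the rows of the sketch itself, never revisiting the full matrix, to estimate how far the sketched singular values may be from the truth. The Gaussian sketch, the number of replicates, and the quantile used are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20000, 50))             # tall matrix to be factored
m = 500                                           # sketch size

# Row sketch: S A with a Gaussian sketching matrix (one pass over A).
S = rng.standard_normal((m, A.shape[0])) / np.sqrt(m)
A_sk = S @ A
sv_sk = np.linalg.svd(A_sk, compute_uv=False)     # sketched singular values

# Bootstrap on the sketch only: resample its rows, recompute singular values,
# and use the spread of |sv* - sv_sk| as a data-driven error estimate.
B = 50
errs = np.empty(B)
for b in range(B):
    idx = rng.integers(0, m, size=m)
    sv_b = np.linalg.svd(A_sk[idx], compute_uv=False)
    errs[b] = np.max(np.abs(sv_b - sv_sk))
print("estimated max singular-value error:", np.quantile(errs, 0.9))
```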
- Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients.
We introduce a new sparsity operator: the random-top-k operator.
Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
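The summary above does not define the random-top-k operator precisely, so the sketch below is only a hypothetical reading of its flavor: keep part of the sparsity budget for the largest-magnitude gradient coordinates and fill the rest with uniformly random ones. The split ratio and the absence of any rescaling are assumptions, not the paper's definition.

```python
import numpy as np

def random_top_k(g, k, top_fraction=0.5, rng=None):
    """Illustrative sparsifier mixing top-k and random selection (hypothetical
    reading of a 'random-top-k' operator; the paper's exact definition may differ).
    Keeps k coordinates of g: the largest-magnitude ones plus random extras."""
    rng = np.random.default_rng() if rng is None else rng
    k_top = int(round(k * top_fraction))
    order = np.argsort(-np.abs(g))            # indices by decreasing magnitude
    keep_top = order[:k_top]
    keep_rand = rng.choice(order[k_top:], size=k - k_top, replace=False)
    out = np.zeros_like(g)
    keep = np.concatenate([keep_top, keep_rand])
    out[keep] = g[keep]
    return out

g = np.random.default_rng(1).standard_normal(1000)   # a dense gradient
sparse_g = random_top_k(g, k=100)
print(np.count_nonzero(sparse_g))                     # 100
```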
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the summaries above (including all information) and is not responsible for any consequences of their use.