Why Approximate Matrix Square Root Outperforms Accurate SVD in Global
Covariance Pooling?
- URL: http://arxiv.org/abs/2105.02498v1
- Date: Thu, 6 May 2021 08:03:45 GMT
- Title: Why Approximate Matrix Square Root Outperforms Accurate SVD in Global
Covariance Pooling?
- Authors: Yue Song, Nicu Sebe, Wei Wang
- Abstract summary: We propose a new GCP meta-layer that uses SVD in the forward pass, and Padé Approximants in the backward propagation to compute the gradients.
The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performances on both large-scale and fine-grained datasets.
- Score: 59.820507600960745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Global covariance pooling (GCP) aims at exploiting the second-order
statistics of the convolutional feature. Its effectiveness has been
demonstrated in boosting the classification performance of Convolutional Neural
Networks (CNNs). Singular Value Decomposition (SVD) is used in GCP to compute
the matrix square root. However, the approximate matrix square root calculated
using Newton-Schulz iteration (Li et al., 2018) outperforms the accurate one
computed via SVD (Li et al., 2017). We empirically analyze the reason behind
the performance gap from the perspectives of data precision and gradient
smoothness. Various remedies for computing smooth SVD gradients are
investigated. Based on our observation and analyses, a hybrid training protocol
is proposed for SVD-based GCP meta-layers such that competitive performances
can be achieved against Newton-Schulz iteration. Moreover, we propose a new GCP
meta-layer that uses SVD in the forward pass, and Padé Approximants in the
backward propagation to compute the gradients. The proposed meta-layer has been
integrated into different CNN models and achieves state-of-the-art performances
on both large-scale and fine-grained datasets.
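For context, here is a minimal PyTorch sketch of the two forward passes being compared: the exact square root via SVD and the approximate one via coupled Newton-Schulz iteration (the scheme of Li et al., 2018). The trace pre-normalization and iteration count are the standard choices for that scheme, not values taken from this paper; the comment in `sqrt_svd` points at the gradient term whose smoothness the paper analyzes.

```python
import torch

def sqrt_newton_schulz(A, num_iters=5):
    """Approximate square root of an SPD matrix via coupled
    Newton-Schulz iteration. Pre-scaling by the trace keeps the
    spectrum inside the convergence region; the scale is restored
    at the end. The backward pass is plain autograd through
    matrix products, which is why its gradients are smooth."""
    n = A.shape[-1]
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    tr = A.trace()
    Y, Z = A / tr, I.clone()
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y, Z = Y @ T, T @ Z
    return Y * tr.sqrt()

def sqrt_svd(A):
    """Exact square root via SVD. The backward pass of
    torch.linalg.svd contains factors of 1 / (s_i - s_j), which
    blow up when singular values are nearly equal -- the
    gradient-smoothness issue a Pade-approximant backward avoids."""
    U, S, Vh = torch.linalg.svd(A)
    return U @ torch.diag_embed(S.sqrt()) @ Vh

# GCP-style usage on a toy covariance of convolutional features:
feats = torch.randn(64, 49)                    # channels x spatial
cov = feats @ feats.T / 49 + 1e-5 * torch.eye(64)
print(torch.dist(sqrt_newton_schulz(cov), sqrt_svd(cov)))
```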
Related papers
- Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- From Spectral Graph Convolutions to Large Scale Graph Convolutional
Networks [0.0]
Graph Convolutional Networks (GCNs) have been shown to be a powerful concept that has been successfully applied to a large variety of tasks.
We study the theory that paved the way to the definition of GCN, including related parts of classical graph theory.
arXiv Detail & Related papers (2022-07-12T16:57:08Z)
- Alternating Mahalanobis Distance Minimization for Stable and Accurate CP
Decomposition [4.847980206213335]
We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from those used in prior work.
The resulting algorithm can be viewed as optimizing a Mahalanobis distance with respect to each factor, with a ground metric that depends on the other factors.
We show that a subsweep of this algorithm achieves a superlinear convergence rate for exact CPD with known rank.
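For context, a minimal NumPy sketch of plain CP-ALS, the baseline this paper reinterprets: each factor update solves an ordinary least-squares problem, which the paper replaces with a Mahalanobis-distance objective whose ground metric depends on the other factors. Shapes and the unfolding convention here are illustrative.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product, shape (J*K, r)."""
    r = B.shape[1]
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, r)

def cp_als(X, rank, n_iters=50, seed=0):
    """Plain alternating least squares for a 3-way CP decomposition."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)  # mode-2
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)  # mode-3
    for _ in range(n_iters):
        A = X1 @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = X2 @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = X3 @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```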
arXiv Detail & Related papers (2022-04-14T19:56:36Z)
- Implicit SVD for Graph Representation Learning [33.761179632722]
We make Graph Representational Learning more computationally tractable for those with modest hardware.
We derive a linear approximation of a SOTA model and train it in closed form via the SVD of a matrix $\mathbf{M}$, without explicitly calculating its entries.
Our models show competitive empirical test performance over various graphs.
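A sketch of the matrix-free idea, assuming SciPy: a truncated SVD of a large matrix $\mathbf{M}$ can be computed from matrix-vector products alone, so the entries of $\mathbf{M}$ are never materialized. The Gram matrix below is a generic stand-in; the paper's $\mathbf{M}$ is derived from the graph.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 64))   # tall factor; M = A A^T is 10k x 10k

# Matrix-free operator: only products with M are ever evaluated.
M = LinearOperator(
    shape=(10_000, 10_000),
    matvec=lambda x: A @ (A.T @ x),
    rmatvec=lambda x: A @ (A.T @ x),    # M is symmetric
    dtype=np.float64,
)
U, s, Vt = svds(M, k=8)                 # top-8 singular triplets of M
```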
arXiv Detail & Related papers (2021-11-11T16:58:17Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via
GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Scaling Neural Tangent Kernels via Sketching and Random Features [53.57615759435126]
Recent works report that NTK regression can outperform finitely-wide neural networks trained on small-scale datasets.
We design a near input-sparsity time approximation algorithm for NTK, by sketching the expansions of arc-cosine kernels.
We show that a linear regressor trained on our CNTK features matches the accuracy of the exact CNTK on the CIFAR-10 dataset while achieving a 150x speedup.
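For context, a small NumPy sketch of the random-feature half of this approach: the order-1 arc-cosine kernel (the ReLU building block of NTK expansions) admits unbiased random features, which the paper then compresses further with sketching. Feature counts here are illustrative.

```python
import numpy as np

def arccos1_features(X, n_features=2048, seed=0):
    """Random features phi such that phi(x) . phi(y) approximates the
    order-1 arc-cosine kernel k1(x, y) = 2 E_w[ReLU(w.x) ReLU(w.y)],
    w ~ N(0, I) -- the ReLU building block of NTK expansions."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_features))
    return np.sqrt(2.0 / n_features) * np.maximum(X @ W, 0.0)

# A linear model on these features approximates kernel regression:
X = np.random.default_rng(1).standard_normal((500, 32))
Phi = arccos1_features(X)
K_approx = Phi @ Phi.T          # approximates the kernel Gram matrix
```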
arXiv Detail & Related papers (2021-06-15T04:44:52Z)
- Exploiting Adam-like Optimization Algorithms to Improve the Performance
of Convolutional Neural Networks [82.61182037130405]
Stochastic gradient descent (SGD) is the main approach for training deep networks.
In this work, we compare Adam-based variants that exploit the difference between the present and past gradients.
We test ensembles of networks and their fusion with a ResNet50 trained with SGD.
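As a concrete instance of this family, here is a NumPy sketch of a diffGrad-style step (one of the variants such comparisons cover): a standard Adam update whose step size is damped by a "friction" term computed from the difference between the present and past gradient. Hyperparameters are the usual Adam defaults, not values from this paper.

```python
import numpy as np

def diffgrad_style_step(p, g, g_prev, m, v, t,
                        lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam-style update damped by the present/past gradient difference."""
    m = b1 * m + (1 - b1) * g                 # first moment
    v = b2 * v + (1 - b2) * g * g             # second moment
    m_hat = m / (1 - b1 ** t)                 # bias correction
    v_hat = v / (1 - b2 ** t)
    xi = 1.0 / (1.0 + np.exp(-np.abs(g_prev - g)))  # friction in (0.5, 1)
    p = p - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v
```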
arXiv Detail & Related papers (2021-03-26T18:55:08Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a larger reduction in computation load at the same accuracy.
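A minimal PyTorch sketch of the idea, with hypothetical names: keep each layer's weight in factored form W = U diag(s) Vᵀ, push U and V toward orthonormal columns with a regularizer, and apply an L1 penalty to s so small singular values can be pruned. The coefficients are illustrative.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """Linear layer stored as W = U diag(s) V^T; rank is controlled by
    sparsifying s, with U and V regularized toward orthonormal columns."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_out, rank) / rank ** 0.5)
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.randn(d_in, rank) / rank ** 0.5)

    def forward(self, x):
        return ((x @ self.V) * self.s) @ self.U.T

    def regularizer(self, lam_orth=1e-2, lam_sparse=1e-3):
        I = torch.eye(self.U.shape[1], device=self.U.device)
        orth = ((self.U.T @ self.U - I) ** 2).sum() \
             + ((self.V.T @ self.V - I) ** 2).sum()
        return lam_orth * orth + lam_sparse * self.s.abs().sum()
```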
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.