Asymptotic Singular Value Distribution of Linear Convolutional Layers
- URL: http://arxiv.org/abs/2006.07117v1
- Date: Fri, 12 Jun 2020 12:21:08 GMT
- Title: Asymptotic Singular Value Distribution of Linear Convolutional Layers
- Authors: Xinping Yi
- Abstract summary: In convolutional neural networks, the linear transformation of convolutional layers with linear convolution is a block matrix with doubly Toeplitz blocks.
We develop a simple singular value approximation method with improved accuracy over the circular approximation.
We also demonstrate that the spectral norm upper bounds are effective spectral regularizers for improving generalization performance in ResNets.
- Score: 19.471693124432022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In convolutional neural networks, the linear transformation of multi-channel
two-dimensional convolutional layers with linear convolution is a block matrix
with doubly Toeplitz blocks. Although a "wrapping around" operation can
transform linear convolution to a circular one, by which the singular values
can be approximated with reduced computational complexity by those of a block
matrix with doubly circulant blocks, the accuracy of such an approximation is
not guaranteed. In this paper, we propose to inspect such a linear
transformation matrix through its asymptotic spectral representation - the
spectral density matrix - by which we develop a simple singular value
approximation method with improved accuracy over the circular approximation, as
well as upper bounds for spectral norm with reduced computational complexity.
Compared with the circular approximation, we obtain moderate improvement with a
subtle adjustment of the singular value distribution. We also demonstrate that
the spectral norm upper bounds are effective spectral regularizers for
improving generalization performance in ResNets.
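As a concrete reference point, the circular ("wrapping around") approximation the abstract refers to can be sketched in a few lines of NumPy: zero-pad the kernel to the input size, take a 2D FFT, and read approximate singular values off per-frequency SVDs of the small c_out x c_in transfer matrices. This is a hedged sketch of the baseline method with illustrative shapes, not the paper's code.

```python
import numpy as np

# Circular approximation: replacing linear convolution by circular convolution
# turns the doubly Toeplitz blocks into doubly circulant ones, which are
# jointly diagonalized by the 2D DFT.  The layer's singular values then reduce
# to those of n*n small (c_out x c_in) per-frequency transfer matrices.
def circular_singular_values(kernel, n):
    """kernel: (c_out, c_in, k, k) filters; n: spatial size of the input."""
    c_out, c_in, k, _ = kernel.shape
    transfer = np.fft.fft2(kernel, s=(n, n), axes=(2, 3))   # (c_out, c_in, n, n)
    mats = transfer.transpose(2, 3, 0, 1).reshape(n * n, c_out, c_in)
    svals = np.linalg.svd(mats, compute_uv=False)           # batched SVD
    return np.sort(svals.ravel())[::-1]

rng = np.random.default_rng(0)
K = rng.standard_normal((4, 3, 3, 3))
sv = circular_singular_values(K, n=16)
print("approximate spectral norm:", sv[0])   # largest approximate singular value
```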
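The abstract's caveat that the accuracy of this approximation "is not guaranteed" can be checked directly in the single-channel case by materializing the exact doubly Toeplitz matrix of valid linear convolution and comparing spectra. Again a hand-rolled illustration with arbitrarily chosen sizes, not the paper's construction.

```python
import numpy as np

def conv_matrix(kernel, n):
    """Exact matrix of single-channel 2D 'valid' linear convolution on n x n inputs."""
    k = kernel.shape[0]
    m = n - k + 1
    T = np.zeros((m * m, n * n))
    for i in range(m):
        for j in range(m):
            patch = np.zeros((n, n))
            patch[i:i + k, j:j + k] = kernel   # one shifted copy of the filter per row
            T[i * m + j] = patch.ravel()
    return T

rng = np.random.default_rng(1)
k, n = 3, 8
kernel = rng.standard_normal((k, k))
exact = np.linalg.svd(conv_matrix(kernel, n), compute_uv=False)
# Single-channel circular singular values are just the 2D DFT magnitudes.
circ = np.sort(np.abs(np.fft.fft2(kernel, s=(n, n)).ravel()))[::-1]
print("exact spectral norm:   ", exact[0])
print("circular approximation:", circ[0])
```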
Related papers
- A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation [24.558241146742205]
We characterize the large-dimensional spectral behavior of the unfoldings of the data tensor and exhibit relevant signal-to-noise ratios governing the detectability of the principal directions of the signal.
These results allow one to accurately predict the reconstruction performance of truncated multilinear SVD (MLSVD) in the non-trivial regime.
arXiv Detail & Related papers (2024-02-05T16:38:30Z)
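For readers unfamiliar with the method named in the entry above, a minimal sketch of truncated MLSVD (an SVD of each mode unfolding, then a core contraction) follows; the ranks and shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def unfold(T, mode):
    """Mode-m unfolding: mode-m fibers become the columns of a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def truncated_mlsvd(T, ranks):
    # Leading left singular vectors of each mode unfolding.
    U = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
         for m, r in enumerate(ranks)]
    # Core tensor: contract each mode with the corresponding U^T.
    G = T
    for m, Um in enumerate(U):
        G = np.moveaxis(np.tensordot(Um.T, np.moveaxis(G, m, 0), axes=1), 0, m)
    return G, U

# Rank-(2, 2, 2) approximation of a random 3-way tensor.
rng = np.random.default_rng(0)
T = rng.standard_normal((10, 12, 14))
G, U = truncated_mlsvd(T, ranks=(2, 2, 2))
T_hat = G
for m, Um in enumerate(U):
    T_hat = np.moveaxis(np.tensordot(Um, np.moveaxis(T_hat, m, 0), axes=1), 0, m)
print("relative error:", np.linalg.norm(T - T_hat) / np.linalg.norm(T))
```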
- Exponentiation of Parametric Hamiltonians via Unitary interpolation [0.8399688944263842]
We introduce two ideas for the time-efficient approximation of matrix exponentials of linear multi-parametric Hamiltonians.
We modify the Suzuki-Trotter product formula from an approximation to a compilation scheme to improve both accuracy and computational time.
arXiv Detail & Related papers (2024-02-02T15:29:55Z)
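As background for the entry above, here is a minimal NumPy sketch of the second-order Suzuki-Trotter (Strang) product formula that the paper builds on; the paper's unitary-interpolation compilation itself is not reproduced, and the Hermitian terms are randomly generated for illustration.

```python
import numpy as np

# Strang splitting: exp(-i dt (A + B)) is approximated by the symmetric
# product exp(-i dt/2 A) exp(-i dt B) exp(-i dt/2 A), with O(dt^3) local error.
def expm_herm(H, t):
    """exp(-i t H) for Hermitian H via an eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * t * w)) @ V.conj().T

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4)); A = (X + X.T) / 2   # Hermitian terms
Y = rng.standard_normal((4, 4)); B = (Y + Y.T) / 2
dt = 0.01
exact = expm_herm(A + B, dt)
strang = expm_herm(A, dt / 2) @ expm_herm(B, dt) @ expm_herm(A, dt / 2)
print("Strang splitting error:", np.linalg.norm(exact - strang))
```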
- Linear Convergence of ISTA and FISTA [8.261388753972234]
We revisit the class of iterative shrinkage-thresholding algorithms (ISTA) for solving the linear inverse problem with sparse representation.
We find that the previous assumption that the smooth part is convex weakens the least-squares model.
We generalize the linear convergence to composite optimization in both the objective value and the squared proximal subgradient norm.
arXiv Detail & Related papers (2022-12-13T02:02:50Z)
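A minimal sketch of the ISTA iteration the entry above revisits: a gradient step on the smooth least-squares part followed by soft-thresholding. Problem sizes and the regularization weight are illustrative assumptions.

```python
import numpy as np

# ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1.
def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)           # gradient of the least-squares part
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0   # sparse ground truth
b = A @ x_true
print("recovery error:", np.linalg.norm(ista(A, b, lam=0.1) - x_true))
```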
- Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models [31.58736590532443]
We consider the problem of estimating two statistically independent signals in a mixed generalized linear model.
Our characterization exploits a mix of tools from random matrices, free probability and the theory of approximate message passing algorithms.
arXiv Detail & Related papers (2022-11-21T11:35:25Z)
- Semi-Supervised Subspace Clustering via Tensor Low-Rank Representation [64.49871502193477]
We propose a novel semi-supervised subspace clustering method, which is able to simultaneously augment the initial supervisory information and construct a discriminative affinity matrix.
Comprehensive experimental results on six commonly-used benchmark datasets demonstrate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2022-05-21T01:47:17Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Joint Network Topology Inference via Structured Fusion Regularization [70.30364652829164]
Joint network topology inference represents a canonical problem of learning multiple graph Laplacian matrices from heterogeneous graph signals.
We propose a general graph estimator based on a novel structured fusion regularization.
We show that the proposed graph estimator enjoys both high computational efficiency and rigorous theoretical guarantees.
arXiv Detail & Related papers (2021-03-05T04:42:32Z)
- Exact Linear Convergence Rate Analysis for Low-Rank Symmetric Matrix Completion via Gradient Descent [22.851500417035947]
Factorization-based gradient descent is a scalable and efficient algorithm for solving the low-rank matrix completion problem.
We show that gradient descent enjoys fast linear convergence to a global solution of the nonconvex problem.
arXiv Detail & Related papers (2021-02-04T03:41:54Z)
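A minimal sketch of the factorization-based gradient descent analyzed in the entry above, on a synthetic symmetric completion problem; the step size, sampling rate, and shapes are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

# Parameterize M ~ X X^T and descend on the squared error over observed entries.
rng = np.random.default_rng(3)
n, r = 50, 2
X_star = rng.standard_normal((n, r))
M = X_star @ X_star.T                       # ground-truth rank-r symmetric matrix
mask = rng.random((n, n)) < 0.5
mask = mask | mask.T                        # symmetric observation pattern

X = 0.1 * rng.standard_normal((n, r))       # small random initialization
eta = 1e-3
for _ in range(3000):
    R = mask * (X @ X.T - M)                # residual on observed entries
    X -= eta * 2 * (R + R.T) @ X            # gradient of ||P_Omega(X X^T - M)||_F^2
print("relative error:", np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))
```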
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization (SN).
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
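To make the ONI entry above concrete, here is a generic Newton-Schulz sketch of orthogonalization by Newton's iteration, mapping V to (V V^T)^{-1/2} V using only matrix products (no SVD or eigendecomposition); the pre-scaling and iteration count are illustrative and need not match the paper's exact scheme.

```python
import numpy as np

# B_{k+1} = (3 B_k - B_k^3 S) / 2 converges to S^{-1/2} once S = V V^T is
# pre-scaled so its eigenvalues lie in (0, 1].
def newton_orthogonalize(V, n_iter=12):
    S = V @ V.T
    c = np.linalg.norm(S)                     # Frobenius-norm pre-scaling
    S = S / c
    B = np.eye(S.shape[0])
    for _ in range(n_iter):
        B = 1.5 * B - 0.5 * (B @ B @ B @ S)   # Newton-Schulz step toward S^{-1/2}
    return (B @ V) / np.sqrt(c)               # rows of W are approximately orthonormal

rng = np.random.default_rng(4)
W = newton_orthogonalize(rng.standard_normal((16, 64)))
print("||W W^T - I||:", np.linalg.norm(W @ W.T - np.eye(16)))
```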