Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
- URL: http://arxiv.org/abs/2409.11859v1
- Date: Wed, 18 Sep 2024 10:28:28 GMT
- Title: Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
- Authors: Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba
- Abstract summary: Controlling the spectral norm of the Jacobian matrix has been shown to improve generalization, training stability and robustness in CNNs.
Existing methods for computing this norm either tend to overestimate it or deteriorate quickly as the input and kernel sizes increase.
In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel, up to a constant factor, serves as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Controlling the spectral norm of the Jacobian matrix of the convolution operation has been shown to improve generalization, training stability and robustness in CNNs. Existing methods for computing this norm either tend to overestimate it or deteriorate quickly as the input and kernel sizes increase. In this paper, we demonstrate that the tensor version of the spectral norm of a four-dimensional convolution kernel serves, up to a constant factor, as an upper bound for the spectral norm of the Jacobian matrix associated with the convolution operation. This new upper bound is independent of the input image resolution, is differentiable, and can be calculated efficiently during training. Through experiments, we demonstrate how the new bound can be used to improve the performance of convolutional architectures.
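To make the quantity in the abstract concrete, here is a minimal NumPy sketch (not the authors' implementation) that estimates the tensor spectral norm of a four-dimensional convolution kernel with a generic alternating (higher-order) power iteration; the function name, iteration count, and example kernel shape are illustrative assumptions.

```python
import numpy as np

def tensor_spectral_norm_estimate(K, n_iter=100, seed=0):
    """Estimate the spectral norm of a 4-D kernel tensor K[c_out, c_in, kh, kw],
    i.e. the maximum of sum_{ijkl} K_ijkl * u_i v_j a_k b_l over unit vectors
    u, v, a, b, via alternating (higher-order) power iteration."""
    rng = np.random.default_rng(seed)
    u, v, a, b = [rng.standard_normal(s) for s in K.shape]
    u, v, a, b = [x / np.linalg.norm(x) for x in (u, v, a, b)]
    for _ in range(n_iter):
        # Update each factor in turn by contracting K with the other three.
        u = np.einsum('ijkl,j,k,l->i', K, v, a, b); u /= np.linalg.norm(u)
        v = np.einsum('ijkl,i,k,l->j', K, u, a, b); v /= np.linalg.norm(v)
        a = np.einsum('ijkl,i,j,l->k', K, u, v, b); a /= np.linalg.norm(a)
        b = np.einsum('ijkl,i,j,k->l', K, u, v, a); b /= np.linalg.norm(b)
    return float(abs(np.einsum('ijkl,i,j,k,l->', K, u, v, a, b)))

# Hypothetical example: a 3x3 kernel mapping 32 input to 64 output channels.
# Per the abstract, a constant multiple of this quantity (the constant is not
# given in the abstract) upper-bounds the Jacobian's spectral norm for any
# input resolution.
K = np.random.default_rng(1).standard_normal((64, 32, 3, 3))
print(tensor_spectral_norm_estimate(K))
```

Note that an alternating power iteration converges to a stationary value of the multilinear form and therefore gives a lower estimate of the exact tensor spectral norm; a certified upper bound on the Jacobian norm, as described in the abstract, requires the exact tensor norm (or an upper bound on it) together with the constant factor established in the paper.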
Related papers
- Nonparametric Instrumental Regression via Kernel Methods is Minimax Optimal [28.361133177290657]
We study the kernel instrumental variable algorithm of Singh et al. (2019).
We show that the kernel NPIV estimator converges to the IV solution with minimum norm.
We also improve the original kernel NPIV algorithm by adopting a general spectral regularization in stage 1 regression.
arXiv Detail & Related papers (2024-11-29T12:18:01Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function, intrinsic in the decay of its spectrum, the approach obtains precise certificates while leveraging the advanced and powerful computational techniques developed to optimize neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z) - Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram
Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches; a generic sketch of the underlying Gram iteration bound is given after this list.
arXiv Detail & Related papers (2023-05-25T15:32:21Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Depthwise Separable Convolutions Allow for Fast and Memory-Efficient
Spectral Normalization [1.1470070927586016]
We introduce a very simple method for spectral normalization of depthwise separable convolutions.
We demonstrate the effectiveness of our method on image classification tasks using standard architectures like MobileNetV2.
arXiv Detail & Related papers (2021-02-12T12:55:42Z) - Efficient Semi-Implicit Variational Inference [65.07058307271329]
We propose an efficient and scalable semi-implicit variational inference (SIVI) framework.
The method optimizes a rigorous lower bound on the evidence with efficiently computable gradients.
arXiv Detail & Related papers (2021-01-15T11:39:09Z) - Asymptotic Singular Value Distribution of Linear Convolutional Layers [19.471693124432022]
In convolutional neural networks, the linear transformation performed by a convolutional layer with linear (non-circular) convolution is a block matrix with doubly Toeplitz blocks.
We develop a simple singular value approximation method with improved accuracy over the circular approximation.
We also demonstrate that the spectral norm upper bounds are effective spectral regularizers for improving generalization performance in ResNets.
arXiv Detail & Related papers (2020-06-12T12:21:08Z) - Controllable Orthogonalization in Training DNNs [96.1365404059924]
Orthogonality is widely used for training deep neural networks (DNNs) due to its ability to maintain all singular values of the Jacobian close to 1.
This paper proposes a computationally efficient and numerically stable orthogonalization method using Newton's iteration (ONI).
We show that our method improves the performance of image classification networks by effectively controlling the orthogonality to provide an optimal tradeoff between optimization benefits and representational capacity reduction.
We also show that ONI stabilizes the training of generative adversarial networks (GANs) by maintaining the Lipschitz continuity of a network, similar to spectral normalization; a generic Newton-Schulz orthogonalization sketch is given after this list.
arXiv Detail & Related papers (2020-04-02T10:14:27Z)
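As referenced in the Gram-iteration entry above, the following is a minimal NumPy sketch of the generic Gram iteration bound for a dense matrix, using only the inequality ||A||_2 <= ||A||_F applied to repeated Gram matrices; the cited paper's extension to convolutional layers via circulant matrix theory is not reproduced here, and the function name and iteration count are assumptions.

```python
import numpy as np

def gram_iteration_bound(W, n_iter=6):
    """Upper bound on sigma_max(W) via Gram iteration: with G_1 = W W^T and
    G_{k+1} = G_k G_k^T, the inequality ||A||_2 <= ||A||_F gives
    sigma_max(W) <= ||G_k||_F ** (1 / 2**k).  Rescaling each iterate keeps the
    entries in floating-point range; the bound is accumulated in log space."""
    G = W @ W.T
    log_bound = 0.0
    for k in range(1, n_iter + 1):
        scale = np.linalg.norm(G)          # Frobenius norm of the current iterate
        log_bound += np.log(scale) / 2.0**k
        G = G / scale
        if k < n_iter:
            G = G @ G.T                    # next Gram iterate (G is symmetric)
    return float(np.exp(log_bound))

# Hypothetical sanity check against the exact spectral norm of a random matrix.
W = np.random.default_rng(0).standard_normal((128, 64))
print(gram_iteration_bound(W), np.linalg.norm(W, 2))
```

Every iterate yields a valid upper bound because the Frobenius norm dominates the spectral norm, and the bound exceeds the true spectral norm by at most a factor of r**(1/2**(k+1)), where r is the rank of W and k the number of iterations.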
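The ONI entry above refers to orthogonalization by Newton's iteration; the sketch below shows a generic Newton-Schulz polar iteration, one standard Newton-type scheme for orthogonalizing a weight matrix. It is not claimed to be the exact ONI algorithm; the function name, Frobenius scaling, and iteration count are assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(W, n_iter=15):
    """Approximately orthogonalize a full-rank matrix W with the Newton-Schulz
    iteration X <- X (3 I - X^T X) / 2.  Starting from W scaled so that all
    singular values lie in (0, 1], the singular values converge to 1, i.e. X
    approaches the orthogonal polar factor of W (orthonormal columns for a
    tall W).  More iterations give tighter orthogonality."""
    X = W / np.linalg.norm(W)              # Frobenius scaling: sigma_max(X) <= 1
    I = np.eye(W.shape[1])
    for _ in range(n_iter):
        X = 0.5 * X @ (3.0 * I - X.T @ X)
    return X

# Hypothetical check: X.T @ X should be close to the identity for a tall,
# well-conditioned W.
W = np.random.default_rng(0).standard_normal((64, 32))
X = newton_schulz_orthogonalize(W)
print(np.linalg.norm(X.T @ X - np.eye(32)))
```

Stopping after fewer iterations leaves the matrix only approximately orthogonal, which is the kind of controllable trade-off between orthogonality and representational capacity that the entry describes.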