Existence, Stability and Scalability of Orthogonal Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2108.05623v3
- Date: Fri, 13 Jan 2023 12:34:57 GMT
- Title: Existence, Stability and Scalability of Orthogonal Convolutional Neural
Networks
- Authors: El Mehdi Achour (IMT), François Malgouyres (IMT), Franck Mamalet
- Abstract summary: Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting the exploding/vanishing of gradients, decorrelating the features, and improving robustness.
This paper studies the theoretical properties of orthogonal convolutional layers.
- Score: 1.0742675209112622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imposing orthogonality on the layers of neural networks is known to
facilitate learning by limiting the exploding/vanishing of gradients,
decorrelating the features, and improving robustness. This paper studies the
theoretical properties of orthogonal convolutional layers. We establish
necessary and sufficient conditions on the layer architecture guaranteeing the
existence of an orthogonal convolutional transform. The conditions prove that
orthogonal convolutional transforms exist for almost all architectures used in
practice with 'circular' padding. We also exhibit limitations with 'valid'
boundary conditions and with 'same' boundary conditions using zero-padding.
Recently, a regularization term imposing the orthogonality of convolutional
layers has been proposed, and impressive empirical results have been obtained
in different applications (Wang et al. 2020). The second motivation of the
present paper is to specify the theory behind this. We make the link between
this regularization term and orthogonality measures. In doing so, we show that
this regularization strategy is stable with respect to numerical and
optimization errors and that, in the presence of small errors and when the
size of the signal/image is large, the convolutional layers remain close to
isometric. The theoretical results are confirmed by experiments, and the
landscape of the regularization term is studied. Experiments on real data sets
show that when orthogonality is used to enforce robustness, the parameter
multiplying the regularization term can be used to tune a tradeoff between
accuracy and orthogonality, for the benefit of both accuracy and robustness.
Altogether, the study guarantees that the regularization proposed in Wang et
al. (2020) is an efficient, flexible and stable numerical strategy to learn
orthogonal convolutional layers.
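The regularization discussed above (Wang et al. 2020) penalizes the deviation of a kernel's self-correlation from an identity-like target. The following is a minimal PyTorch sketch of such a penalty for a stride-1 layer; the function name is ours, and the exact padding/stride conventions of the published implementations may differ from this simplified version.

```python
import torch
import torch.nn.functional as F

def conv_orth_penalty(kernel: torch.Tensor) -> torch.Tensor:
    """Soft orthogonality penalty for a convolution kernel (illustrative sketch).

    kernel has shape (c_out, c_in, k, k). The penalty is
    || conv2d(K, K, padding=k-1) - I_0 ||_F^2, where I_0 is the identity over
    output channels at the central spatial offset and zero elsewhere; it
    vanishes when the corresponding stride-1 convolution operator is
    (row-)orthogonal.
    """
    c_out, _, k, _ = kernel.shape
    corr = F.conv2d(kernel, kernel, padding=k - 1)   # (c_out, c_out, 2k-1, 2k-1)
    target = torch.zeros_like(corr)
    centre = corr.shape[-1] // 2
    target[:, :, centre, centre] = torch.eye(c_out, device=kernel.device,
                                             dtype=kernel.dtype)
    return ((corr - target) ** 2).sum()

# Usage: loss = task_loss + lam * conv_orth_penalty(conv.weight), where lam is
# the parameter that tunes the accuracy/orthogonality tradeoff discussed above.
```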
Related papers
- Thinner Latent Spaces: Detecting dimension and imposing invariance through autoencoder gradient constraints [9.380902608139902]
We show that orthogonality relations within the latent layer of the network can be leveraged to infer the intrinsic dimensionality of nonlinear manifold data sets.
We outline the relevant theory relying on differential geometry, and describe the corresponding gradient-descent optimization algorithm.
arXiv Detail & Related papers (2024-08-28T20:56:35Z) - Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram
Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches (a matrix-level sketch of the Gram iteration is given after this list).
arXiv Detail & Related papers (2023-05-25T15:32:21Z) - Demystifying the Global Convergence Puzzle of Learning
Over-parameterized ReLU Nets in Very High Dimensions [1.3401746329218014]
This paper is devoted to rigorous theory for demystifying the global convergence phenomenon in a challenging scenario: learning over-dimensionalized data.
A major ingredient of our theory is ...
arXiv Detail & Related papers (2022-06-05T02:14:21Z) - Learning Discriminative Shrinkage Deep Networks for Image Deconvolution [122.79108159874426]
We propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms.
Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-11-27T12:12:57Z) - Orthogonalizing Convolutional Layers with the Cayley Transform [83.73855414030646]
We propose and evaluate an alternative approach to parameterize convolutional layers that are constrained to be orthogonal.
We show that our method indeed preserves orthogonality to a high degree even for large convolutions (a sketch of the underlying Cayley map is given after this list).
arXiv Detail & Related papers (2021-04-14T23:54:55Z) - Posterior-Aided Regularization for Likelihood-Free Inference [23.708122045184698]
Posterior-Aided Regularization (PAR) is applicable to learning the density estimator, regardless of the model structure.
We provide a unified estimation method of PAR to estimate both reverse KL term and mutual information term with a single neural network.
arXiv Detail & Related papers (2021-02-15T16:59:30Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in the depth, in a number of iterations logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Understanding Implicit Regularization in Over-Parameterized Single Index
Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
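For the Gram-iteration paper listed above, the sketch below shows only the underlying bound on a plain matrix; the paper's actual method additionally exploits the circulant/Fourier structure of convolutional layers, which is omitted here, and the function name is hypothetical.

```python
import torch

def gram_spectral_bound(w: torch.Tensor, n_iter: int = 6) -> torch.Tensor:
    """Differentiable upper bound on sigma_1(w) via Gram iteration (sketch).

    With G_1 = W^T W and G_{k+1} = G_k^T G_k, sigma_1(G_k) = sigma_1(W)**(2**k),
    so ||G_k||_F ** (1 / 2**k) upper-bounds sigma_1(W) and converges to it.
    A running Frobenius rescaling keeps the iterates from overflowing.
    """
    g = w.t() @ w                                    # G_1
    log_scale = torch.zeros((), dtype=w.dtype, device=w.device)
    for _ in range(n_iter - 1):
        scale = g.norm()                             # Frobenius rescaling
        log_scale = 2.0 * (log_scale + scale.log())  # track the removed factor
        g = (g / scale).t() @ (g / scale)            # next Gram matrix
    # Undo the rescaling and take the 2**n_iter-th root of the Frobenius norm.
    return torch.exp((log_scale + g.norm().log()) / 2 ** n_iter)
```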
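For the Cayley-transform paper, the sketch below only illustrates the basic Cayley map that turns an unconstrained square matrix into an orthogonal one; the paper applies this construction to convolutions in the Fourier domain, which is not reproduced here.

```python
import torch

def cayley_orthogonal(weight: torch.Tensor) -> torch.Tensor:
    """Map an unconstrained square matrix to an orthogonal one (sketch).

    A = (W - W^T) / 2 is skew-symmetric, and Q = (I - A)(I + A)^{-1} is
    orthogonal (I + A is always invertible since A has imaginary eigenvalues).
    """
    n = weight.shape[0]
    a = 0.5 * (weight - weight.t())          # skew-symmetric part
    eye = torch.eye(n, dtype=weight.dtype, device=weight.device)
    # solve() returns (I + A)^{-1}(I - A); the factors commute, so this equals Q.
    return torch.linalg.solve(eye + a, eye - a)

# Usage: q = cayley_orthogonal(raw_param) parameterizes an orthogonal weight;
# gradients flow through the solve, so raw_param can be trained directly.
```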
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.