Orthogonalizing Convolutional Layers with the Cayley Transform
- URL: http://arxiv.org/abs/2104.07167v1
- Date: Wed, 14 Apr 2021 23:54:55 GMT
- Title: Orthogonalizing Convolutional Layers with the Cayley Transform
- Authors: Asher Trockman, J. Zico Kolter
- Abstract summary: We propose and evaluate an alternative approach to parameterize convolutional layers that are constrained to be orthogonal.
We show that our method indeed preserves orthogonality to a high degree even for large convolutions.
- Score: 83.73855414030646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has highlighted several advantages of enforcing orthogonality in
the weight layers of deep networks, such as maintaining the stability of
activations, preserving gradient norms, and enhancing adversarial robustness by
enforcing low Lipschitz constants. Although numerous methods exist for
enforcing the orthogonality of fully-connected layers, those for convolutional
layers are more heuristic in nature, often focusing on penalty methods or
limited classes of convolutions. In this work, we propose and evaluate an
alternative approach to directly parameterize convolutional layers that are
constrained to be orthogonal. Specifically, we propose to apply the Cayley
transform to a skew-symmetric convolution in the Fourier domain, so that the
inverse convolution needed by the Cayley transform can be computed efficiently.
We compare our method to previous Lipschitz-constrained and orthogonal
convolutional layers and show that it indeed preserves orthogonality to a high
degree even for large convolutions. Applied to the problem of certified
adversarial robustness, we show that networks incorporating the layer
outperform existing deterministic methods for certified defense against
$\ell_2$-norm-bounded adversaries, while scaling to larger architectures than
previously investigated. Code is available at
https://github.com/locuslab/orthogonal-convolutions.
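For a concrete picture of the construction, here is a minimal PyTorch sketch (not the authors' released implementation, which is at the repository above). It assumes square inputs with circular padding and equal input/output channel counts, forms the skew-Hermitian part of the kernel's frequency response, and applies the Cayley transform at each frequency:

```python
import torch

def cayley_orthogonal_conv(x, weight):
    """Sketch: orthogonal convolution via the Cayley transform in the Fourier
    domain. x: (batch, c, n, n), circular padding assumed; weight: (c, c, k, k)
    with k <= n. Not the authors' released implementation."""
    b, c, n, _ = x.shape
    k = weight.shape[-1]

    # Embed the kernel in an n x n array and move kernel and input to the
    # Fourier domain with 2D FFTs.
    w_pad = torch.zeros(c, c, n, n, dtype=x.dtype, device=x.device)
    w_pad[:, :, :k, :k] = weight
    W = torch.fft.fft2(w_pad)                   # (c, c, n, n), complex
    X = torch.fft.fft2(x)                       # (b, c, n, n), complex

    # At each frequency the convolution acts as a c x c matrix; take its
    # skew-Hermitian part so the Cayley transform below is unitary.
    W = W.permute(2, 3, 0, 1)                   # (n, n, c, c)
    A = W - W.conj().transpose(-1, -2)          # skew-Hermitian
    I = torch.eye(c, dtype=A.dtype, device=A.device)
    Q = torch.linalg.solve(I + A, I - A)        # Cayley transform, unitary

    # Apply the per-frequency unitary matrices to the input spectrum and
    # transform back; the imaginary part vanishes up to round-off.
    Y = Q @ X.permute(2, 3, 1, 0)               # (n, n, c, b)
    return torch.fft.ifft2(Y.permute(3, 2, 0, 1)).real
```

Because the Cayley transform of a skew-Hermitian matrix is unitary at every frequency, the resulting convolution is orthogonal up to FFT round-off; comparing the norms of the input and output on random tensors is a quick way to see how well orthogonality is preserved, which is the property the paper quantifies for large convolutions.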
Related papers
- GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function, intrinsic in the decay of its spectrum, we obtain precise certificates while leveraging the advanced and powerful computational techniques developed to optimize neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z) - Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram
Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
arXiv Detail & Related papers (2023-05-25T15:32:21Z) - A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z) - Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization [26.218670461973705]
We propose cyclic block coordinate methods with non-asymptotic gradient norm guarantees for composite nonconvex optimization.
Our results demonstrate the efficacy of the proposed cyclic variance-reduced scheme in training deep neural nets.
arXiv Detail & Related papers (2022-12-09T19:17:39Z) - Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z) - Existence, Stability and Scalability of Orthogonal Convolutional Neural
Networks [1.0742675209112622]
Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting exploding/vanishing gradients, decorrelating the features, and improving robustness.
This paper studies the theoretical properties of orthogonal convolutional layers.
arXiv Detail & Related papers (2021-08-12T09:30:53Z) - Scaling-up Diverse Orthogonal Convolutional Networks with a Paraunitary
Framework [16.577482515547793]
We propose a theoretical framework for orthogonal convolutional layers.
Our framework endows various convolutional layers with high expressive power while maintaining their exact orthogonality.
Our layers are memory and computationally efficient for deep networks compared to previous designs.
arXiv Detail & Related papers (2021-06-16T20:50:59Z) - Skew Orthogonal Convolutions [44.053067014796596]
Training convolutional neural networks with a Lipschitz constraint under the $l_2$ norm is useful for provable adversarial robustness, interpretable gradients, stable training, etc.
The proposed skew orthogonal convolution (SOC) allows us to train provably Lipschitz, large convolutional neural networks significantly faster than prior works; a generic sketch of the underlying skew-symmetry fact appears after this list.
arXiv Detail & Related papers (2021-05-24T17:11:44Z) - DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z)
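As a generic aside on the Skew Orthogonal Convolutions entry above (a standard linear-algebra fact rather than that paper's layer): the matrix exponential of a skew-symmetric matrix is orthogonal, which is the property such parameterizations exploit.

```python
import torch

# Generic illustration, not the SOC layer itself: exp(A) is orthogonal
# whenever A = -A^T (skew-symmetric).
M = torch.randn(8, 8)
A = M - M.T                                   # skew-symmetric
Q = torch.linalg.matrix_exp(A)                # orthogonal
print(torch.allclose(Q @ Q.T, torch.eye(8), atol=1e-5))  # True
```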
This list is automatically generated from the titles and abstracts of the papers in this site.