Orthogonalizing Convolutional Layers with the Cayley Transform
- URL: http://arxiv.org/abs/2104.07167v1
- Date: Wed, 14 Apr 2021 23:54:55 GMT
- Title: Orthogonalizing Convolutional Layers with the Cayley Transform
- Authors: Asher Trockman, J. Zico Kolter
- Abstract summary: We propose and evaluate an alternative approach to parameterize convolutional layers that are constrained to be orthogonal.
We show that our method indeed preserves orthogonality to a high degree even for large convolutions.
- Score: 83.73855414030646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has highlighted several advantages of enforcing orthogonality in
the weight layers of deep networks, such as maintaining the stability of
activations, preserving gradient norms, and enhancing adversarial robustness by
enforcing low Lipschitz constants. Although numerous methods exist for
enforcing the orthogonality of fully-connected layers, those for convolutional
layers are more heuristic in nature, often focusing on penalty methods or
limited classes of convolutions. In this work, we propose and evaluate an
alternative approach to directly parameterize convolutional layers that are
constrained to be orthogonal. Specifically, we propose to apply the Cayley
transform to a skew-symmetric convolution in the Fourier domain, so that the
inverse convolution needed by the Cayley transform can be computed efficiently.
We compare our method to previous Lipschitz-constrained and orthogonal
convolutional layers and show that it indeed preserves orthogonality to a high
degree even for large convolutions. Applied to the problem of certified
adversarial robustness, we show that networks incorporating the layer
outperform existing deterministic methods for certified defense against
$\ell_2$-norm-bounded adversaries, while scaling to larger architectures than
previously investigated. Code is available at
https://github.com/locuslab/orthogonal-convolutions.
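For a concrete picture of the construction, here is a minimal PyTorch sketch (not the authors' released implementation, which is at the repository above). It assumes square inputs with circular padding and equal input/output channel counts, forms the skew-Hermitian part of the kernel's frequency response, and applies the Cayley transform at each frequency:

```python
import torch

def cayley_orthogonal_conv(x, weight):
    """Sketch: orthogonal convolution via the Cayley transform in the Fourier
    domain. x: (batch, c, n, n), circular padding assumed; weight: (c, c, k, k)
    with k <= n. Not the authors' released implementation."""
    b, c, n, _ = x.shape
    k = weight.shape[-1]

    # Embed the kernel in an n x n array and move kernel and input to the
    # Fourier domain with 2D FFTs.
    w_pad = torch.zeros(c, c, n, n, dtype=x.dtype, device=x.device)
    w_pad[:, :, :k, :k] = weight
    W = torch.fft.fft2(w_pad)                   # (c, c, n, n), complex
    X = torch.fft.fft2(x)                       # (b, c, n, n), complex

    # At each frequency the convolution acts as a c x c matrix; take its
    # skew-Hermitian part so the Cayley transform below is unitary.
    W = W.permute(2, 3, 0, 1)                   # (n, n, c, c)
    A = W - W.conj().transpose(-1, -2)          # skew-Hermitian
    I = torch.eye(c, dtype=A.dtype, device=A.device)
    Q = torch.linalg.solve(I + A, I - A)        # Cayley transform, unitary

    # Apply the per-frequency unitary matrices to the input spectrum and
    # transform back; the imaginary part vanishes up to round-off.
    Y = Q @ X.permute(2, 3, 1, 0)               # (n, n, c, b)
    return torch.fft.ifft2(Y.permute(3, 2, 0, 1)).real
```

Because the Cayley transform of a skew-Hermitian matrix is unitary at every frequency, the resulting convolution is orthogonal up to FFT round-off; comparing the norms of the input and output on random tensors is a quick way to see how well orthogonality is preserved, which is the property the paper quantifies for large convolutions.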
Related papers
- GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function, intrinsic in the decay of its spectrum, we obtain precise certificates while leveraging the advanced and powerful computational techniques developed to optimize neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z) - Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram
Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
arXiv Detail & Related papers (2023-05-25T15:32:21Z) - A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z) - Cyclic Block Coordinate Descent With Variance Reduction for Composite
Nonconvex Optimization [26.218670461973705]
We propose cyclic block coordinate methods with non-asymptotic gradient norm guarantees for composite nonconvex optimization.
Our results demonstrate the efficacy of the proposed cyclic variance-reduced scheme in training deep neural nets.
arXiv Detail & Related papers (2022-12-09T19:17:39Z) - Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z) - Existence, Stability and Scalability of Orthogonal Convolutional Neural
Networks [1.0742675209112622]
Imposing orthogonality on the layers of neural networks is known to facilitate learning by limiting exploding/vanishing gradients, decorrelating the features, and improving robustness.
This paper studies the theoretical properties of orthogonal convolutional layers.
arXiv Detail & Related papers (2021-08-12T09:30:53Z) - Scaling-up Diverse Orthogonal Convolutional Networks with a Paraunitary
Framework [16.577482515547793]
We propose a theoretical framework for orthogonal convolutional layers.
Our framework endows various convolutional layers with high expressive power while maintaining their exact orthogonality.
Our layers are memory and computationally efficient for deep networks compared to previous designs.
arXiv Detail & Related papers (2021-06-16T20:50:59Z) - Skew Orthogonal Convolutions [44.053067014796596]
Training convolutional neural networks with a Lipschitz constraint under the $l_2$ norm is useful for provable adversarial robustness, interpretable gradients, stable training, etc.
The proposed skew orthogonal convolution (SOC) allows us to train provably Lipschitz, large convolutional neural networks significantly faster than prior works; a generic sketch of the underlying skew-symmetry fact appears after this list.
arXiv Detail & Related papers (2021-05-24T17:11:44Z) - DO-Conv: Depthwise Over-parameterized Convolutional Layer [66.46704754669169]
We propose to augment a convolutional layer with an additional depthwise convolution, where each input channel is convolved with a different 2D kernel.
We show with extensive experiments that the mere replacement of conventional convolutional layers with DO-Conv layers boosts the performance of CNNs.
arXiv Detail & Related papers (2020-06-22T06:57:10Z)
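As a generic aside on the Skew Orthogonal Convolutions entry above (a standard linear-algebra fact rather than that paper's layer): the matrix exponential of a skew-symmetric matrix is orthogonal, which is the property such parameterizations exploit.

```python
import torch

# Generic illustration, not the SOC layer itself: exp(A) is orthogonal
# whenever A = -A^T (skew-symmetric).
M = torch.randn(8, 8)
A = M - M.T                                   # skew-symmetric
Q = torch.linalg.matrix_exp(A)                # orthogonal
print(torch.allclose(Q @ Q.T, torch.eye(8), atol=1e-5))  # True
```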
This list is automatically generated from the titles and abstracts of the papers in this site.