Convolutional unitary or orthogonal recurrent neural networks
- URL: http://arxiv.org/abs/2302.07396v1
- Date: Tue, 14 Feb 2023 23:36:21 GMT
- Title: Convolutional unitary or orthogonal recurrent neural networks
- Authors: Marcelo O. Magnasco
- Abstract summary: We show that in the specific case of convolutional RNNs, we can define a convolutional exponential.
We explicitly derive FFT-based algorithms to compute the kernels and their derivatives.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent neural networks are extremely powerful yet hard to train. One of
their issues is the vanishing gradient problem, whereby propagation of training
signals may be exponentially attenuated, freezing training. Use of orthogonal
or unitary matrices, whose powers neither explode nor decay, has been proposed
to mitigate this issue, but their computational expense has hindered their use.
Here we show that in the specific case of convolutional RNNs, we can define a
convolutional exponential and that this operation transforms antisymmetric or
anti-Hermitian convolution kernels into orthogonal or unitary convolution
kernels. We explicitly derive FFT-based algorithms to compute the kernels and
their derivatives. The computational complexity of parametrizing this subspace
of orthogonal transformations is thus the same as the networks' iteration.
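The construction described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a 1-D circular (periodic) convolution with a real kernel, which is a simplification chosen here for illustration and not necessarily the setting used in the paper; the helper names `conv_exp` and `circulant` are hypothetical. The idea is that the FFT diagonalizes circular convolution, so the exponential can be taken element-wise in Fourier space; an antisymmetric real kernel has a purely imaginary spectrum, so the exponentiated spectrum has unit modulus and the resulting kernel defines an orthogonal convolution.

```python
import numpy as np

def conv_exp(kernel):
    """Convolutional exponential of a real 1-D circular kernel via FFT.

    Exponentiates the kernel's spectrum element-wise and transforms back.
    Cost is O(n log n), the same order as one FFT-based convolution step.
    """
    spectrum = np.fft.fft(kernel)
    # For a real kernel the result is real up to rounding, so keep .real.
    return np.fft.ifft(np.exp(spectrum)).real

def circulant(kernel):
    """Dense matrix of circular convolution with `kernel` (for checking only)."""
    n = len(kernel)
    return np.array([[kernel[(i - j) % n] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
n = 8
a = rng.standard_normal(n)

# Build an antisymmetric kernel: k[(-m) % n] == -k[m].
k = a - np.roll(a[::-1], 1)

# Its convolutional exponential is an orthogonal convolution kernel.
u = conv_exp(k)
Q = circulant(u)

# Repeated application neither amplifies nor attenuates a signal.
x = rng.standard_normal(n)
y = x.copy()
for _ in range(100):
    y = Q @ y

print(np.allclose(Q @ Q.T, np.eye(n)))
print(np.isclose(np.linalg.norm(y), np.linalg.norm(x)))
```

The dense circulant matrix is built only to check orthogonality; in practice the kernel `u` would be applied through the FFT directly. Under the same assumptions, a complex anti-Hermitian kernel would give a unitary convolution, in which case the `.real` projection should be dropped.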
Related papers
- Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum [18.10812063219831]
We introduce Modified Spectrum Kernels (MSKs) to approximate kernels with desired eigenvalues.
We propose a preconditioned gradient descent method, which alters the trajectory of gradient descent.
Our method is both computationally efficient and simple to implement.
arXiv Detail & Related papers (2023-07-26T22:39:47Z)
- Reduce Computational Complexity for Convolutional Layers by Skipping Zeros [10.742743533768843]
We propose an efficient algorithm for convolutional neural networks.
The C-K-S algorithm is accompanied by efficient GPU implementations.
Experiments show that C-K-S offers good performance in terms of speed and convergence.
arXiv Detail & Related papers (2023-06-28T06:21:22Z)
- Permutation Equivariant Neural Functionals [92.0667671999604]
This work studies the design of neural networks that can process the weights or gradients of other neural networks.
We focus on the permutation symmetries that arise in the weights of deep feedforward networks because hidden layer neurons have no inherent order.
In our experiments, we find that permutation equivariant neural functionals are effective on a diverse set of tasks.
arXiv Detail & Related papers (2023-02-27T18:52:38Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non commutative convolutional neural networks.
We show that non commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Skew Orthogonal Convolutions [44.053067014796596]
Training convolutional neural networks with a Lipschitz constraint under the $l_2$ norm is useful for provable adversarial robustness, interpretable gradients, stable training, etc.
SOC (Skew Orthogonal Convolutions) allows us to train provably Lipschitz, large convolutional neural networks significantly faster than prior works.
arXiv Detail & Related papers (2021-05-24T17:11:44Z)
- Regularization for convolutional kernel tensors to avoid unstable gradient problem in convolutional neural networks [0.0]
We propose three new regularization terms for a convolutional kernel tensor to constrain the singular values of each transformation matrix.
We show how to carry out the gradient type methods, which provides new insight about the training of convolutional neural networks.
arXiv Detail & Related papers (2021-02-05T03:46:31Z)
- Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network [19.717842489217684]
This paper is the first study on degeneracy in the tensor decomposition of convolutional kernels.
We present a novel method, which can stabilize the low-rank approximation of convolutional kernels and ensure efficient compression.
We evaluate our approach on popular CNN architectures for image classification and show that our method results in much lower accuracy degradation and provides consistent performance.
arXiv Detail & Related papers (2020-08-12T17:10:12Z)
- Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
- On Lipschitz Regularization of Convolutional Layers using Toeplitz Matrix Theory [77.18089185140767]
Lipschitz regularity is established as a key property of modern deep learning.
However, computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard.
We introduce a new upper bound for convolutional layers that is both tight and easy to compute.
arXiv Detail & Related papers (2020-06-15T13:23:34Z)