The Lipschitz Constant of Self-Attention
- URL: http://arxiv.org/abs/2006.04710v2
- Date: Wed, 9 Jun 2021 14:13:40 GMT
- Title: The Lipschitz Constant of Self-Attention
- Authors: Hyunjik Kim, George Papamakarios, Andriy Mnih
- Abstract summary: Lipschitz constants of neural networks have been explored in various contexts in deep learning.
We investigate the Lipschitz constant of self-attention, a non-linear neural network module widely used in sequence modelling.
- Score: 27.61634862685452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lipschitz constants of neural networks have been explored in various contexts
in deep learning, such as provable adversarial robustness, estimating
Wasserstein distance, stabilising training of GANs, and formulating invertible
neural networks. Such works have focused on bounding the Lipschitz constant of
fully connected or convolutional networks, composed of linear maps and
pointwise non-linearities. In this paper, we investigate the Lipschitz constant
of self-attention, a non-linear neural network module widely used in sequence
modelling. We prove that the standard dot-product self-attention is not
Lipschitz for unbounded input domain, and propose an alternative L2
self-attention that is Lipschitz. We derive an upper bound on the Lipschitz
constant of L2 self-attention and provide empirical evidence for its asymptotic
tightness. To demonstrate the practical relevance of our theoretical work, we
formulate invertible self-attention and use it in a Transformer-based
architecture for a character-level language modelling task.
Related papers
- Efficiently Computing Local Lipschitz Constants of Neural Networks via
Bound Propagation [79.13041340708395]
Lipschitz constants are connected to many properties of neural networks, such as robustness, fairness, and generalization.
Existing methods for computing Lipschitz constants either produce relatively loose upper bounds or are limited to small networks.
We develop an efficient framework for computing the $ell_infty$ local Lipschitz constant of a neural network by tightly upper bounding the norm of Clarke Jacobian.
arXiv Detail & Related papers (2022-10-13T22:23:22Z) - Rethinking Lipschitz Neural Networks for Certified L-infinity Robustness [33.72713778392896]
We study certified $ell_infty$ from a novel perspective of representing Boolean functions.
We develop a unified Lipschitz network that generalizes prior works, and design a practical version that can be efficiently trained.
arXiv Detail & Related papers (2022-10-04T17:55:27Z) - Lipschitz Continuity Retained Binary Neural Network [52.17734681659175]
We introduce the Lipschitz continuity as the rigorous criteria to define the model robustness for BNN.
We then propose to retain the Lipschitz continuity as a regularization term to improve the model robustness.
Our experiments prove that our BNN-specific regularization method can effectively strengthen the robustness of BNN.
arXiv Detail & Related papers (2022-07-13T22:55:04Z) - Training Certifiably Robust Neural Networks with Efficient Local
Lipschitz Bounds [99.23098204458336]
Certified robustness is a desirable property for deep neural networks in safety-critical applications.
We show that our method consistently outperforms state-of-the-art methods on MNIST and TinyNet datasets.
arXiv Detail & Related papers (2021-11-02T06:44:10Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - LipBaB: Computing exact Lipschitz constant of ReLU networks [0.0]
LipBaB is a framework to compute certified bounds of the local Lipschitz constant of deep neural networks.
Our algorithm can provide provably exact computation of the Lipschitz constant for any p-norm.
arXiv Detail & Related papers (2021-05-12T08:06:11Z) - Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
arXiv Detail & Related papers (2020-06-22T08:44:52Z) - On Lipschitz Regularization of Convolutional Layers using Toeplitz
Matrix Theory [77.18089185140767]
Lipschitz regularity is established as a key property of modern deep learning.
computing the exact value of the Lipschitz constant of a neural network is known to be NP-hard.
We introduce a new upper bound for convolutional layers that is both tight and easy to compute.
arXiv Detail & Related papers (2020-06-15T13:23:34Z) - Deep Neural Networks with Trainable Activations and Controlled Lipschitz
Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper-bound of the Lipschitz constant.
We numerically compare our scheme with standard ReLU network and its variations, PReLU and LeakyReLU.
arXiv Detail & Related papers (2020-01-17T12:32:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.