A Law of Robustness for Weight-bounded Neural Networks
- URL: http://arxiv.org/abs/2102.08093v1
- Date: Tue, 16 Feb 2021 11:28:59 GMT
- Title: A Law of Robustness for Weight-bounded Neural Networks
- Authors: Hisham Husain, Borja Balle
- Abstract summary: Recently, (Bubeck et al., 2020) conjectured that when using two-layer networks with $k$ neurons to fit a generic dataset, the smallest Lipschitz constant is $\Omega(\sqrt{\frac{n}{k}})$.
In this work we derive a lower bound on the Lipschitz constant for any arbitrary model class with bounded Rademacher complexity.
Our result coincides with that conjectured in (Bubeck et al., 2020) for two-layer networks under the assumption of bounded weights.
- Score: 37.54604146791085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness of deep neural networks against adversarial perturbations is a
pressing concern motivated by recent findings showing the pervasive nature of
such vulnerabilities. One method of characterizing the robustness of a neural
network model is through its Lipschitz constant, which forms a robustness
certificate. A natural question to ask is, for a fixed model class (such as
neural networks) and a dataset of size $n$, what is the smallest achievable
Lipschitz constant among all models that fit the dataset? Recently, (Bubeck et
al., 2020) conjectured that when using two-layer networks with $k$ neurons to
fit a generic dataset, the smallest Lipschitz constant is
$\Omega(\sqrt{\frac{n}{k}})$. This implies that one would require one neuron
per data point to robustly fit the data. In this work we derive a lower bound
on the Lipschitz constant for any arbitrary model class with bounded Rademacher
complexity. Our result coincides with that conjectured in (Bubeck et al., 2020)
for two-layer networks under the assumption of bounded weights. However, due to
our result's generality, we also derive bounds for multi-layer neural networks,
discovering that one requires $\log n$ constant-sized layers to robustly fit
the data. Thus, our work establishes a law of robustness for weight-bounded
neural networks and provides formal evidence on the necessity of
over-parametrization in deep learning.
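To make the quantities in the abstract concrete, here is a minimal numerical sketch (my own illustration, not code from the paper): it builds a random two-layer network, computes the spectral-norm product that upper-bounds its Lipschitz constant when the activation is 1-Lipschitz, and prints the conjectured lower bound $\sqrt{n/k}$ for comparison. The dimensions `d`, `k`, `n` and the weight scaling are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative sketch (assumptions mine): for a two-layer network
#   f(x) = a^T sigma(W x)  with a 1-Lipschitz activation sigma,
# the product ||a||_2 * ||W||_2 upper-bounds the Lipschitz constant.
# The law of robustness says any network that fits n generic points
# must have Lipschitz constant at least ~ sqrt(n / k).

rng = np.random.default_rng(0)
d, k, n = 50, 200, 1000                    # input dim, neurons, data points (arbitrary)

W = rng.normal(size=(k, d)) / np.sqrt(d)   # hidden-layer weights
a = rng.normal(size=k) / np.sqrt(k)        # output weights

# Spectral-norm-based upper bound on the Lipschitz constant
lip_upper = np.linalg.norm(a, 2) * np.linalg.norm(W, 2)

# Lower bound predicted by the law of robustness (up to constants)
law_lower = np.sqrt(n / k)

print(f"spectral upper bound on Lipschitz constant: {lip_upper:.2f}")
print(f"conjectured lower bound sqrt(n/k):          {law_lower:.2f}")
```

The random network here does not actually interpolate any dataset; the snippet is only meant to show which quantities the bound relates.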
Related papers
- Computable Lipschitz Bounds for Deep Neural Networks [0.0]
We analyse three existing upper bounds written for the $\ell_2$ norm.
We propose two novel bounds for both feed-forward fully-connected neural networks and convolutional neural networks.
arXiv Detail & Related papers (2024-10-28T14:09:46Z) - Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev-seminorm is lower-bounded.
Experiments reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - A law of robustness for two-layers neural networks [35.996863024271974]
We make a conjecture that, for any Lipschitz activation function and for most datasets, any two-layer neural network with $k$ neurons that perfectly fits the data must have its Lipschitz constant larger (up to a constant) than $\sqrt{n/k}$, where $n$ is the number of datapoints.
We prove this conjecture when the Lipschitz constant is replaced by an upper bound on it based on the spectral norm of the weight matrix.
arXiv Detail & Related papers (2020-09-30T05:13:12Z)
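As a companion sketch for the multi-layer claim in the abstract ($\log n$ constant-sized layers) and the spectral-norm surrogate proved in (Bubeck et al., 2020) above, the snippet below computes the standard product-of-spectral-norms upper bound on a deep network's Lipschitz constant. This is my own illustration under the usual 1-Lipschitz-activation assumption; the architecture sizes and weight scaling are arbitrary.

```python
import numpy as np

# Hedged sketch (assumptions mine, not the papers'): for a depth-L network with
# 1-Lipschitz activations, the product of the layers' spectral norms
# upper-bounds the network's Lipschitz constant. This is the kind of
# weight-bounded quantity controlled by the results summarized above.

def spectral_lipschitz_upper_bound(weights):
    """Product of spectral norms (largest singular values) of the weight matrices."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, 2)
    return bound

rng = np.random.default_rng(1)
depth, width, d = 5, 64, 32          # arbitrary architecture for illustration
shapes = [(width, d)] + [(width, width)] * (depth - 2) + [(1, width)]
weights = [rng.normal(size=s) / np.sqrt(s[1]) for s in shapes]

print(f"spectral Lipschitz upper bound for depth {depth}: "
      f"{spectral_lipschitz_upper_bound(weights):.2f}")
```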
This list is automatically generated from the titles and abstracts of the papers in this site.