Why Robust Generalization in Deep Learning is Difficult: Perspective of
Expressive Power
- URL: http://arxiv.org/abs/2205.13863v1
- Date: Fri, 27 May 2022 09:53:04 GMT
- Title: Why Robust Generalization in Deep Learning is Difficult: Perspective of
Expressive Power
- Authors: Binghui Li, Jikai Jin, Han Zhong, John E. Hopcroft, Liwei Wang
- Abstract summary: We show that for binary classification problems with well-separated data, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension.
We also establish an improved upper bound of $\exp(\mathcal{O}(k))$ on the network size needed to achieve low robust generalization error when the data lies on a manifold of intrinsic dimension $k$.
- Score: 15.210336733607488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well-known that modern neural networks are vulnerable to adversarial
examples. To mitigate this problem, a series of robust learning algorithms have
been proposed. However, although the robust training error can be near zero via
some methods, all existing algorithms lead to a high robust generalization
error. In this paper, we provide a theoretical understanding of this puzzling
phenomenon from the perspective of expressive power for deep neural networks.
Specifically, for binary classification problems with well-separated data, we
show that, for ReLU networks, while mild over-parameterization is sufficient
for high robust training accuracy, there exists a constant robust
generalization gap unless the size of the neural network is exponential in the
data dimension $d$. Even if the data is linearly separable, which means achieving
low clean generalization error is easy, we can still prove an
$\exp({\Omega}(d))$ lower bound for robust generalization. Moreover, we
establish an improved upper bound of $\exp({\mathcal{O}}(k))$ for the network
size to achieve low robust generalization error when the data lies on a
manifold with intrinsic dimension $k$ ($k \ll d$). Nonetheless, we also have a
lower bound that grows exponentially with respect to $k$ -- the curse of
dimensionality is inevitable. By demonstrating an exponential separation
between the network size for achieving low robust training and generalization
error, our results reveal that the hardness of robust generalization may stem
from the expressive power of practical models.
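As a concrete illustration of the quantities discussed above, the sketch below adversarially trains a mildly over-parameterized ReLU network on synthetic, linearly separable binary data and measures the robust generalization gap, i.e. the difference between robust training and robust test accuracy. It is a minimal sketch assuming PyTorch, an l_inf PGD attack, and arbitrarily chosen dimensions, sample sizes and perturbation radius; none of these choices come from the paper's construction.

    # Minimal sketch (assumed setup, not the paper's construction): adversarial training
    # of a small ReLU network and measurement of the robust generalization gap.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    d, n_train, n_test, eps = 20, 200, 2000, 0.1   # dimension, sample sizes, l_inf radius (assumed values)

    def make_data(n):
        # Synthetic linearly separable data: the label is the sign of the first coordinate,
        # pushed away from zero by 0.3 so the classes stay well-separated under eps-perturbations.
        x = torch.rand(n, d) * 2 - 1
        x[:, 0] = torch.sign(x[:, 0]) * (x[:, 0].abs() + 0.3)
        y = (x[:, 0] > 0).float()
        return x, y

    def pgd_attack(model, x, y, eps, steps=10):
        # Standard l_inf PGD on the logistic loss.
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            loss = nn.functional.binary_cross_entropy_with_logits(model(x + delta).squeeze(1), y)
            grad = torch.autograd.grad(loss, delta)[0]
            delta = (delta + 0.25 * eps * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
        return (x + delta).detach()

    def robust_acc(model, x, y):
        x_adv = pgd_attack(model, x, y, eps)
        with torch.no_grad():
            return ((model(x_adv).squeeze(1) > 0).float() == y).float().mean().item()

    model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))  # mildly over-parameterized ReLU net
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_tr, y_tr = make_data(n_train)
    x_te, y_te = make_data(n_test)

    for _ in range(200):                      # adversarial training: fit the model on PGD examples
        x_adv = pgd_attack(model, x_tr, y_tr, eps)
        loss = nn.functional.binary_cross_entropy_with_logits(model(x_adv).squeeze(1), y_tr)
        opt.zero_grad()
        loss.backward()
        opt.step()

    gap = robust_acc(model, x_tr, y_tr) - robust_acc(model, x_te, y_te)
    print(f"robust generalization gap (robust train acc - robust test acc): {gap:.3f}")

On such a low-dimensional toy distribution the measured gap should be close to zero; the paper's lower bounds instead assert the existence of well-separated distributions in dimension $d$ on which the gap remains constant unless the network size is exponential in $d$.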
Related papers
- On Generalization Bounds for Deep Compound Gaussian Neural Networks [1.4425878137951238]
Unrolled deep neural networks (DNNs) provide better interpretability and superior empirical performance than standard DNNs.
We develop novel generalization error bounds for a class of unrolled DNNs informed by a compound Gaussian prior.
Under realistic conditions, we show that, at worst, the generalization error scales as $\mathcal{O}(n\sqrt{n})$ in the signal dimension $n$ and as $\mathcal{O}((\text{Network Size})^{3/2})$ in the network size.
arXiv Detail & Related papers (2024-02-20T16:01:39Z) - Can overfitted deep neural networks in adversarial training generalize?
-- An approximation viewpoint [25.32729343174394]
Adversarial training is a widely used method for improving the robustness of deep neural networks (DNNs) against adversarial perturbations.
In this paper, we provide a theoretical understanding of whether overfitted DNNs in adversarial training can generalize from an approximation viewpoint.
arXiv Detail & Related papers (2024-01-24T17:54:55Z) - Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z) - Learning Non-Vacuous Generalization Bounds from Optimization [8.294831479902658]
We present a simple yet non-vacuous generalization bound from the optimization perspective.
We achieve this goal by leveraging that the hypothesis set accessed by gradient algorithms is essentially fractal-like.
Numerical studies demonstrate that our approach is able to yield plausible generalization guarantees for modern neural networks.
arXiv Detail & Related papers (2022-06-09T08:59:46Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make well-separated data linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions, as illustrated by the sketch after this list.
arXiv Detail & Related papers (2021-06-15T18:34:41Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Compressive Sensing and Neural Networks from a Statistical Learning
Perspective [4.561032960211816]
We present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements.
Under realistic conditions, the generalization error scales only logarithmically in the number of layers and at most linearly in the number of measurements.
arXiv Detail & Related papers (2020-10-29T15:05:43Z)
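The following is a minimal sketch illustrating the subfunction decomposition mentioned in "Predicting Unreliable Predictions by Shattering a Neural Network" above: it groups the inputs of a small ReLU network by activation pattern (each group being the domain of one linear subfunction) and checks that the full-network empirical error equals the pattern-frequency-weighted average of the subfunction errors. The network, data, and labels are arbitrary placeholders, not the paper's setup.

    # Minimal sketch (assumed setup): split a ReLU network into subfunctions by activation
    # pattern and verify that the full empirical error is an expectation over subfunctions.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    x = torch.rand(1000, 2) * 2 - 1
    y = (x[:, 0] * x[:, 1] > 0).float()          # arbitrary binary labels for illustration

    with torch.no_grad():
        pre = net[0](x)                          # pre-activations of the hidden layer
        pattern = (pre > 0)                      # each row is the activation pattern of one input
        err = ((net(x).squeeze(1) > 0).float() != y).float()   # per-input 0/1 error of the full network

    # Group inputs by activation pattern: each group is the domain of one linear subfunction.
    groups = {}
    for p, e in zip(pattern, err):
        groups.setdefault(tuple(p.tolist()), []).append(e.item())

    weighted = sum(len(es) / len(x) * (sum(es) / len(es)) for es in groups.values())
    print(f"{len(groups)} subfunctions observed on the sample")
    print(f"full-network empirical error:   {err.mean().item():.4f}")
    print(f"expectation over subfunctions:  {weighted:.4f}")   # matches the full-network error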