Uniform Convergence, Adversarial Spheres and a Simple Remedy
- URL: http://arxiv.org/abs/2105.03491v1
- Date: Fri, 7 May 2021 20:23:01 GMT
- Title: Uniform Convergence, Adversarial Spheres and a Simple Remedy
- Authors: Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Thomas Hofmann
- Abstract summary: Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks.
We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models.
We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
- Score: 40.44709296304123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous work has cast doubt on the general framework of uniform convergence
and its ability to explain generalization in neural networks. By considering a
specific dataset, it was observed that a neural network completely
misclassifies a projection of the training data (adversarial set), rendering
any existing generalization bound based on uniform convergence vacuous. We
provide an extensive theoretical investigation of the previously studied data
setting through the lens of infinitely-wide models. We prove that the Neural
Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its
origin. We highlight the important role of the output bias and show
theoretically as well as empirically how a sensible choice completely mitigates
the problem. We identify sharp phase transitions in the accuracy on the
adversarial set and study its dependency on the training sample size. As a
result, we are able to characterize critical sample sizes beyond which the
effect disappears. Moreover, we study decompositions of a neural network into a
clean and noisy part by considering its canonical decomposition into its
different eigenfunctions and show empirically that for too small bias the
adversarial phenomenon still persists.
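A minimal sketch of the setting described above, under explicit assumptions: concentric-spheres data in the spirit of the adversarial-spheres literature, kernel ridge regression with an RBF kernel standing in for the exact NTK, and the output bias modeled as a constant beta**2 added to the kernel. The radii, kernel choice, and bias parameterization are illustrative assumptions, not the paper's exact construction.
```python
import numpy as np

rng = np.random.default_rng(0)

def spheres(n, d, r_inner=1.0, r_outer=1.1):
    """Points on two concentric spheres; label -1 for inner, +1 for outer."""
    x = rng.normal(size=(n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    y = rng.choice([-1.0, 1.0], size=n)
    x *= np.where(y < 0, r_inner, r_outer)[:, None]
    return x, y

def kernel(A, B, beta):
    """RBF kernel plus a constant beta**2, i.e. an explicit output-bias term."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2) + beta**2

def fit_predict(Xtr, ytr, Xte, beta, ridge=1e-6):
    K = kernel(Xtr, Xtr, beta) + ridge * np.eye(len(Xtr))
    alpha = np.linalg.solve(K, ytr)
    return kernel(Xte, Xtr, beta) @ alpha

d, n = 20, 500
Xtr, ytr = spheres(n, d)

# "Adversarial set": each training point projected onto the opposite sphere,
# with its label flipped accordingly.
Xadv = Xtr / np.linalg.norm(Xtr, axis=1, keepdims=True)
Xadv *= np.where(ytr < 0, 1.1, 1.0)[:, None]
yadv = -ytr

for beta in (0.0, 1.0):
    acc = (np.sign(fit_predict(Xtr, ytr, Xadv, beta)) == yadv).mean()
    print(f"beta = {beta}: accuracy on the adversarial set = {acc:.2f}")
```
The printed accuracies are purely illustrative; the point of the sketch is only that the output bias enters the kernel predictor as an extra constant component, which is the knob the paper identifies as the remedy.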
Related papers
- Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than through the instantaneous input-output relationships of previous settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss-gradient norms depend strongly on the timestep.
We introduce Diffusion-ReTrac, a re-normalized adaptation that enables retrieval of training samples more targeted to the test sample of interest.
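A hedged sketch of this idea: a TracIn-style influence score summed over diffusion timesteps, with an optional per-timestep re-normalization. The `grad_fn` placeholder, the unit-norm scaling, and the aggregation are illustrative assumptions, not the paper's exact estimator.
```python
import numpy as np

def tracin_influence(grad_fn, z_train, z_test, timesteps, renormalize=False):
    """Sum of per-timestep loss-gradient inner products between a training
    and a test sample. Without renormalization, timesteps with large gradient
    norms dominate the score (the bias described above); renormalize=True
    scales each training gradient to unit norm, in the spirit of
    Diffusion-ReTrac."""
    score = 0.0
    for t in timesteps:
        g_tr = grad_fn(z_train, t)   # flattened parameter gradient at step t
        g_te = grad_fn(z_test, t)
        if renormalize:
            g_tr = g_tr / (np.linalg.norm(g_tr) + 1e-12)
        score += g_tr @ g_te
    return score
```
Here `grad_fn(z, t)` is assumed to return the flattened gradient of the diffusion training loss for sample `z` at timestep `t`.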
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
- lpNTK: Better Generalisation with Less Data via Sample Interaction During Learning [22.59771349030541]
We propose a labelled pseudo Neural Tangent Kernel (lpNTK) that takes label information into account when measuring the interactions between samples.
lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning.
We show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs.
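The core idea of a label-aware sample-interaction kernel can be illustrated with per-sample loss gradients, whose pairwise inner products fold in the labels. The logistic model below is an illustrative stand-in, not the paper's lpNTK definition.
```python
import numpy as np

def loss_gradients(W, X, Y):
    """Per-sample gradients of softmax cross-entropy w.r.t. the weights W.
    Labels enter through (p - y), so the resulting kernel is label-aware."""
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    # gradient for sample i is outer(x_i, p_i - y_i), flattened
    return np.einsum("ni,nj->nij", X, p - Y).reshape(len(X), -1)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
Y = np.eye(3)[rng.integers(0, 3, size=8)]   # one-hot labels
W = rng.normal(size=(5, 3))

G = loss_gradients(W, X, Y)
K = G @ G.T   # label-aware sample-interaction kernel (lpNTK-flavoured)
# K[i, j] > 0: updating on sample i also reduces the loss on sample j;
# K[i, j] < 0: the two samples interfere, e.g. conflicting or poisoned labels.
```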
arXiv Detail & Related papers (2024-01-16T20:20:10Z)
- Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z)
- On the ISS Property of the Gradient Flow for Single Hidden-Layer Neural Networks with Linear Activations [0.0]
We investigate the effect of overfitting on the robustness of gradient-descent training when the gradient estimate is subject to uncertainty.
We show that the general overparametrized formulation introduces a set of spurious equilibria which lie outside the set where the loss function is minimized.
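A concrete instance of such a spurious equilibrium, assuming a two-layer linear network with squared loss: the origin is a stationary point of the gradient flow even though the loss there is far from its minimum.
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ rng.normal(size=4)             # a realizable linear target

def loss_and_grads(W1, W2):
    """Squared loss of the linear network x -> x @ W1 @ W2 and its gradients."""
    r = X @ W1 @ W2 - y[:, None]       # residuals, shape (50, 1)
    L = 0.5 * (r**2).mean()
    g1 = X.T @ r @ W2.T / len(X)       # dL/dW1
    g2 = W1.T @ X.T @ r / len(X)       # dL/dW2
    return L, g1, g2

# At W1 = 0, W2 = 0 the gradient vanishes although the loss does not:
W1, W2 = np.zeros((4, 8)), np.zeros((8, 1))
L, g1, g2 = loss_and_grads(W1, W2)
print(L, np.abs(g1).max(), np.abs(g2).max())   # loss > 0, gradients exactly 0
```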
arXiv Detail & Related papers (2023-05-17T02:26:34Z)
- Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
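A minimal sketch of that decomposition, assuming a one-hidden-layer ReLU network: inputs sharing a ReLU activation pattern belong to the same linear subfunction, and the network's empirical error is exactly the pattern-weighted average of per-subfunction errors.
```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
# A tiny one-hidden-layer ReLU network with random weights.
W1, b1 = rng.normal(size=(2, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

def forward(X):
    pre = X @ W1 + b1
    return (np.maximum(pre, 0) @ W2 + b2).ravel(), (pre > 0)

X = rng.normal(size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)      # arbitrary toy labels
out, patterns = forward(X)
per_sample_err = (out - y) ** 2

# Group points by activation pattern: each pattern indexes one linear
# subfunction with its own domain and empirical error.
groups = defaultdict(list)
for pat, err in zip(patterns.astype(np.uint8), per_sample_err):
    groups[bytes(pat)].append(err)

# The full empirical error is the expectation of per-subfunction errors
# under the empirical distribution of activation patterns.
total = sum(len(v) * np.mean(v) for v in groups.values()) / len(X)
assert np.isclose(total, per_sample_err.mean())
print(f"{len(groups)} subfunctions observed on {len(X)} points")
```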
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- The Hidden Uncertainty in a Neural Network's Activations [105.4223982696279]
The distribution of a neural network's latent representations has been successfully used to detect out-of-distribution (OOD) data.
This work investigates whether this distribution correlates with a model's epistemic uncertainty, thus indicating its ability to generalise to novel inputs.
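The underlying recipe, as a hedged sketch: fit a Gaussian to a model's latent (e.g. penultimate-layer) features on in-distribution data and score new inputs by Mahalanobis distance. The Gaussian density is a common convention for this, not necessarily the paper's exact model.
```python
import numpy as np

def fit_latent_gaussian(Z):
    """Z: (n, k) latent features of in-distribution training data."""
    mu = Z.mean(axis=0)
    cov = np.cov(Z, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
    return mu, np.linalg.inv(cov)

def ood_score(z, mu, prec):
    """Mahalanobis distance in latent space; larger = more out-of-distribution."""
    d = z - mu
    return float(d @ prec @ d)

rng = np.random.default_rng(0)
Z_train = rng.normal(size=(2000, 16))                # stand-in latent features
mu, prec = fit_latent_gaussian(Z_train)
print(ood_score(rng.normal(size=16), mu, prec))      # in-distribution-like
print(ood_score(5 + rng.normal(size=16), mu, prec))  # shifted -> higher score
```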
arXiv Detail & Related papers (2020-12-05T17:30:35Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
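A toy instance of the effect, assuming logistic regression with two informative features: once the large-margin feature drives the cross-entropy loss toward zero, the gradient on the weaker feature vanishes with it, so its weight is "starved".
```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.choice([-1.0, 1.0], size=n)
strong = 4.0 * y + 0.1 * rng.normal(size=n)   # large-margin feature
weak = 0.5 * y + 0.1 * rng.normal(size=n)     # informative but small-margin
X = np.stack([strong, weak], axis=1)

w = np.zeros(2)
for _ in range(2000):                          # plain gradient descent
    margins = y * (X @ w)
    # gradient of mean log(1 + exp(-y * w.x))
    grad = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= 0.1 * grad
print(w)   # the weight on the weak feature stays comparatively tiny
```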
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Topologically Densified Distributions [25.140319008330167]
We study regularization in the context of small sample-size learning with over-parameterized neural networks.
We impose a topological constraint on samples drawn from the probability measure induced in that space.
This provably leads to mass concentration effects around the representations of training instances.
arXiv Detail & Related papers (2020-02-12T05:25:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.