Provable Generalization of SGD-trained Neural Networks of Any Width in
the Presence of Adversarial Label Noise
- URL: http://arxiv.org/abs/2101.01152v3
- Date: Mon, 15 Feb 2021 18:57:47 GMT
- Title: Provable Generalization of SGD-trained Neural Networks of Any Width in
the Presence of Adversarial Label Noise
- Authors: Spencer Frei and Yuan Cao and Quanquan Gu
- Abstract summary: We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD).
We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution.
- Score: 85.59576523297568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a one-hidden-layer leaky ReLU network of arbitrary width trained
by stochastic gradient descent (SGD) following an arbitrary initialization. We
prove that SGD produces neural networks that have classification accuracy
competitive with that of the best halfspace over the distribution for a broad
class of distributions that includes log-concave isotropic and hard margin
distributions. Equivalently, such networks can generalize when the data
distribution is linearly separable but corrupted with adversarial label noise,
despite the capacity to overfit. To the best of our knowledge, this is the
first work to show that overparameterized neural networks trained by SGD can
generalize when the data is corrupted with adversarial label noise.
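As an illustration of this setting (not the authors' code), the following minimal numpy sketch trains a one-hidden-layer leaky ReLU network of a chosen width by online SGD on the logistic loss, over isotropic Gaussian data that is linearly separable but has a fraction of its training labels flipped, and then compares the network's test accuracy with that of the best halfspace. The fixed +/-1 output layer, the width, step size, leaky-ReLU slope, and the use of random rather than adversarial label flips are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
d, width, steps, lr, noise_rate, slope = 20, 512, 5000, 0.05, 0.1, 0.1

# Ground-truth halfspace defining the clean labels.
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

def sample(n, noisy):
    x = rng.normal(size=(n, d))                  # isotropic Gaussian inputs
    y = np.sign(x @ w_star)
    if noisy:                                    # corrupt a fraction of the labels
        flip = rng.random(n) < noise_rate
        y[flip] *= -1
    return x, y

W = rng.normal(size=(width, d))                  # arbitrary initialization
a = rng.choice([-1.0, 1.0], size=width) / width  # fixed output weights (assumed)

def forward(x):
    z = W @ x
    return a @ np.where(z > 0, z, slope * z)     # one-hidden-layer leaky ReLU net

for _ in range(steps):                           # online SGD on the logistic loss
    x, y = sample(1, noisy=True)
    x, y = x[0], y[0]
    z = W @ x
    margin = y * forward(x)
    g = -y * np.exp(-np.logaddexp(0.0, margin))  # d(loss)/d(margin), numerically stable
    grad_W = g * (a * np.where(z > 0, 1.0, slope))[:, None] * x[None, :]
    W -= lr * grad_W

x_te, y_te = sample(10_000, noisy=True)          # noisy test distribution
net_acc = np.mean(np.sign([forward(x) for x in x_te]) == y_te)
halfspace_acc = np.mean(np.sign(x_te @ w_star) == y_te)
print(f"network: {net_acc:.3f}  best halfspace: {halfspace_acc:.3f}")
```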
Related papers
- Graph Out-of-Distribution Generalization via Causal Intervention [69.70137479660113]
We introduce a conceptually simple yet principled approach for training robust graph neural networks (GNNs) under node-level distribution shifts.
Our method resorts to a new learning objective derived from causal inference that coordinates an environment estimator and a mixture-of-expert GNN predictor.
Our model can effectively enhance generalization under various types of distribution shifts and yields up to a 27.4% accuracy improvement over state-of-the-art methods on graph OOD generalization benchmarks.
arXiv Detail & Related papers (2024-02-18T07:49:22Z) - Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data [42.870635753205185]
Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors.
We show that both of these phenomena, benign overfitting and grokking, provably occur in two-layer ReLU networks trained by GD on XOR cluster data in which a constant fraction of the training labels are flipped.
Early in training, the network attains 100% training accuracy, perfectly fitting the noisy labels, yet its test accuracy is near random; at a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a "grokking" phenomenon.
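A hedged toy sketch of this setup (my construction, not the paper's experiments): a two-layer ReLU network trained by full-batch GD on XOR cluster data with a fraction of training labels flipped, logging train and test accuracy over training. The cluster geometry, width, learning rate, and noise level are assumptions.
```python
import torch

torch.manual_seed(0)
d, n, width, noise_rate, lr, steps = 50, 400, 256, 0.15, 0.1, 2000

mu1, mu2 = torch.zeros(d), torch.zeros(d)
mu1[0], mu2[1] = 4.0, 4.0                        # two orthogonal cluster directions

def sample(n, flip):
    signs = torch.randint(0, 2, (n, 2)) * 2 - 1  # which of the four clusters
    x = signs[:, :1] * mu1 + signs[:, 1:] * mu2 + torch.randn(n, d)
    y = (signs[:, 0] * signs[:, 1]).float()      # XOR labels in {-1, +1}
    if flip:                                     # flip a fraction of training labels
        mask = torch.rand(n) < noise_rate
        y[mask] *= -1
    return x, y

x_tr, y_tr = sample(n, flip=True)
x_te, y_te = sample(4000, flip=False)            # clean test labels

model = torch.nn.Sequential(torch.nn.Linear(d, width), torch.nn.ReLU(),
                            torch.nn.Linear(width, 1))
opt = torch.optim.SGD(model.parameters(), lr=lr)

def acc(x, y):
    with torch.no_grad():
        return (model(x).squeeze(-1).sign() == y).float().mean().item()

for step in range(steps):                        # full-batch GD on the logistic loss
    loss = torch.nn.functional.softplus(-y_tr * model(x_tr).squeeze(-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 200 == 0:                          # log train/test accuracy over training
        print(step, acc(x_tr, y_tr), acc(x_te, y_te))
```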
arXiv Detail & Related papers (2023-10-04T02:50:34Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained with stochastic gradient descent initially classify their inputs using lower-order input statistics and exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Adversarial Noises Are Linearly Separable for (Nearly) Random Neural
Networks [46.13404040937189]
Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks.
In this paper we unveil a surprising property of adversarial noises when they are put together, i.e., adversarial noises crafted by one-step methods are linearly separable if equipped with the corresponding labels.
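A hedged illustration of this claim (my construction, not the paper's experiments): compute one-step FGSM-style noises against a randomly initialized network, then fit a linear probe on the (noise, label) pairs; high probe accuracy indicates the noises are (nearly) linearly separable once equipped with their labels. The dimensions, epsilon, and probe training are assumptions.
```python
import torch

torch.manual_seed(0)
d, n_classes, n, eps = 100, 2, 2000, 0.1

# Randomly initialized (untrained) network.
net = torch.nn.Sequential(torch.nn.Linear(d, 300), torch.nn.ReLU(),
                          torch.nn.Linear(300, n_classes))

x = torch.randn(n, d)
y = torch.randint(0, n_classes, (n,))

x.requires_grad_(True)
loss = torch.nn.functional.cross_entropy(net(x), y)
loss.backward()
delta = (eps * x.grad.sign()).detach()           # one-step (FGSM-style) adversarial noise

# Fit a linear probe on (noise, label) pairs; high fitting accuracy indicates
# the noises are (nearly) linearly separable once paired with their labels.
probe = torch.nn.Linear(d, n_classes)
opt = torch.optim.SGD(probe.parameters(), lr=0.5)
for _ in range(500):
    p_loss = torch.nn.functional.cross_entropy(probe(delta), y)
    opt.zero_grad(); p_loss.backward(); opt.step()

acc = (probe(delta).argmax(dim=1) == y).float().mean().item()
print(f"linear probe accuracy on the adversarial noises: {acc:.3f}")
```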
arXiv Detail & Related papers (2022-06-09T07:26:46Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Benign Overfitting without Linearity: Neural Network Classifiers Trained
by Gradient Descent for Noisy Linear Data [44.431266188350655]
We consider the generalization error of two-layer neural networks trained to interpolation by gradient descent.
We show that neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error.
In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our analysis holds in a setting where both the model and the learning dynamics are fundamentally nonlinear.
arXiv Detail & Related papers (2022-02-11T23:04:00Z) - Self-Ensembling GAN for Cross-Domain Semantic Segmentation [107.27377745720243]
This paper proposes a self-ensembling generative adversarial network (SE-GAN) exploiting cross-domain data for semantic segmentation.
In SE-GAN, a teacher network and a student network constitute a self-ensembling model for generating semantic segmentation maps, which, together with a discriminator, forms a GAN.
Despite its simplicity, we find SE-GAN can significantly boost the performance of adversarial training and enhance the stability of the model.
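A minimal sketch of the self-ensembling pattern referenced above, assuming a mean-teacher style scheme in which the teacher's weights are an exponential moving average (EMA) of the student's and the student is trained toward the teacher's predictions; the discriminator of the full SE-GAN is omitted, and all names and constants are illustrative assumptions.
```python
import copy
import torch

student = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3, padding=1),
                              torch.nn.ReLU(),
                              torch.nn.Conv2d(16, 5, 1))    # toy segmentation head
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)                                  # teacher is never trained directly

@torch.no_grad()
def ema_update(decay=0.99):
    # Teacher weights track an exponential moving average of the student's.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(decay).add_(ps, alpha=1.0 - decay)

opt = torch.optim.SGD(student.parameters(), lr=0.01)
x = torch.randn(4, 3, 32, 32)                                # unlabeled target-domain batch
for _ in range(10):
    with torch.no_grad():
        pseudo = teacher(x).softmax(dim=1)                   # teacher predictions as soft targets
    consistency = torch.nn.functional.mse_loss(student(x).softmax(dim=1), pseudo)
    opt.zero_grad(); consistency.backward(); opt.step()
    ema_update()
```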
arXiv Detail & Related papers (2021-12-15T09:50:25Z) - Fine-grained Data Distribution Alignment for Post-Training Quantization [100.82928284439271]
We propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization.
Our method shows state-of-the-art performance on ImageNet, especially when the first and last layers are quantized to low bit-width.
arXiv Detail & Related papers (2021-09-09T11:45:52Z) - Pattern Detection in the Activation Space for Identifying Synthesized
Content [8.365235325634876]
Generative Adversarial Networks (GANs) have recently achieved unprecedented success in photo-realistic image synthesis from low-dimensional random noise.
The ability to synthesize high-quality content at a large scale brings potential risks as the generated samples may lead to misinformation that can create severe social, political, health, and business hazards.
We propose SubsetGAN to identify generated content by detecting a subset of anomalous node-activations in the inner layers of pre-trained neural networks.
arXiv Detail & Related papers (2021-05-26T11:28:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.