What Do Neural Networks Learn When Trained With Random Labels?
- URL: http://arxiv.org/abs/2006.10455v2
- Date: Wed, 11 Nov 2020 16:12:56 GMT
- Title: What Do Neural Networks Learn When Trained With Random Labels?
- Authors: Hartmut Maennel and Ibrahim Alabdulmohsin and Ilya Tolstikhin and
Robert J. N. Baldock and Olivier Bousquet and Sylvain Gelly and Daniel
Keysers
- Abstract summary: We study deep neural networks (DNNs) trained on natural image data with entirely random labels.
We show analytically for convolutional and fully connected networks that an alignment between the principal components of network parameters and data takes place when training with random labels.
We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch.
- Score: 20.54410239839646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study deep neural networks (DNNs) trained on natural image data with
entirely random labels. Despite its popularity in the literature, where it is
often used to study memorization, generalization, and other phenomena, little
is known about what DNNs learn in this setting. In this paper, we show
analytically for convolutional and fully connected networks that an alignment
between the principal components of network parameters and data takes place
when training with random labels. We study this alignment effect by
investigating neural networks pre-trained on randomly labelled image data and
subsequently fine-tuned on disjoint datasets with random or real labels. We
show how this alignment produces a positive transfer: networks pre-trained with
random labels train faster downstream compared to training from scratch even
after accounting for simple effects, such as weight scaling. We analyze how
competing effects, such as specialization at later layers, may hide the
positive transfer. These effects are studied in several network architectures,
including VGG16 and ResNet18, on CIFAR10 and ImageNet.
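The central analytical claim, that the leading singular directions of early-layer weights line up with the principal components of the data, can be quantified with basic linear algebra. The following is a minimal NumPy sketch (not the paper's code; X, W, and subspace_alignment are illustrative names chosen for this example) that scores how much of the top-k data subspace is captured by the top-k right singular vectors of a first-layer weight matrix.

```python
# Illustrative sketch: overlap between the data's principal components and the
# right singular vectors of a first-layer weight matrix. Both inputs here are
# synthetic stand-ins, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

# X plays the role of (flattened) input images, W of first-layer weights
# with shape (units, input_dim).
X = rng.normal(size=(5000, 64)) * np.linspace(3.0, 0.1, 64)  # anisotropic data
W = rng.normal(size=(128, 64))                               # untrained weights

def subspace_alignment(X, W, k=10):
    """Mean squared overlap between the top-k data principal directions and
    the top-k right singular vectors of W; 1.0 means identical subspaces,
    roughly k / input_dim is the chance level for unrelated directions."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt_data = np.linalg.svd(Xc, full_matrices=False)
    _, _, Vt_w = np.linalg.svd(W, full_matrices=False)
    U = Vt_data[:k]   # (k, d) data principal directions
    V = Vt_w[:k]      # (k, d) weight singular directions
    return float(np.sum((U @ V.T) ** 2) / k)

print(f"alignment (random W): {subspace_alignment(X, W):.3f}")  # near chance
```

In the paper's setting, X would be the image data and W the first-layer weights after training on random labels; with the random stand-ins above, the score stays near the chance level of roughly k divided by the input dimension, whereas the abstract reports that training with random labels increases this kind of alignment.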
Related papers
- Training Convolutional Neural Networks with the Forward-Forward algorithm [1.74440662023704]
The Forward-Forward (FF) algorithm has so far only been used in fully connected networks.
We show how the FF paradigm can be extended to CNNs.
Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.16% on the MNIST hand-written digits dataset.
arXiv Detail & Related papers (2023-12-22T18:56:35Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Reconstructing Training Data from Trained Neural Networks [42.60217236418818]
We show that, in some cases, a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier.
We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
arXiv Detail & Related papers (2022-06-15T18:35:16Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Background Invariant Classification on Infrared Imagery by Data Efficient Training and Reducing Bias in CNNs [1.2891210250935146]
Convolutional neural networks can classify objects in images very accurately.
It is well known that the attention of the network may not always be on the semantically important regions of the scene.
We propose a new two-step training procedure called split training to reduce this bias in CNNs on both infrared imagery and RGB data.
arXiv Detail & Related papers (2022-01-22T23:29:42Z)
- Understanding Feature Transfer Through Representation Alignment [45.35473578109525]
We find that training neural networks with different architectures and generalization performance on random or true labels enforces the same relationship between the hidden representations and the training labels.
We show in a classic synthetic transfer problem that alignment is the determining factor for positive and negative transfer to similar and dissimilar tasks.
arXiv Detail & Related papers (2021-12-15T00:20:29Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Variational models for signal processing with Graph Neural Networks [3.5939555573102853]
This paper is devoted to signal processing on point clouds by means of neural networks.
We investigate the use of variational models in Graph Neural Networks to process signals on graphs for unsupervised learning.
arXiv Detail & Related papers (2021-03-30T13:31:11Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters (see the sketch after this list).
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Analyzing Neural Networks Based on Random Graphs [77.34726150561087]
We perform a massive evaluation of neural networks with architectures corresponding to random graphs of various types.
We find that none of the classical numerical graph invariants by itself suffices to single out the best networks.
We also find that networks with primarily short-range connections perform better than networks which allow for many long-range connections.
arXiv Detail & Related papers (2020-02-19T11:04:49Z)
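As referenced in the Curriculum By Smoothing entry above, that method smooths a CNN's feature maps with low-pass (anti-aliasing) filters whose strength is reduced as training progresses. Below is a hedged PyTorch sketch of that idea, not the authors' implementation; gaussian_kernel, BlurredConv, and the sigma schedule are assumptions made for illustration.

```python
# Sketch of a curriculum-by-smoothing style block: blur the feature maps with
# a Gaussian kernel whose width is annealed toward zero during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(sigma, size=5):
    """1D Gaussian outer-producted into a normalized 2D low-pass kernel."""
    x = torch.arange(size, dtype=torch.float32) - size // 2
    k1d = torch.exp(-x ** 2 / (2 * sigma ** 2))
    k1d = k1d / k1d.sum()
    return torch.outer(k1d, k1d)

class BlurredConv(nn.Module):
    """Conv + ReLU block whose output feature maps are low-pass filtered."""
    def __init__(self, in_ch, out_ch, sigma=2.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.sigma = sigma  # annealed externally during training

    def forward(self, x):
        x = F.relu(self.conv(x))
        if self.sigma > 1e-3:  # sigma == 0 means no smoothing
            k = gaussian_kernel(self.sigma).to(x)
            k = k.view(1, 1, *k.shape).repeat(x.shape[1], 1, 1, 1)
            x = F.conv2d(x, k, padding=k.shape[-1] // 2, groups=x.shape[1])
        return x

# Illustrative schedule: strong smoothing early in training, none at the end.
layer = BlurredConv(3, 16)
for epoch in range(5):
    layer.sigma = 2.0 * (1 - epoch / 4)  # 2.0 -> 0.0
    features = layer(torch.randn(8, 3, 32, 32))
```

Relaxing sigma toward zero lets progressively higher-frequency information through, which matches the entry's description of feature maps carrying more information as training proceeds.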