What Do Neural Networks Learn When Trained With Random Labels?
- URL: http://arxiv.org/abs/2006.10455v2
- Date: Wed, 11 Nov 2020 16:12:56 GMT
- Title: What Do Neural Networks Learn When Trained With Random Labels?
- Authors: Hartmut Maennel and Ibrahim Alabdulmohsin and Ilya Tolstikhin and
Robert J. N. Baldock and Olivier Bousquet and Sylvain Gelly and Daniel
Keysers
- Abstract summary: We study deep neural networks (DNNs) trained on natural image data with entirely random labels.
We show analytically for convolutional and fully connected networks that an alignment between the principal components of network parameters and data takes place when training with random labels.
We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch.
- Score: 20.54410239839646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study deep neural networks (DNNs) trained on natural image data with
entirely random labels. Despite its popularity in the literature, where it is
often used to study memorization, generalization, and other phenomena, little
is known about what DNNs learn in this setting. In this paper, we show
analytically for convolutional and fully connected networks that an alignment
between the principal components of network parameters and data takes place
when training with random labels. We study this alignment effect by
investigating neural networks pre-trained on randomly labelled image data and
subsequently fine-tuned on disjoint datasets with random or real labels. We
show how this alignment produces a positive transfer: networks pre-trained with
random labels train faster downstream compared to training from scratch even
after accounting for simple effects, such as weight scaling. We analyze how
competing effects, such as specialization at later layers, may hide the
positive transfer. These effects are studied in several network architectures,
including VGG16 and ResNet18, on CIFAR10 and ImageNet.
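The central analytical claim, that the leading singular directions of early-layer weights line up with the principal components of the data, can be quantified with basic linear algebra. The following is a minimal NumPy sketch (not the paper's code; X, W, and subspace_alignment are illustrative names chosen for this example) that scores how much of the top-k data subspace is captured by the top-k right singular vectors of a first-layer weight matrix.

```python
# Illustrative sketch: overlap between the data's principal components and the
# right singular vectors of a first-layer weight matrix. Both inputs here are
# synthetic stand-ins, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

# X plays the role of (flattened) input images, W of first-layer weights
# with shape (units, input_dim).
X = rng.normal(size=(5000, 64)) * np.linspace(3.0, 0.1, 64)  # anisotropic data
W = rng.normal(size=(128, 64))                               # untrained weights

def subspace_alignment(X, W, k=10):
    """Mean squared overlap between the top-k data principal directions and
    the top-k right singular vectors of W; 1.0 means identical subspaces,
    roughly k / input_dim is the chance level for unrelated directions."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt_data = np.linalg.svd(Xc, full_matrices=False)
    _, _, Vt_w = np.linalg.svd(W, full_matrices=False)
    U = Vt_data[:k]   # (k, d) data principal directions
    V = Vt_w[:k]      # (k, d) weight singular directions
    return float(np.sum((U @ V.T) ** 2) / k)

print(f"alignment (random W): {subspace_alignment(X, W):.3f}")  # near chance
```

In the paper's setting, X would be the image data and W the first-layer weights after training on random labels; with the random stand-ins above, the score stays near the chance level of roughly k divided by the input dimension, whereas the abstract reports that training with random labels increases this kind of alignment.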
Related papers
- Training Convolutional Neural Networks with the Forward-Forward algorithm [1.74440662023704]
The Forward-Forward (FF) algorithm has so far only been used in fully connected networks.
We show how the FF paradigm can be extended to CNNs.
Our FF-trained CNN, featuring a novel spatially-extended labeling technique, achieves a classification accuracy of 99.16% on the MNIST hand-written digits dataset.
arXiv Detail & Related papers (2023-12-22T18:56:35Z)
- Diffused Redundancy in Pre-trained Representations [98.55546694886819]
We take a closer look at how features are encoded in pre-trained representations.
We find that learned representations in a given layer exhibit a degree of diffuse redundancy.
Our findings shed light on the nature of representations learned by pre-trained deep neural networks.
arXiv Detail & Related papers (2023-05-31T21:00:50Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Reconstructing Training Data from Trained Neural Networks [42.60217236418818]
We show that, in some cases, a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier.
We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods.
arXiv Detail & Related papers (2022-06-15T18:35:16Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Background Invariant Classification on Infrared Imagery by Data Efficient Training and Reducing Bias in CNNs [1.2891210250935146]
Convolutional neural networks can classify objects in images very accurately.
It is well known that the attention of the network may not always be on the semantically important regions of the scene.
We propose a new two-step training procedure called split training to reduce this bias in CNNs on both infrared imagery and RGB data.
arXiv Detail & Related papers (2022-01-22T23:29:42Z)
- Understanding Feature Transfer Through Representation Alignment [45.35473578109525]
We find that training neural networks with different architectures and generalization performance on random or true labels enforces the same relationship between the hidden representations and the training labels.
We show in a classic synthetic transfer problem that alignment is the determining factor for positive and negative transfer to similar and dissimilar tasks.
arXiv Detail & Related papers (2021-12-15T00:20:29Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Variational models for signal processing with Graph Neural Networks [3.5939555573102853]
This paper is devoted to signal processing on point clouds by means of neural networks.
We investigate the use of variational models in Graph Neural Networks to process signals on graphs for unsupervised learning.
arXiv Detail & Related papers (2021-03-30T13:31:11Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters (see the sketch after this list).
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
- Analyzing Neural Networks Based on Random Graphs [77.34726150561087]
We perform a massive evaluation of neural networks with architectures corresponding to random graphs of various types.
We find that none of the classical numerical graph invariants by itself suffices to single out the best networks.
We also find that networks with primarily short-range connections perform better than networks which allow for many long-range connections.
arXiv Detail & Related papers (2020-02-19T11:04:49Z)
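As referenced in the Curriculum By Smoothing entry above, that method smooths a CNN's feature maps with low-pass (anti-aliasing) filters whose strength is reduced as training progresses. Below is a hedged PyTorch sketch of that idea, not the authors' implementation; gaussian_kernel, BlurredConv, and the sigma schedule are assumptions made for illustration.

```python
# Sketch of a curriculum-by-smoothing style block: blur the feature maps with
# a Gaussian kernel whose width is annealed toward zero during training.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel(sigma, size=5):
    """1D Gaussian outer-producted into a normalized 2D low-pass kernel."""
    x = torch.arange(size, dtype=torch.float32) - size // 2
    k1d = torch.exp(-x ** 2 / (2 * sigma ** 2))
    k1d = k1d / k1d.sum()
    return torch.outer(k1d, k1d)

class BlurredConv(nn.Module):
    """Conv + ReLU block whose output feature maps are low-pass filtered."""
    def __init__(self, in_ch, out_ch, sigma=2.0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.sigma = sigma  # annealed externally during training

    def forward(self, x):
        x = F.relu(self.conv(x))
        if self.sigma > 1e-3:  # sigma == 0 means no smoothing
            k = gaussian_kernel(self.sigma).to(x)
            k = k.view(1, 1, *k.shape).repeat(x.shape[1], 1, 1, 1)
            x = F.conv2d(x, k, padding=k.shape[-1] // 2, groups=x.shape[1])
        return x

# Illustrative schedule: strong smoothing early in training, none at the end.
layer = BlurredConv(3, 16)
for epoch in range(5):
    layer.sigma = 2.0 * (1 - epoch / 4)  # 2.0 -> 0.0
    features = layer(torch.randn(8, 3, 32, 32))
```

Relaxing sigma toward zero lets progressively higher-frequency information through, which matches the entry's description of feature maps carrying more information as training proceeds.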