Neural Redshift: Random Networks are not Random Functions
- URL: http://arxiv.org/abs/2403.02241v2
- Date: Tue, 5 Mar 2024 11:43:24 GMT
- Title: Neural Redshift: Random Networks are not Random Functions
- Authors: Damien Teney, Armand Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad
- Abstract summary: We show that NNs do not have an inherent "simplicity bias".
Alternative architectures can be built with a bias for any level of complexity.
This points to promising avenues for controlling the solutions implemented by trained models.
- Score: 28.357640341268745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our understanding of the generalization capabilities of neural networks (NNs)
is still incomplete. Prevailing explanations are based on implicit biases of
gradient descent (GD), but they cannot account for the capabilities of models
obtained with gradient-free methods, nor for the simplicity bias recently observed in
untrained networks. This paper seeks other sources of generalization in NNs.
Findings. To understand the inductive biases provided by architectures
independently from GD, we examine untrained, random-weight networks. Even
simple MLPs show strong inductive biases: uniform sampling in weight space
yields a very biased distribution of functions in terms of complexity. But
contrary to common wisdom, NNs do not have an inherent "simplicity bias". This
property depends on components such as ReLUs, residual connections, and layer
normalizations. Alternative architectures can be built with a bias for any
level of complexity. Transformers also inherit all these properties from their
building blocks.
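To make this finding concrete, here is a minimal sketch (not the authors' code): it samples MLPs with weights drawn uniformly in [-1, 1] and measures the complexity of the resulting 1-D functions with a spectral-centroid proxy. The depth, width, weight range, and the proxy itself are all illustrative assumptions.

```python
import numpy as np

def random_relu_mlp(depth=4, width=64, rng=None):
    """f(x) for an MLP whose weights and biases are sampled uniformly in [-1, 1]."""
    rng = rng if rng is not None else np.random.default_rng()
    dims = [1] + [width] * (depth - 1) + [1]
    Ws = [rng.uniform(-1, 1, (dims[i], dims[i + 1])) for i in range(depth)]
    bs = [rng.uniform(-1, 1, dims[i + 1]) for i in range(depth)]
    def f(x):  # x: (n, 1)
        h = x
        for W, b in zip(Ws[:-1], bs[:-1]):
            h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layers
        return h @ Ws[-1] + bs[-1]
    return f

xs = np.linspace(-1, 1, 512).reshape(-1, 1)
freqs = np.fft.rfftfreq(512, d=2.0 / 512)  # sample spacing 2/512 on [-1, 1]

def spectral_centroid(y):
    """Power-weighted mean frequency: a crude proxy for function complexity."""
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    return float((freqs * power).sum() / power.sum())

rng = np.random.default_rng(0)
c = np.array([spectral_centroid(random_relu_mlp(rng=rng)(xs).ravel())
              for _ in range(500)])
pcts = np.percentile(c, [50, 90, 99])
print(f"complexity proxy percentiles (max representable freq = {freqs[-1]:.0f}):")
print(f"  50%: {pcts[0]:.2f}   90%: {pcts[1]:.2f}   99%: {pcts[2]:.2f}")
```

On a typical run the percentiles sit far below the maximum frequency the grid can represent: uniform sampling in weight space yields mostly low-complexity functions, which is the biased distribution the abstract describes.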
Implications. We provide a fresh explanation for the success of deep learning
independent of gradient-based training. This points to promising avenues for
controlling the solutions implemented by trained models.
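To illustrate the implication, the following hedged sketch keeps the same sampling scheme and complexity proxy as above but swaps only the activation; the point is that the complexity bias tracks the architecture's components, not the training. All hyperparameters are again assumptions, not the paper's exact protocol.

```python
import numpy as np

def random_net(act, depth=4, width=64, seed=0):
    """Evaluate a random-weight MLP with activation `act` on a 1-D grid."""
    rng = np.random.default_rng(seed)
    h = np.linspace(-1, 1, 512).reshape(-1, 1)
    for _ in range(depth - 1):
        W = rng.uniform(-1, 1, (h.shape[1], width))
        h = act(h @ W + rng.uniform(-1, 1, width))
    return (h @ rng.uniform(-1, 1, (width, 1))).ravel()

def spectral_centroid(y):
    """Power-weighted mean frequency of y, as a complexity proxy."""
    freqs = np.fft.rfftfreq(y.size, d=2.0 / y.size)
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    return float((freqs * power).sum() / power.sum())

activations = {"relu": lambda z: np.maximum(z, 0.0),
               "tanh": np.tanh,
               "sin":  np.sin}
for name, act in activations.items():
    cs = [spectral_centroid(random_net(act, seed=s)) for s in range(100)]
    print(f"{name:>4}: mean complexity proxy = {np.mean(cs):6.2f}")
```

On a typical run the sin-activated networks score a much higher complexity proxy than ReLU or tanh, consistent with the claim that the simplicity bias hinges on the choice of components rather than being inherent to NNs.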
Related papers
- Hierarchical Simplicity Bias of Neural Networks [0.0]
We introduce a novel method called imbalanced label coupling to explore and extend the simplicity bias across hierarchical levels.
Our approach demonstrates that trained networks sequentially consider features of increasing complexity based on their correlation with labels in the training set.
arXiv Detail & Related papers (2023-11-05T11:27:03Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks [66.76034024335833]
We find that diverse, complex features are indeed learned by the backbone, and that their brittleness is due to the linear classification head relying primarily on the simplest features.
We propose Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits.
We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
arXiv Detail & Related papers (2022-10-04T04:01:15Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the input graph's features that guides the model's prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two well-separated classes linearly separable with high probability (see the sketch after this list).
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
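As referenced in the Separation Capacity entry above, here is a minimal sketch of the kind of statement that paper proves: two classes that are not linearly separable in input space (concentric rings here) become linearly separable after a wide random ReLU layer with standard Gaussian weights and uniform biases. The dataset, width, bias range, and the perceptron separability check are illustrative assumptions, not the paper's construction or its formal notion of mutual complexity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated but non-linearly-separable classes: inner and outer ring.
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
radius = np.where(np.arange(n) < n // 2, 1.0, 3.0)
X = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
y = np.where(np.arange(n) < n // 2, -1.0, 1.0)

# Wide random ReLU layer: standard Gaussian weights, uniform biases.
width = 2000
W = rng.standard_normal((2, width))
b = rng.uniform(-3, 3, width)
features = np.maximum(X @ W + b, 0.0)

def perceptron_separable(F, y, epochs=200):
    """The perceptron makes a full pass with no updates iff the data is
    linearly separable (within the epoch budget, so False may mean 'not
    separable' or 'not yet converged')."""
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        updates = 0
        for f, label in zip(F, y):
            if label * (f @ w) <= 0:
                w += label * f
                updates += 1
        if updates == 0:
            return True
    return False

print("separable in input space:   ", perceptron_separable(X, y))
print("separable after random ReLU:", perceptron_separable(features, y))
```

On a typical run the raw rings are not separable but the random ReLU features are, illustrating (without proving) the separation-capacity result.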