Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize
- URL: http://arxiv.org/abs/2406.03345v2
- Date: Thu, 6 Jun 2024 09:45:59 GMT
- Title: Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize
- Authors: Tianren Zhang, Chujie Zhao, Guanyu Chen, Yizhou Jiang, Feng Chen,
- Abstract summary: Learning representations that generalize under distribution shifts is critical for building robust machine learning models.
We show that even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network.
- Score: 5.642322814965062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that perhaps surprisingly, even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network. Then, by a theoretical study of two-layer ReLU networks optimized by stochastic gradient descent (SGD) under a structured feature model, we identify a fundamental yet unexplored feature learning proclivity of neural networks, feature contamination: neural networks can learn uncorrelated features together with predictive features, resulting in generalization failure under distribution shifts. Notably, this mechanism essentially differs from the prevailing narrative in the literature that attributes the generalization failure to spurious correlations. Overall, our results offer new insights into the non-linear feature learning dynamics of neural networks and highlight the necessity of considering inductive biases in out-of-distribution generalization.
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - How You Start Matters for Generalization [26.74340246715699]
We show that the generalization of neural networks is heavily tied to their initializes.
We make a case against the controversial flat-minima conjecture.
arXiv Detail & Related papers (2022-06-17T05:30:56Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - The Implicit Bias of Gradient Descent on Generalized Gated Linear
Networks [3.3946853660795893]
We derive the infinite-time training limit of a mathematically tractable class of deep nonlinear neural networks (GLNs)
We show how architectural constraints and the implicit bias of gradient descent affect performance.
By making the inductive bias explicit, our framework is poised to inform the development of more efficient, biologically plausible, and robust learning algorithms.
arXiv Detail & Related papers (2022-02-05T22:37:39Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - Learning Theory Can (Sometimes) Explain Generalisation in Graph Neural
Networks [13.518582483147325]
We provide a rigorous analysis of the performance of neural networks in the context of transductive inference.
We show that transductive Rademacher complexity can explain the generalisation properties of graph convolutional networks for block models.
arXiv Detail & Related papers (2021-12-07T20:06:23Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Formalizing Generalization and Robustness of Neural Networks to Weight
Perturbations [58.731070632586594]
We provide the first formal analysis for feed-forward neural networks with non-negative monotone activation functions against weight perturbations.
We also design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations.
arXiv Detail & Related papers (2021-03-03T06:17:03Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.