Neural Networks Are Implicit Decision Trees: The Hierarchical Simplicity Bias
- URL: http://arxiv.org/abs/2311.02622v1
- Date: Sun, 5 Nov 2023 11:27:03 GMT
- Title: Neural Networks Are Implicit Decision Trees: The Hierarchical Simplicity Bias
- Authors: Zhehang Du
- Abstract summary: We introduce a novel approach termed imbalanced label coupling to investigate scenarios where simple and complex features exhibit different levels of predictive power.
The trained networks make predictions in alignment with the ascending complexity of input features according to how they correlate with the label in the training set.
This observation provides direct evidence that the neural network learns core features in the presence of spurious features.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks exhibit simplicity bias; they rely on simpler features while
ignoring equally predictive but more complex features. In this work, we
introduce a novel approach termed imbalanced label coupling to investigate
scenarios where simple and complex features exhibit different levels of
predictive power. In these cases, complex features still contribute to
predictions. The trained networks make predictions in alignment with the
ascending complexity of input features according to how they correlate with the
label in the training set, irrespective of the underlying predictive power. For
instance, even when simple spurious features distort predictions in CIFAR-10,
most cats are predicted to be dogs, and most trucks are predicted to be
automobiles! This observation provides direct evidence that the neural network
learns core features in the presence of spurious features. We empirically show
that last-layer retraining on the target data distribution is effective, yet
insufficient to fully recover core features when spurious features are
perfectly correlated with the target labels in our synthetic dataset. We hope
our research contributes to a deeper understanding of the implicit bias of
neural networks.
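The paper's code is not reproduced here; below is a minimal sketch, under stated assumptions, of how a simple spurious feature could be coupled with labels at a controlled rate in a CIFAR-10-style dataset. The function name `couple_labels`, the patch encoding, and the rate `p` are illustrative, not taken from the paper.

```python
import torch

def couple_labels(images, labels, p=0.9, patch=4, num_classes=10, seed=0):
    """Stamp a 'simple' patch feature onto each image; with probability p the
    patch encodes the true class, otherwise a random class, so the simple
    feature is predictive but imperfectly coupled with the label."""
    g = torch.Generator().manual_seed(seed)
    out = images.clone()
    n = images.shape[0]
    agree = torch.rand(n, generator=g) < p
    patch_class = torch.where(
        agree, labels, torch.randint(0, num_classes, (n,), generator=g))
    # Encode the class id as a solid grayscale intensity in the top-left corner.
    intensity = patch_class.float() / (num_classes - 1)
    out[:, :, :patch, :patch] = intensity.view(-1, 1, 1, 1)
    return out, labels

# Example: 90% of images carry a patch that matches the label.
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_coupled, _ = couple_labels(x, y, p=0.9)
```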
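Last-layer retraining is only named in the abstract; this is a minimal sketch of the common recipe (freeze the trained backbone, refit only the final linear layer on data from the target distribution). `backbone`, `head`, and `target_loader` are placeholder names, not from the paper.

```python
import torch
import torch.nn as nn

def retrain_last_layer(backbone, head, target_loader, epochs=10, lr=1e-3):
    """Freeze the feature extractor and refit only the final linear layer
    on samples drawn from the target distribution."""
    for param in backbone.parameters():
        param.requires_grad = False
    backbone.eval()
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in target_loader:
            with torch.no_grad():
                feats = backbone(x)         # features stay fixed
            loss = loss_fn(head(feats), y)  # only the head is updated
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```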
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data [4.14360329494344]
We characterize simplicity bias for general datasets in the context of two-layer neural networks with small weights and trained with gradient flow.
For datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages.
These results indicate that features learned in the middle stages of training may be more useful for OOD transfer.
arXiv Detail & Related papers (2024-05-27T16:00:45Z)
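The XOR-like setting in the entry above fits in a few lines; this is a minimal sketch with illustrative cluster placement, width, and step size, using small-step full-batch gradient descent as a stand-in for gradient flow.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# XOR-like data: four Gaussian clusters; opposite corners share a label,
# so no linear classifier can separate the two classes.
centers = torch.tensor([[1., 1.], [-1., -1.], [1., -1.], [-1., 1.]])
labels_per_center = torch.tensor([0, 0, 1, 1])
idx = torch.randint(0, 4, (512,))
x = centers[idx] + 0.1 * torch.randn(512, 2)
y = labels_per_center[idx]

# Two-layer network with small initial weights, matching the paper's regime.
net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 2))
with torch.no_grad():
    for p in net.parameters():
        p.mul_(0.1)  # shrink toward a small initialization

# Small-step full-batch gradient descent approximates gradient flow.
opt = torch.optim.SGD(net.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()
for step in range(2000):
    loss = loss_fn(net(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        acc = (net(x).argmax(1) == y).float().mean().item()
        print(f"step {step}: loss {loss.item():.3f}, acc {acc:.2f}")
```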
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
In contrast, we observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
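A minimal probe of the effect reported above, assuming some already-trained classifier `model`: push inputs off-distribution with growing Gaussian noise and check whether the softmax outputs collapse toward a constant. The noise schedule and metrics are illustrative, not from the paper.

```python
import torch

@torch.no_grad()
def ood_drift(model, x, scales=(0.0, 0.5, 1.0, 2.0, 4.0)):
    """Check whether predictions collapse toward a constant as inputs are
    pushed off-distribution by additive Gaussian noise of growing scale."""
    model.eval()
    for s in scales:
        probs = torch.softmax(model(x + s * torch.randn_like(x)), dim=1)
        max_prob = probs.max(dim=1).values.mean().item()  # mean confidence
        spread = probs.std(dim=0).mean().item()  # variation across inputs
        # A shrinking spread indicates outputs tending to a constant value.
        print(f"noise {s}: mean max-prob {max_prob:.3f}, spread {spread:.3f}")
```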
- Hybrid machine-learned homogenization: Bayesian data mining and convolutional neural networks [0.0]
This study aims to improve the machine-learned prediction by developing novel feature descriptors.
The iterative development of feature descriptors resulted in 37 novel features that reduce the prediction error by roughly one third.
Combining the feature-based approach with the convolutional neural network yields a hybrid neural network, sketched below.
arXiv Detail & Related papers (2023-02-24T09:59:29Z)
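The hybrid architecture is only named above; this is a minimal sketch under common assumptions: the 37 handcrafted descriptors are concatenated with a small CNN embedding before a shared regression head. All layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    """Combine a vector of handcrafted feature descriptors with a learned
    CNN embedding of the raw input image."""
    def __init__(self, n_features=37, n_outputs=1):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())  # -> 16 * 4 * 4 = 256
        self.head = nn.Sequential(
            nn.Linear(256 + n_features, 64), nn.ReLU(),
            nn.Linear(64, n_outputs))

    def forward(self, image, features):
        emb = self.cnn(image)
        return self.head(torch.cat([emb, features], dim=1))

# Example: a batch of 8 single-channel images plus 37 descriptors each.
model = HybridNet()
pred = model(torch.rand(8, 1, 32, 32), torch.rand(8, 37))
```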
- Overcoming Simplicity Bias in Deep Networks using a Feature Sieve [5.33024001730262]
We propose a direct, interventional method for addressing simplicity bias in deep networks.
We aim to automatically identify and suppress easily-computable spurious features in lower layers of the network.
We report substantial gains on many real-world debiasing benchmarks.
arXiv Detail & Related papers (2023-01-30T21:11:13Z)
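A minimal sketch of the sieve alternation as commonly described: an auxiliary head identifies label information in lower-layer features, then the lower layers are trained to erase it by pushing the auxiliary output toward the uniform distribution. The `lower`/`upper` split, the three optimizers, and the schedule are assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def sieve_round(lower, upper, aux_head, x, y, opt_main, opt_aux, opt_forget):
    """One identify/forget round on a batch. opt_main updates lower+upper,
    opt_aux updates only aux_head, opt_forget updates only lower."""
    # 1) Main task: ordinary supervised step for the full network.
    loss_main = F.cross_entropy(upper(lower(x)), y)
    opt_main.zero_grad(); loss_main.backward(); opt_main.step()

    # 2) Identify: fit the auxiliary head on frozen lower-layer features.
    loss_aux = F.cross_entropy(aux_head(lower(x).detach()), y)
    opt_aux.zero_grad(); loss_aux.backward(); opt_aux.step()

    # 3) Forget: update the lower layers so the auxiliary head's output
    #    becomes uninformative (close to uniform), erasing the easy feature.
    logits = aux_head(lower(x))
    uniform = torch.full_like(logits, 1.0 / logits.shape[1])
    loss_forget = F.cross_entropy(logits, uniform)  # soft-target CE
    opt_forget.zero_grad(); loss_forget.backward(); opt_forget.step()
```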
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) means finding a small subset of the input graph's features that guides the model's prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
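A minimal sketch of the failure-based pair described above, following the formulation commonly associated with this line of work: a "biased" model trained with generalized cross-entropy (GCE) latches onto shortcuts quickly, and its per-sample losses re-weight the debiased model's loss toward bias-conflicting samples. The value q=0.7 and the weighting rule are assumptions from that formulation, not confirmed by the entry itself.

```python
import torch
import torch.nn.functional as F

def gce_loss(logits, y, q=0.7):
    """Generalized cross-entropy: emphasizes easy samples, so a model
    trained with it amplifies shortcut (spurious) features."""
    p_y = torch.softmax(logits, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()

def lff_step(biased, debiased, opt_b, opt_d, x, y):
    """Train the pair simultaneously; samples the biased model fails on
    receive larger weight in the debiased model's loss."""
    # Biased model: amplify the shortcut via GCE.
    opt_b.zero_grad(); gce_loss(biased(x), y).backward(); opt_b.step()

    # Relative difficulty weights from per-sample cross-entropy losses.
    with torch.no_grad():
        loss_b = F.cross_entropy(biased(x), y, reduction="none")
        loss_d = F.cross_entropy(debiased(x), y, reduction="none")
        w = loss_b / (loss_b + loss_d + 1e-8)

    # Debiased model: weighted CE focusing on bias-conflicting samples.
    per_sample = F.cross_entropy(debiased(x), y, reduction="none")
    opt_d.zero_grad(); (w * per_sample).mean().backward(); opt_d.step()
```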
- Consistent feature selection for neural networks via Adaptive Group Lasso [3.42658286826597]
We propose and establish a theoretical guarantee for the use of the adaptive group Lasso for selecting important features of neural networks.
Specifically, we show that our feature selection method is consistent for single-output feed-forward neural networks with one hidden layer and hyperbolic tangent activation function.
arXiv Detail & Related papers (2020-05-30T18:50:56Z)
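A minimal sketch of an adaptive group Lasso penalty for input-feature selection, under common assumptions: the group for input feature j is column j of the first-layer weight matrix, and the adaptive weights come from an initial unpenalized fit. The exponent `gamma` and the wiring into the training loss are illustrative.

```python
import torch

def adaptive_group_lasso_penalty(first_layer_weight, init_weight,
                                 gamma=1.0, eps=1e-8):
    """Adaptive group Lasso over input features of a one-hidden-layer net.

    Input feature j groups the j-th column of the first-layer weight matrix
    (shape: hidden_dim x input_dim). Groups that were small in an initial
    fit are penalized more, driving irrelevant features to exactly zero.
    """
    group_norms = first_layer_weight.norm(dim=0)              # ||W[:, j]||_2
    adaptive_w = 1.0 / (init_weight.norm(dim=0) + eps).pow(gamma)
    return (adaptive_w * group_norms).sum()

# Usage sketch: add lam * penalty to the task loss during training, then
# select the features whose column norm stays above a small threshold.
# loss = task_loss + lam * adaptive_group_lasso_penalty(net[0].weight, w0)
```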
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.