Simplicity Bias in 1-Hidden Layer Neural Networks
- URL: http://arxiv.org/abs/2302.00457v1
- Date: Wed, 1 Feb 2023 14:00:35 GMT
- Title: Simplicity Bias in 1-Hidden Layer Neural Networks
- Authors: Depen Morwani, Jatin Batra, Prateek Jain, Praneeth Netrapalli
- Abstract summary: Recent works have demonstrated that neural networks exhibit extreme simplicity bias (SB).
We define SB as the network essentially being a function of a low dimensional projection of the inputs.
We show that models trained on real datasets such as Imagenette and Waterbirds-Landbirds indeed depend on a low dimensional projection of the inputs.
- Score: 28.755809186616702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have demonstrated that neural networks exhibit extreme
simplicity bias (SB). That is, they learn only the simplest features to solve a
task at hand, even in the presence of other, more robust but more complex
features. Due to the lack of a general and rigorous definition of features,
these works showcase SB on semi-synthetic datasets such as Color-MNIST and
MNIST-CIFAR, where defining features is relatively easier.
In this work, we rigorously define as well as thoroughly establish SB for one
hidden layer neural networks. More concretely, (i) we define SB as the network
essentially being a function of a low dimensional projection of the inputs, (ii)
theoretically, we show that when the data is linearly separable, the network
primarily depends on only the linearly separable ($1$-dimensional) subspace
even in the presence of an arbitrarily large number of other, more complex
features which could have led to a significantly more robust classifier, (iii)
empirically, we show that models trained on real datasets such as Imagenette
and Waterbirds-Landbirds indeed depend on a low dimensional projection of the
inputs, thereby demonstrating SB on these datasets, (iv) finally, we present a
natural ensemble approach that encourages diversity in models by training
successive models on features not used by earlier models, and demonstrate that
it yields models that are significantly more robust to Gaussian noise.
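The definition in (i) and the ensemble idea in (iv) can be made concrete with a short sketch. The code below is a minimal illustration on toy data with placeholder names, not the authors' implementation: it trains a 1-hidden-layer network, estimates the low dimensional input subspace the network depends on from the singular vectors of its first-layer weights, projects that subspace out of the inputs, and trains a second model on what remains.

```python
# Minimal sketch (not the authors' implementation) of ideas (i) and (iv):
# a trained 1-hidden-layer network depends on the input mostly through a
# low-dimensional projection, estimated here from its first-layer weights;
# projecting that subspace out lets a second, more diverse model be trained
# on the remaining directions. The toy data and all names are placeholders.
import torch
import torch.nn as nn

def train_one_hidden_layer(x, y, hidden=256, steps=2000, lr=1e-2):
    """Train a 1-hidden-layer ReLU network with the logistic loss."""
    model = nn.Sequential(nn.Linear(x.shape[1], hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x).squeeze(-1), y).backward()
        opt.step()
    return model

def top_input_directions(model, k=1):
    """Top-k right singular vectors of the first-layer weights: an estimate
    of the low-dimensional input subspace the network is a function of."""
    w = model[0].weight.detach()                   # (hidden, d)
    _, _, vh = torch.linalg.svd(w, full_matrices=False)
    return vh[:k]                                  # (k, d)

def project_out(x, directions):
    """Remove the span of `directions` from every input."""
    q, _ = torch.linalg.qr(directions.T)           # orthonormal basis (d, k)
    return x - (x @ q) @ q.T

# Toy data: the label is carried both by a high-margin "simple" coordinate
# and, redundantly, by a lower-margin block of coordinates.
torch.manual_seed(0)
n, d = 2000, 20
y = torch.randint(0, 2, (n,)).float()
s = 2 * y - 1
x = torch.randn(n, d)
x[:, 0] = 2.0 * s + 0.2 * torch.randn(n)           # simple feature
x[:, 1:4] += 0.7 * s.unsqueeze(1)                  # redundant block

model_1 = train_one_hidden_layer(x, y)             # expected to lean on x[:, 0]
subspace = top_input_directions(model_1, k=1)

# Second ensemble member is trained on directions the first model did not use.
model_2 = train_one_hidden_layer(project_out(x, subspace), y)
```

Here the per-model subspace is estimated with an SVD of the first-layer weights; the paper's precise definition of the projection and its ensemble procedure may differ in detail.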
Related papers
- SpaceMesh: A Continuous Representation for Learning Manifold Surface Meshes [61.110517195874074]
We present a scheme to directly generate manifold, polygonal meshes of complex connectivity as the output of a neural network.
Our key innovation is to define a continuous latent connectivity space at each mesh vertex, which implies the discrete mesh.
In applications, this approach not only yields high-quality outputs from generative models, but also enables directly learning challenging geometry processing tasks such as mesh repair.
arXiv Detail & Related papers (2024-09-30T17:59:03Z) - Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data [4.14360329494344]
We characterize simplicity bias for general datasets in the context of two-layer neural networks initialized with small weights and trained with gradient flow.
For datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages.
These results indicate that features learned in the middle stages of training may be more useful for OOD transfer.
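One hedged way to illustrate the claim that simplicity bias intensifies over training is to track how low dimensional the first-layer weights of a small-initialization two-layer network become while fitting an XOR-like task. The effective rank used below is only an illustrative proxy, not the feature characterization from the paper, and the data and hyperparameters are assumptions.

```python
# Illustrative proxy (not the paper's analysis): train a two-layer ReLU network
# with small initialization on an XOR-like task embedded in many dimensions and
# track the effective rank of its first-layer weights; a falling effective rank
# indicates the network concentrating on ever fewer input directions.
import torch
import torch.nn as nn

def effective_rank(w):
    """exp(entropy) of the normalized singular value distribution."""
    s = torch.linalg.svdvals(w)
    p = s / s.sum()
    return torch.exp(-(p * torch.log(p + 1e-12)).sum()).item()

torch.manual_seed(0)
n, d = 2048, 30
x = torch.randn(n, d)
y = ((x[:, 0] * x[:, 1]) > 0).float()          # XOR-like label on 2 coordinates

model = nn.Sequential(nn.Linear(d, 256), nn.ReLU(), nn.Linear(256, 1))
with torch.no_grad():                          # small initialization
    for p in model.parameters():
        p.mul_(0.1)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()
for step in range(6001):
    opt.zero_grad()
    loss_fn(model(x).squeeze(-1), y).backward()
    opt.step()
    if step % 1000 == 0:
        print(step, effective_rank(model[0].weight.detach()))
```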
arXiv Detail & Related papers (2024-05-27T16:00:45Z) - The Contextual Lasso: Sparse Linear Models via Deep Neural Networks [5.607237982617641]
We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features.
An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
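A contextual sparse linear model of the kind described can be sketched as a small network that maps the contextual features to per-sample coefficients of a linear model over the explanatory features, with an L1 penalty shrinking those coefficients. The sketch below is an assumed simplification (gradient descent on an L1 penalty shrinks but does not exactly zero out coefficients, unlike a lasso-style estimator), and all names are placeholders.

```python
# Hedged sketch of a contextual sparse linear model (an assumed simplification,
# not the authors' estimator): a small network maps contextual features z to
# per-sample coefficients beta(z), the prediction is a linear model over the
# explanatory features x, and an L1 penalty on beta(z) encourages a sparsity
# pattern that varies with the context.
import torch
import torch.nn as nn

class ContextualSparseLinear(nn.Module):
    def __init__(self, n_explanatory, n_contextual, hidden=64):
        super().__init__()
        self.coef_net = nn.Sequential(
            nn.Linear(n_contextual, hidden), nn.ReLU(),
            nn.Linear(hidden, n_explanatory + 1))   # coefficients + intercept

    def forward(self, x, z):
        out = self.coef_net(z)
        beta, intercept = out[:, :-1], out[:, -1]
        return (x * beta).sum(dim=1) + intercept, beta

# Synthetic example: which explanatory feature matters depends on the context.
torch.manual_seed(0)
n, p, q = 4000, 10, 3
x, z = torch.randn(n, p), torch.randn(n, q)
y = torch.where(z[:, 0] > 0, 2.0 * x[:, 0], -3.0 * x[:, 1]) + 0.1 * torch.randn(n)

model = ContextualSparseLinear(p, q)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-2                                          # assumed penalty strength
for _ in range(3000):
    opt.zero_grad()
    pred, beta = model(x, z)
    loss = ((pred - y) ** 2).mean() + lam * beta.abs().mean()
    loss.backward()
    opt.step()
```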
arXiv Detail & Related papers (2023-02-02T05:00:29Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
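The ordering of statistics can be illustrated on synthetic data in which the label is carried both by a first-order cue (a class-dependent mean shift) and by a second-order cue (a class-dependent correlation between two coordinates); tracking accuracy on probe sets that contain only one of the cues typically shows the first-order cue being exploited earlier. The setup below is an illustrative construction, not the paper's experimental protocol.

```python
# Illustrative construction (not the paper's protocol): the training labels can
# be read off both from a first-order cue (a mean shift on coordinate 0) and
# from a second-order cue (a label-dependent correlation between coordinates
# 1 and 2). Probe sets containing only one of the cues show when each is learned.
import torch
import torch.nn as nn

def make_data(n, d=20, mean_cue=True, corr_cue=True, seed=0):
    g = torch.Generator().manual_seed(seed)
    y = torch.randint(0, 2, (n,), generator=g).float()
    s = 2 * y - 1
    x = torch.randn(n, d, generator=g)
    if mean_cue:                  # first-order statistic: class-dependent mean
        x[:, 0] += 0.8 * s
    if corr_cue:                  # second-order statistic: sign of corr(x1, x2)
        x[:, 1] = x[:, 1].abs() * s * torch.sign(x[:, 2])
    return x, y

x_train, y_train = make_data(8000, seed=0)
x_mean, y_mean = make_data(2000, corr_cue=False, seed=1)   # only the mean cue
x_corr, y_corr = make_data(2000, mean_cue=False, seed=2)   # only the corr cue

model = nn.Sequential(nn.Linear(20, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

def accuracy(x, y):
    with torch.no_grad():
        return ((model(x).squeeze(-1) > 0).float() == y).float().mean().item()

for step in range(3001):
    opt.zero_grad()
    loss_fn(model(x_train).squeeze(-1), y_train).backward()
    opt.step()
    if step % 500 == 0:
        print(step, "mean-cue probe:", accuracy(x_mean, y_mean),
              "corr-cue probe:", accuracy(x_corr, y_corr))
```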
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in
Neural Networks [66.76034024335833]
We find that diverse/complex features are indeed learned by the backbone, and that their brittleness is due to the linear classification head relying primarily on the simplest features.
We propose Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed back from the logits.
We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
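A feature reconstruction regularizer of the kind summarized here can be sketched as an auxiliary linear decoder from the logits back to the penultimate features, whose reconstruction error is added to the classification loss. The code below is a plausible reading of the summary, not necessarily the authors' exact FRR; the architecture and the regularization weight are assumptions.

```python
# Hedged sketch of a feature reconstruction regularizer (a plausible reading of
# the summary above, not necessarily the authors' exact FRR): in addition to the
# usual classification loss, a linear map from the logits back to the penultimate
# features is trained, and its reconstruction error is added to the loss so that
# the logits retain enough information to reconstruct the features.
import torch
import torch.nn as nn

n_features, n_classes = 128, 10
backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                         nn.Linear(256, n_features), nn.ReLU())
head = nn.Linear(n_features, n_classes)          # linear classification head
decoder = nn.Linear(n_classes, n_features)       # reconstructs features from logits

params = (list(backbone.parameters()) + list(head.parameters())
          + list(decoder.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 1.0                                        # assumed regularizer weight

def training_step(x, y):
    opt.zero_grad()
    feats = backbone(x)                          # penultimate features
    logits = head(feats)
    recon = decoder(logits)
    loss = ce(logits, y) + lam * ((recon - feats) ** 2).mean()
    loss.backward()
    opt.step()
    return loss.item()

# Dummy batch in place of a real dataset.
x = torch.randn(64, 784)
y = torch.randint(0, n_classes, (64,))
training_step(x, y)
```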
arXiv Detail & Related papers (2022-10-04T04:01:15Z) - Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z) - Linear Iterative Feature Embedding: An Ensemble Framework for
Interpretable Model [6.383006473302968]
We develop a new ensemble framework for interpretable models called Linear Iterative Feature Embedding (LIFE).
LIFE fits a wide single-hidden-layer neural network (NN) accurately in three steps.
LIFE consistently outperforms directly trained single-hidden-layer NNs and also outperforms many other benchmark models.
arXiv Detail & Related papers (2021-03-18T02:01:17Z) - Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
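The transition between the two regimes is commonly illustrated with a diagonal linear network: fitting an underdetermined regression through the parametrization w = u*u - v*v, large initialization ends near a dense, small-L2 interpolator (kernel-like bias), while small initialization ends near a much sparser, small-L1 interpolator (rich bias). The sketch below follows that standard illustration and is not code from the paper; the data and step sizes are assumptions.

```python
# Standard illustration of the kernel-vs-rich transition (not code from the
# paper): an underdetermined linear regression is fit through the diagonal
# parametrization w = u*u - v*v. Large initialization yields a dense, small-L2
# interpolator; small initialization yields a sparse, small-L1 interpolator.
import torch

torch.manual_seed(0)
n, d = 20, 100
x = torch.randn(n, d)
w_true = torch.zeros(d)
w_true[:3] = torch.tensor([2.0, -1.5, 1.0])        # sparse ground truth
y = x @ w_true

def fit(init_scale, steps=30000, lr=2e-3):
    u = (init_scale * torch.ones(d)).requires_grad_()
    v = (init_scale * torch.ones(d)).requires_grad_()
    opt = torch.optim.SGD([u, v], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        w = u * u - v * v
        ((x @ w - y) ** 2).mean().backward()
        opt.step()
    return (u * u - v * v).detach()

for scale in (1.5, 0.05):                          # large vs small initialization
    w = fit(scale)
    print(f"init scale {scale}: L1 = {w.abs().sum().item():.2f}, "
          f"L2 = {w.norm().item():.2f}, "
          f"train MSE = {((x @ w - y) ** 2).mean().item():.1e}")
```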
arXiv Detail & Related papers (2020-02-20T15:43:02Z) - Learning Bijective Feature Maps for Linear ICA [73.85904548374575]
We show that existing probabilistic deep generative models (DGMs), which are tailor-made for image data, underperform on non-linear ICA tasks.
To address this, we propose a DGM which combines bijective feature maps with a linear ICA model to learn interpretable latent structures for high-dimensional data.
We create models that converge quickly, are easy to train, and achieve better unsupervised latent factor discovery than flow-based models, linear ICA, and Variational Autoencoders on images.
arXiv Detail & Related papers (2020-02-18T17:58:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.