Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
- URL: http://arxiv.org/abs/2503.10065v1
- Date: Thu, 13 Mar 2025 05:28:40 GMT
- Title: Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
- Authors: Damien Teney, Liangze Jiang, Florin Gogianu, Ehsan Abbasnejad
- Abstract summary: The "simplicity bias" of neural networks is widely regarded as key to their success. We introduce a method to meta-learn new activation functions and inductive biases better suited to specific tasks. We show that activation functions can control these inductive biases, but future tailored architectures might provide further benefits.
- Score: 30.526310031491633
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural architectures tend to fit their data with relatively simple functions. This "simplicity bias" is widely regarded as key to their success. This paper explores the limits of this principle. Building on recent findings that the simplicity bias stems from ReLU activations [96], we introduce a method to meta-learn new activation functions and inductive biases better suited to specific tasks. Findings: We identify multiple tasks where the simplicity bias is inadequate and ReLUs suboptimal. In these cases, we learn new activation functions that perform better by inducing a prior of higher complexity. Interestingly, these cases correspond to domains where neural networks have historically struggled: tabular data, regression tasks, cases of shortcut learning, and algorithmic grokking tasks. In comparison, the simplicity bias induced by ReLUs proves adequate on image tasks where the best learned activations are nearly identical to ReLUs and GeLUs. Implications: Contrary to popular belief, the simplicity bias of ReLU networks is not universally useful. It is near-optimal for image classification, but other inductive biases are sometimes preferable. We showed that activation functions can control these inductive biases, but future tailored architectures might provide further benefits. Advances are still needed to characterize a model's inductive biases beyond "complexity", and their adequacy with the data.
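To make the meta-learning idea concrete, here is a minimal sketch under one simple assumed search space: an activation expressed as a learnable mixture of fixed basis functions, with network weights fitted on a training split and the activation's coefficients updated on a held-out split. The class name, the basis, and the first-order alternating updates are illustrative assumptions, not the paper's exact procedure.
```python
import torch
import torch.nn as nn

class LearnableActivation(nn.Module):
    """Activation as a learnable mixture of fixed basis functions,
    initialized near ReLU (illustrative search space, not the paper's)."""
    def __init__(self):
        super().__init__()
        self.coef = nn.Parameter(torch.tensor([0.0, 1.0, 0.0, 0.0]))

    def forward(self, x):
        basis = torch.stack([x, torch.relu(x), torch.sin(x), 0.1 * x ** 2], dim=-1)
        return (basis * self.coef).sum(dim=-1)

act = LearnableActivation()
net = nn.Sequential(nn.Linear(8, 64), act, nn.Linear(64, 64), act, nn.Linear(64, 1))

# First-order alternation: weights fit the training split, activation
# coefficients are updated to reduce loss on a held-out split.
w_opt = torch.optim.Adam([p for n, p in net.named_parameters() if "coef" not in n], lr=1e-3)
a_opt = torch.optim.Adam([act.coef], lr=1e-3)
mse = nn.MSELoss()

def meta_step(x_tr, y_tr, x_val, y_val):
    w_opt.zero_grad(); mse(net(x_tr), y_tr).backward(); w_opt.step()
    a_opt.zero_grad(); mse(net(x_val), y_val).backward(); a_opt.step()
```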
Related papers
- Debiasify: Self-Distillation for Unsupervised Bias Mitigation [19.813054813868476]
Simplicity bias poses a significant challenge in neural networks, often leading models to favor simpler solutions and inadvertently learn decision rules influenced by spurious correlations.
We introduce Debiasify, a novel self-distillation approach that requires no prior knowledge about the nature of biases.
Our method leverages a new distillation loss to transfer knowledge within the network, from deeper layers containing complex, highly-predictive features to shallower layers with simpler, attribute-conditioned features in an unsupervised manner.
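A generic deep-to-shallow self-distillation term could be sketched as follows; the temperature scaling, the detached deep-layer teacher, and the function name are assumptions, not Debiasify's published loss.
```python
import torch.nn.functional as F

def self_distillation_loss(shallow_logits, deep_logits, T=2.0):
    # Distill the (detached) deep layer's softened predictions into the
    # shallow layer's head; a generic deep-to-shallow distillation term.
    teacher = F.softmax(deep_logits.detach() / T, dim=-1)
    student = F.log_softmax(shallow_logits / T, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * T * T
```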
arXiv Detail & Related papers (2024-11-01T16:25:05Z)
- On the Inductive Bias of Stacking Towards Improving Reasoning [50.225873619537765]
We propose a variant of gradual stacking called MIDAS that can speed up language model training by up to 40%.
MIDAS is not only training-efficient but surprisingly also has an inductive bias towards improving downstream tasks.
We conjecture the underlying reason for this inductive bias by exploring the connection of stacking to looped models.
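As a rough sketch of one gradual-stacking growth step, assuming new blocks are initialized by copying a trained middle block (a simplification; MIDAS's exact copy rule may differ):
```python
import copy
import torch.nn as nn

def grow_by_stacking(blocks: nn.ModuleList, num_new: int) -> nn.ModuleList:
    # Insert copies of a trained middle block to deepen the model;
    # training then continues from this warm start.
    mid = len(blocks) // 2
    new_blocks = [copy.deepcopy(blocks[mid]) for _ in range(num_new)]
    return nn.ModuleList(list(blocks[:mid]) + new_blocks + list(blocks[mid:]))
```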
arXiv Detail & Related papers (2024-09-27T17:58:21Z) - Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in
Neural Networks [66.76034024335833]
We find that although diverse, complex features are learned by the backbone, the network's brittleness stems from the linear classification head relying primarily on the simplest features.
We propose a Feature Reconstruction Regularizer (FRR) to ensure that the learned features can be reconstructed from the logits.
We demonstrate up to 15% gains in OOD accuracy on the recently introduced semi-synthetic datasets with extreme distribution shifts.
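One plausible instantiation of such a regularizer, assuming a learned linear decoder from logits back to feature space (the MSE choice is also an assumption, not necessarily the paper's exact formulation):
```python
import torch.nn as nn
import torch.nn.functional as F

# decoder maps logits back to feature space, e.g. nn.Linear(num_classes, feat_dim)
def frr_loss(features, logits, decoder: nn.Module):
    # Penalize features that cannot be reconstructed from the logits,
    # discouraging the head from discarding complex features.
    return F.mse_loss(decoder(logits), features)
```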
arXiv Detail & Related papers (2022-10-04T04:01:15Z) - On Neural Architecture Inductive Biases for Relational Tasks [76.18938462270503]
We introduce a simple architecture based on similarity-distribution scores, which we name Compositional Relational Network (CoRelNet).
We find that simple architectural choices can outperform existing models in out-of-distribution generalization.
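A minimal sketch of the similarity-distribution idea, in which the classifier sees only the softmax-normalized pairwise similarities between object embeddings (class name, dimensions, and the linear head are illustrative):
```python
import torch
import torch.nn as nn

class SimilarityDistributionModel(nn.Module):
    """Classify purely from normalized pairwise similarities between objects."""
    def __init__(self, n_objects: int, n_classes: int):
        super().__init__()
        self.head = nn.Linear(n_objects * n_objects, n_classes)

    def forward(self, z):                 # z: (batch, n_objects, dim)
        sim = z @ z.transpose(1, 2)       # inner-product similarity matrix
        sim = sim.softmax(dim=-1)         # similarity distribution per object
        return self.head(sim.flatten(1))  # decide from similarities only
```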
arXiv Detail & Related papers (2022-06-09T16:24:01Z) - OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses [24.84797949716142]
We propose modifying the network architecture to impose inductive biases that make the network robust to dataset bias.
Specifically, we propose OccamNets, which are biased to favor simpler solutions by design.
While OccamNets are biased toward simpler hypotheses, they can learn more complex hypotheses if necessary.
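An early-exit forward pass is one way to encode this preference; the sketch below returns the prediction of the earliest sufficiently confident stage (batch-level exits for brevity, and not necessarily OccamNets' exact mechanism):
```python
import torch

def early_exit_forward(blocks, exit_heads, x, threshold=0.9):
    # Return predictions from the earliest stage whose confidence clears the
    # threshold; deeper (more complex) stages are used only when needed.
    for block, head in zip(blocks, exit_heads):
        x = block(x)
        probs = head(x).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)
        if bool((conf > threshold).all()):  # batch-level exit for brevity
            return pred
    return pred  # fall back to the deepest stage
```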
arXiv Detail & Related papers (2022-04-05T18:06:49Z) - Evading the Simplicity Bias: Training a Diverse Set of Models Discovers
Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
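One way to train such a diverse set is to penalize the alignment of the models' input gradients, as sketched below (the cosine-squared pairwise penalty is an illustrative choice, not necessarily the paper's exact term):
```python
import torch
import torch.nn.functional as F

def gradient_diversity_penalty(models, x):
    # Cosine alignment of input gradients across model pairs; minimizing it
    # pushes the models to rely on different input features.
    grads = []
    for m in models:
        xi = x.clone().requires_grad_(True)
        (g,) = torch.autograd.grad(m(xi).sum(), xi, create_graph=True)
        grads.append(g.flatten(1))
    penalty = x.new_zeros(())
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            penalty = penalty + F.cosine_similarity(grads[i], grads[j], dim=1).pow(2).mean()
    return penalty
```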
arXiv Detail & Related papers (2021-05-12T12:12:24Z) - The Low-Rank Simplicity Bias in Deep Networks [46.79964271742486]
We make a series of empirical observations that investigate and extend the hypothesis that deep networks are inductively biased to find solutions with lower effective rank embeddings.
We show that this claim holds for finite-width linear and non-linear models under practical learning paradigms, and that on natural data, these low-rank solutions are often the ones that generalize well.
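Effective rank can be measured via the entropy of a feature matrix's normalized singular values (Roy & Vetterli's definition); a standard way to quantify the low-rank bias:
```python
import torch

def effective_rank(features: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    # Entropy of the normalized singular-value spectrum, exponentiated:
    # close to the matrix rank for flat spectra, small for low-rank ones.
    s = torch.linalg.svdvals(features)
    p = s / (s.sum() + eps)
    return torch.exp(-(p * torch.log(p + eps)).sum())
```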
arXiv Detail & Related papers (2021-03-18T17:58:02Z) - Adaptive Rational Activations to Boost Deep Reinforcement Learning [68.10769262901003]
We motivate why rational functions are suitable as adaptable activation functions and why their inclusion in neural networks is crucial.
We demonstrate that equipping popular algorithms with (recurrent-)rational activations leads to consistent improvements on Atari games.
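A rational activation is a learnable ratio of polynomials; a common pole-free ("safe") parameterization looks like the sketch below (degrees and random initialization here are illustrative, not the paper's ReLU-fitted setup):
```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """y = P(x) / (1 + |Q(x)|): a learnable ratio of polynomials whose
    denominator cannot hit zero (the 'safe' form)."""
    def __init__(self, deg_p: int = 5, deg_q: int = 4):
        super().__init__()
        self.p = nn.Parameter(0.1 * torch.randn(deg_p + 1))
        self.q = nn.Parameter(0.1 * torch.randn(deg_q))

    def forward(self, x):
        num = sum(c * x ** i for i, c in enumerate(self.p))
        den = 1.0 + torch.abs(sum(c * x ** (i + 1) for i, c in enumerate(self.q)))
        return num / den
```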
arXiv Detail & Related papers (2021-02-18T14:53:12Z) - Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
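The relabeling step can be sketched as choosing, for each trajectory, the candidate task it best solves; the `task.reward` interface is hypothetical, and max-reward selection is a simplified proxy for the paper's approximate inverse-RL criterion:
```python
def relabel_trajectory(trajectory, candidate_tasks):
    # trajectory: list of (state, action) pairs; candidate_tasks: objects with
    # a hypothetical .reward(state, action) method. Pick the task under which
    # the behavior scores highest, then reuse the data as if it targeted it.
    def total_reward(task):
        return sum(task.reward(s, a) for s, a in trajectory)
    return max(candidate_tasks, key=total_reward)
```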
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site makes no guarantee of the quality of this information and is not responsible for any consequences of its use.