Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics
- URL: http://arxiv.org/abs/2409.09626v1
- Date: Sun, 15 Sep 2024 06:37:12 GMT
- Title: Understanding Simplicity Bias towards Compositional Mappings via Learning Dynamics
- Authors: Yi Ren, Danica J. Sutherland
- Abstract summary: We study the uniqueness of compositional mappings through different perspectives.
This property explains why models having such mappings can generalize well.
We show that the simplicity bias is usually an intrinsic property of neural network training via gradient descent.
- Score: 20.720113883193765
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Obtaining compositional mappings is important for the model to generalize well compositionally. To better understand when and how to encourage the model to learn such mappings, we study their uniqueness through different perspectives. Specifically, we first show that the compositional mappings are the simplest bijections through the lens of coding length (i.e., an upper bound of their Kolmogorov complexity). This property explains why models having such mappings can generalize well. We further show that the simplicity bias is usually an intrinsic property of neural network training via gradient descent. That partially explains why some models spontaneously generalize well when they are trained appropriately.
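As a rough illustration of the coding-length argument in the abstract, the following toy sketch (not from the paper; all attribute and token names are made up) compares the description length of a compositional bijection, which factorizes into small per-attribute tables, against an arbitrary bijection, which can only be specified by its full lookup table:

```python
# Toy sketch: a compositional mapping over a product space needs only
# per-attribute tables, while an arbitrary bijection needs the full table.
import itertools
import random

shapes = ["circle", "square", "triangle"]
colors = ["red", "green", "blue"]
shape_tokens = ["A", "B", "C"]
color_tokens = ["x", "y", "z"]

# Compositional mapping: each attribute is encoded independently.
shape_table = dict(zip(shapes, shape_tokens))
color_table = dict(zip(colors, color_tokens))

def compositional(shape, color):
    return shape_table[shape] + color_table[color]

# Arbitrary bijection: a random pairing of the 9 inputs with the 9 words.
inputs = list(itertools.product(shapes, colors))
words = [s + c for s in shape_tokens for c in color_tokens]
shuffled = words[:]
random.Random(0).shuffle(shuffled)
arbitrary = dict(zip(inputs, shuffled))

# Crude coding-length proxy: number of table entries needed to specify the map.
compositional_entries = len(shape_table) + len(color_table)  # 3 + 3 = 6
arbitrary_entries = len(arbitrary)                           # 9

print(compositional_entries, arbitrary_entries)  # 6 9
```

The gap grows quickly with the number of attributes and values: with k attributes of n values each, the compositional map costs k*n entries while a generic bijection costs n**k, which is one intuition for why the compositional mapping is the "simplest" bijection.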
Related papers
- What makes Models Compositional? A Theoretical View: With Supplement [60.284698521569936]
We propose a general neuro-symbolic definition of compositional functions and their compositional complexity.
We show how various existing general and special purpose sequence processing models fit this definition and use it to analyze their compositional complexity.
arXiv Detail & Related papers (2024-05-02T20:10:27Z)
- Towards Understanding the Relationship between In-context Learning and Compositional Generalization [7.843029855730508]
We train a causal Transformer in a setting that renders ordinary learning very difficult.
The model can solve the task, however, by utilizing earlier examples to generalize to later ones.
In evaluations on the datasets, SCAN, COGS, and GeoQuery, models trained in this manner indeed show improved compositional generalization.
arXiv Detail & Related papers (2024-03-18T14:45:52Z)
- Simplicity in Complexity: Explaining Visual Complexity using Deep Segmentation Models [6.324765782436764]
We propose to model complexity using segment-based representations of images.
We find that complexity is well-explained by a simple linear model with these two features across six diverse image-sets.
arXiv Detail & Related papers (2024-03-05T17:21:31Z)
- Neural Redshift: Random Networks are not Random Functions [28.357640341268745]
We show that NNs do not have an inherent "simplicity bias".
Alternative architectures can be built with a bias for any level of complexity.
It points to promising avenues for controlling the solutions implemented by trained models.
arXiv Detail & Related papers (2024-03-04T17:33:20Z)
- Discovering modular solutions that generalize compositionally [55.46688816816882]
We show that identification up to linear transformation purely from demonstrations is possible without having to learn an exponential number of module combinations.
We further demonstrate empirically that meta-learning from finite data can discover modular policies that generalize compositionally in a number of complex environments.
arXiv Detail & Related papers (2023-12-22T16:33:50Z)
- Entangled Residual Mappings [59.02488598557491]
We introduce entangled residual mappings to generalize the structure of the residual connections.
An entangled residual mapping replaces the identity skip connections with specialized entangled mappings.
We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks.
arXiv Detail & Related papers (2022-06-02T19:36:03Z)
- Grounded Graph Decoding Improves Compositional Generalization in Question Answering [68.72605660152101]
Question answering models struggle to generalize to novel compositions of training patterns, such as longer sequences or more complex test structures.
We propose Grounded Graph Decoding, a method to improve compositional generalization of language representations by grounding structured predictions with an attention mechanism.
Our model significantly outperforms state-of-the-art baselines on the Compositional Freebase Questions (CFQ) dataset, a challenging benchmark for compositional generalization in question answering.
arXiv Detail & Related papers (2021-11-05T17:50:14Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
- Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision.
By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background.
We show that the counterfactual images can improve out-of-distribution robustness with only a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.