Systematic Generalization in Language Models Scales with Information Entropy
- URL: http://arxiv.org/abs/2505.13089v2
- Date: Tue, 27 May 2025 05:40:05 GMT
- Title: Systematic Generalization in Language Models Scales with Information Entropy
- Authors: Sondre Wold, Lucas Georges Gabriel Charpentier, Étienne Simon,
- Abstract summary: We show how one aspect of systematic generalization can be described by the entropy of the distribution of component parts in the training data. Our work connects systematic generalization to information efficiency, and our results indicate that success at high entropy can be achieved even without built-in priors.
- Score: 0.5461938536945721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Systematic generalization remains challenging for current language models, which are known both to be sensitive to semantically similar permutations of the input and to struggle with known concepts presented in novel contexts. Although benchmarks exist for assessing compositional behavior, it is unclear how to measure the difficulty of a systematic generalization problem. In this work, we show how one aspect of systematic generalization can be described by the entropy of the distribution of component parts in the training data. We formalize a framework for measuring entropy in a sequence-to-sequence task and find that the performance of popular model architectures scales with the entropy. Our work connects systematic generalization to information efficiency, and our results indicate that success at high entropy can be achieved even without built-in priors, and that success at low entropy can serve as a target for assessing progress towards robust systematic generalization.
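As a rough illustration of the quantity the abstract refers to (not the paper's exact formalization), the sketch below computes the Shannon entropy of the empirical distribution of component parts across a tokenized training set; lower entropy corresponds to a more skewed component distribution, which the paper identifies as the harder regime for systematic generalization.

```python
from collections import Counter
import math

def component_entropy(training_examples):
    """Shannon entropy (in bits) of the empirical distribution of component
    parts pooled over a list of tokenized training inputs. Illustrative
    stand-in for the paper's formal sequence-to-sequence measure."""
    counts = Counter(part for example in training_examples for part in example)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A skewed (low-entropy) component distribution vs. a uniform (high-entropy)
# one over the same component vocabulary.
skewed = [["jump"]] * 90 + [["walk"], ["run"], ["look"]] * 3 + [["turn"]]
uniform = [["jump"], ["walk"], ["run"], ["look"], ["turn"]] * 20
print(component_entropy(skewed))   # roughly 0.66 bits
print(component_entropy(uniform))  # log2(5), roughly 2.32 bits
```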
Related papers
- Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey [0.9218181299449681]
A core aspect of compositionality, systematicity is a desirable property in ML models.
Existing benchmarks and models primarily focus on the systematicity of behaviour.
Building on Hadley's taxonomy of systematic generalization, we analyze the extent to which behavioural systematicity is tested.
arXiv Detail & Related papers (2025-06-04T21:22:38Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.
Models may behave unreliably due to poorly explored failure modes.
Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- A Hybrid System for Systematic Generalization in Simple Arithmetic Problems [70.91780996370326]
We propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols.
We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases.
arXiv Detail & Related papers (2023-06-29T18:35:41Z)
- Revisiting the Compositional Generalization Abilities of Neural Sequence Models [23.665350744415004]
We focus on one-shot primitive generalization as introduced by the popular SCAN benchmark.
We demonstrate that modifying the training distribution in simple and intuitive ways enables standard seq-to-seq models to achieve near-perfect generalization performance.
arXiv Detail & Related papers (2022-03-14T18:03:21Z)
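For context on the SCAN-style setup referenced above, the sketch below builds a one-shot primitive split: a new primitive ("jump") is seen only in isolation during training, while the test set asks for its compositions with modifiers learned from other primitives. This is a generic illustration of the benchmark's structure, not the training-distribution modification proposed in that paper.

```python
# Hypothetical SCAN-like command templates; the real benchmark pairs each
# command with an action sequence, which is omitted here for brevity.
TEMPLATES = ["{verb}", "{verb} twice", "{verb} around left", "turn left after {verb}"]
PRIMITIVES = ["walk", "run", "look", "jump"]

train, test = [], []
for verb in PRIMITIVES:
    for template in TEMPLATES:
        command = template.format(verb=verb)
        if verb == "jump" and template != "{verb}":
            test.append(command)   # held-out compositions of the new primitive
        else:
            train.append(command)  # everything else stays in training

print(len(train), "training commands;", len(test), "held-out commands")
```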
- Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge [91.15301779076187]
We introduce verbalized knowledge into the minibatches of a BERT model during pre-training and evaluate how well the model generalizes to supported inferences.
We find generalization does not improve over the course of pre-training, suggesting that commonsense knowledge is acquired from surface-level, co-occurrence patterns rather than induced, systematic reasoning.
arXiv Detail & Related papers (2021-12-16T03:13:04Z)
- Structure-Preserving Learning Using Gaussian Processes and Variational Integrators [62.31425348954686]
We propose combining a variational integrator for the nominal dynamics of a mechanical system with Gaussian process regression for learning the residual dynamics.
We extend our approach to systems with known kinematic constraints and provide formal bounds on the prediction uncertainty.
arXiv Detail & Related papers (2021-12-10T11:09:29Z)
- Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics [38.62999063710003]
We consider the problem of symbolic mathematical integration, as it requires generalizing systematically beyond the test set.
We develop a methodology for evaluating generalization that takes advantage of the problem domain's structure and access to a verifier.
We demonstrate challenges in achieving robustness, compositionality, and out-of-distribution generalization, through both carefully constructed manual test suites and a genetic algorithm.
arXiv Detail & Related papers (2021-09-28T18:50:15Z)
- Bootstrapping Generalization of Process Models Discovered From Event Data [10.574698833115589]
Generalization seeks to quantify how well a discovered model describes future executions of the system.
We employ a bootstrap approach to estimate properties of a population based on a sample.
Experiments demonstrate the feasibility of the approach in industrial settings.
arXiv Detail & Related papers (2021-07-08T14:35:56Z)
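The bootstrap idea mentioned above is generic enough to sketch: resample the observed data with replacement, recompute the statistic of interest on each resample, and read off a point estimate plus an interval. The snippet below is a plain numeric illustration under assumed names; the paper applies the principle to event logs and process-model generalization, which this sketch does not reproduce.

```python
import random

def bootstrap_estimate(sample, statistic, n_resamples=1000, seed=0):
    """Approximate the sampling distribution of `statistic` by recomputing it
    on resamples drawn with replacement from `sample`; return the point
    estimate and an approximate 95% percentile interval."""
    rng = random.Random(seed)
    estimates = sorted(
        statistic([rng.choice(sample) for _ in range(len(sample))])
        for _ in range(n_resamples)
    )
    lo, hi = estimates[int(0.025 * n_resamples)], estimates[int(0.975 * n_resamples)]
    return statistic(sample), (lo, hi)

# Example: point estimate and interval for the mean of a small sample.
point, interval = bootstrap_estimate([4, 8, 15, 16, 23, 42], lambda xs: sum(xs) / len(xs))
print(point, interval)
```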
- Probing Linguistic Systematicity [11.690179162556353]
There is accumulating evidence that neural models often generalize non-systematically.
We identify ways in which network architectures can generalize non-systematically, and discuss why such forms of generalization may be unsatisfying.
arXiv Detail & Related papers (2020-05-08T23:31:31Z)
- Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing [83.78668073898001]
We introduce a family of entropy regularizers, which includes label smoothing as a special case.
We find that variance in model performance can be explained largely by the resulting entropy of the model.
We advise the use of other entropy regularization methods in its place.
arXiv Detail & Related papers (2020-05-02T12:46:28Z)
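To make the connection to the entry above concrete, here is a minimal sketch of two members of that family: label smoothing and a confidence-penalty-style regularizer that directly rewards higher predictive entropy. The generalized formulation in the paper is broader; the helper functions below are illustrative assumptions, not its implementation.

```python
import math

def smoothed_cross_entropy(probs, target_index, epsilon=0.1):
    """Cross-entropy against a label-smoothed target: the gold class receives
    probability 1 - epsilon and the remaining mass is spread uniformly over
    the other classes."""
    k = len(probs)
    target = [epsilon / (k - 1)] * k
    target[target_index] = 1.0 - epsilon
    return -sum(t * math.log(p) for t, p in zip(target, probs))

def confidence_penalty(probs, beta=0.1):
    """Alternative entropy regularizer: subtract beta times the entropy of the
    model's own predictive distribution, discouraging overconfident outputs."""
    entropy = -sum(p * math.log(p) for p in probs)
    return -beta * entropy

probs = [0.7, 0.2, 0.1]  # model's predicted distribution over three classes
print(smoothed_cross_entropy(probs, target_index=0))
print(-math.log(probs[0]) + confidence_penalty(probs))  # standard CE plus penalty
```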
- On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically.
We show that a generalization of symplectic to nonconservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.