Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey
- URL: http://arxiv.org/abs/2506.04461v1
- Date: Wed, 04 Jun 2025 21:22:38 GMT
- Title: Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey
- Authors: Ivan Vegner, Sydelle de Souza, Valentin Forch, Martha Lewis, Leonidas A. A. Doumas
- Abstract summary: A core aspect of compositionality, systematicity is a desirable property in ML models. Existing benchmarks and models primarily focus on the systematicity of behaviour. Building on Hadley's taxonomy of systematic generalization, we analyze the extent to which behavioural systematicity is tested.
- Score: 0.9218181299449681
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A core aspect of compositionality, systematicity is a desirable property in ML models as it enables strong generalization to novel contexts. This has led to numerous studies proposing benchmarks to assess systematic generalization, as well as models and training regimes designed to enhance it. Many of these efforts are framed as addressing the challenge posed by Fodor and Pylyshyn. However, while they argue for systematicity of representations, existing benchmarks and models primarily focus on the systematicity of behaviour. We emphasize the crucial nature of this distinction. Furthermore, building on Hadley's (1994) taxonomy of systematic generalization, we analyze the extent to which behavioural systematicity is tested by key benchmarks in the literature across language and vision. Finally, we highlight ways of assessing systematicity of representations in ML models as practiced in the field of mechanistic interpretability.
Related papers
- Generalizing vision-language models to novel domains: A comprehensive survey [55.97518817219619]
Vision-language pretraining has emerged as a transformative technique that integrates the strengths of both visual and textual modalities. This survey aims to comprehensively summarize the generalization settings, methodologies, benchmarking and results in the VLM literature.
arXiv Detail & Related papers (2025-06-23T10:56:37Z)
- Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
- On the Reasoning Capacity of AI Models and How to Quantify It [0.0]
Large Language Models (LLMs) have intensified the debate surrounding the fundamental nature of their reasoning capabilities. While achieving high performance on benchmarks such as GPQA and MMLU, these models exhibit limitations in more complex reasoning tasks. We propose a novel phenomenological approach that goes beyond traditional accuracy metrics to probe the underlying mechanisms of model behavior.
arXiv Detail & Related papers (2025-01-23T16:58:18Z)
- Benchmarks as Microscopes: A Call for Model Metrology [76.64402390208576]
Modern language models (LMs) pose a new challenge in capability assessment.
To be confident in our metrics, we need a new discipline of model metrology.
arXiv Detail & Related papers (2024-07-22T17:52:12Z)
- Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms [91.19304518033144]
We aim to align vision models with human aesthetic standards in a retrieval system.
We propose a preference-based reinforcement learning method that fine-tunes vision models to better align them with human aesthetics.
arXiv Detail & Related papers (2024-06-13T17:59:20Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Evaluation Gaps in Machine Learning Practice [13.963766987258161]
In practice, evaluations of machine learning models frequently focus on a narrow range of decontextualized predictive behaviours.
We examine the evaluation gaps between the idealized breadth of evaluation concerns and the observed narrow focus of actual evaluations.
By studying these properties, we demonstrate the machine learning discipline's implicit assumption of a range of commitments which have normative impacts.
arXiv Detail & Related papers (2022-05-11T04:00:44Z)
- Towards Out-Of-Distribution Generalization: A Survey [46.329995334444156]
Out-of-Distribution generalization is an emerging topic of machine learning research.
This paper represents the first comprehensive, systematic review of OOD generalization.
arXiv Detail & Related papers (2021-08-31T05:28:42Z)
- Probing Linguistic Systematicity [11.690179162556353]
There is accumulating evidence that neural models often generalize non-systematically.
We identify ways in which network architectures can generalize non-systematically, and discuss why such forms of generalization may be unsatisfying.
arXiv Detail & Related papers (2020-05-08T23:31:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.