Towards a Comparative Framework for Compositional AI Models
- URL: http://arxiv.org/abs/2507.02940v1
- Date: Fri, 27 Jun 2025 15:59:14 GMT
- Title: Towards a Comparative Framework for Compositional AI Models
- Authors: Tiffany Duneau
- Abstract summary: We show how models can learn to compositionally generalise using the DisCoCirc framework for natural language processing. We compare quantum circuit-based models and classical neural networks on a dataset derived from one of the bAbI tasks. Both architectures score within 5% of one another on the productivity and substitutivity tasks, but differ by at least 10% on the systematicity task.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The DisCoCirc framework for natural language processing allows the construction of compositional models of text by combining units for individual words according to the grammatical structure of the text. The compositional nature of a model can give rise to two things: compositional generalisation -- the ability of a model to generalise outside its training distribution by learning the compositional rules underpinning the entire data distribution -- and compositional interpretability -- making sense of how the model works by inspecting its modular components in isolation, as well as the processes through which these components are combined. We present these notions in a framework-agnostic way using the language of category theory, and adapt a series of tests for compositional generalisation to this setting. Applying this to the DisCoCirc framework, we consider how well a selection of models can learn to compositionally generalise. We compare quantum circuit-based models and classical neural networks on a dataset derived from one of the bAbI tasks, extended to test several aspects of compositionality. Both architectures score within 5% of one another on the productivity and substitutivity tasks, but differ by at least 10% on the systematicity task, and exhibit different trends on the overgeneralisation tasks. Overall, we find the neural models are more prone to overfitting the training data. Additionally, we demonstrate how to interpret a compositional model using one of the trained models: by considering how the model components interact with one another, we explain how the model behaves.
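As an illustration of the compositional modelling style the abstract describes, the following is a minimal, framework-agnostic Python sketch: each word is a small learnable unit acting on the wires of the nouns it governs, and a text is modelled by composing these units in text order. All names here (WordBox, run_text, the toy story) are illustrative assumptions for this sketch, not the paper's actual models or the DisCoCirc/lambeq API; the paper's models use quantum circuits or neural networks as the word units.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # dimension of the state carried on each noun's wire (illustrative choice)

class WordBox:
    """One learnable unit per word: a parameterised map that updates the
    states of the noun wires the word acts on (a stand-in for the paper's
    quantum circuits or neural blocks)."""
    def __init__(self, name, arity):
        self.name, self.arity = name, arity
        self.weight = 0.1 * rng.standard_normal((arity * DIM, arity * DIM))

    def apply(self, wires):
        joint = np.concatenate(wires)       # joint state of the affected nouns
        out = np.tanh(self.weight @ joint)  # one parameterised update
        return np.split(out, self.arity)    # hand each noun back its updated state

def run_text(story, boxes, nouns):
    """Compose word boxes in text order, wiring each to the nouns it governs,
    mirroring how DisCoCirc turns a whole text into one circuit."""
    state = {n: np.ones(DIM) / np.sqrt(DIM) for n in nouns}
    for word, args in story:                # e.g. ("go_to", ["alice", "kitchen"])
        updated = boxes[word].apply([state[a] for a in args])
        for noun, new_state in zip(args, updated):
            state[noun] = new_state
    return state

# Toy bAbI-style story built from reusable word boxes (hypothetical example).
boxes = {"go_to": WordBox("go_to", 2), "pick_up": WordBox("pick_up", 2)}
story = [("go_to", ["alice", "kitchen"]), ("pick_up", ["alice", "ball"])]
final = run_text(story, boxes, ["alice", "kitchen", "ball"])
print({noun: vec.round(2) for noun, vec in final.items()})
```

Because the same word boxes are reused across sentences, compositional generalisation tests of the kind the paper adapts -- productivity (longer unseen stories) and systematicity (new combinations of known words) -- amount to checking whether the learned units still compose correctly outside the training distribution, and compositional interpretability amounts to inspecting the units and their wiring in isolation.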
Related papers
- Does Data Scaling Lead to Visual Compositional Generalization? [21.242714408660508]
We find that compositional generalization is driven by data diversity, not mere data scale. We prove this structure is key to efficiency, enabling perfect generalization from few observed combinations.
arXiv Detail & Related papers (2025-07-09T17:59:03Z)
- How Compositional Generalization and Creativity Improve as Diffusion Models are Trained [82.08869888944324]
How many samples do generative models need in order to learn composition rules? What signal in the data is exploited to learn those rules? We discuss connections between the hierarchical clustering mechanism we introduce here and the renormalization group in physics.
arXiv Detail & Related papers (2025-02-17T18:06:33Z)
- When does compositional structure yield compositional generalization? A kernel theory [0.0]
We present a theory of compositional generalization in kernel models with fixed, compositionally structured representations. We identify novel failure modes in compositional generalization that arise from biases in the training data. This work examines how statistical structure in the training data can affect compositional generalization.
arXiv Detail & Related papers (2024-05-26T00:50:11Z)
- What makes Models Compositional? A Theoretical View: With Supplement [60.284698521569936]
We propose a general neuro-symbolic definition of compositional functions and their compositional complexity.
We show how various existing general and special purpose sequence processing models fit this definition and use it to analyze their compositional complexity.
arXiv Detail & Related papers (2024-05-02T20:10:27Z)
- Compositional diversity in visual concept learning [18.907108368038216]
Humans leverage compositionality to efficiently learn new concepts, understanding how familiar parts can combine to form novel objects.
Here, we study how people classify and generate "alien figures" with rich relational structure.
We develop a Bayesian program induction model which searches for the best programs for generating the candidate visual figures.
arXiv Detail & Related papers (2023-05-30T19:30:50Z)
- On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
We evaluate four model families -- OPT, BLOOM, CodeGen and Codex -- on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z)
- Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models [56.88106830869487]
We introduce equi-tuning, a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group equivariant models.
We provide applications of equi-tuning on three different tasks: image classification, compositional generalization in language, and fairness in natural language generation.
arXiv Detail & Related papers (2022-10-13T08:45:23Z)
- Compositional Generalisation with Structured Reordering and Fertility Layers [121.37328648951993]
Seq2seq models have been shown to struggle with compositional generalisation.
We present a flexible end-to-end differentiable neural model that composes two structural operations.
arXiv Detail & Related papers (2022-10-06T19:51:31Z)
- Language Model Cascades [72.18809575261498]
Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities.
Cases with control flow and dynamic structure require techniques from probabilistic programming.
We formalize several existing techniques from this perspective, including scratchpads / chain of thought, verifiers, STaR, selection-inference, and tool use.
arXiv Detail & Related papers (2022-07-21T07:35:18Z)