Enabling Systematic Generalization in Abstract Spatial Reasoning through Meta-Learning for Compositionality
- URL: http://arxiv.org/abs/2504.01445v1
- Date: Wed, 02 Apr 2025 07:56:39 GMT
- Title: Enabling Systematic Generalization in Abstract Spatial Reasoning through Meta-Learning for Compositionality
- Authors: Philipp Mondorf, Shijia Zhou, Monica Riedler, Barbara Plank
- Abstract summary: We extend the approach of meta-learning for compositionality to the domain of abstract spatial reasoning. Our results show that a transformer-based encoder-decoder model, trained via meta-learning for compositionality, can systematically generalize to previously unseen transformation compositions.
- Score: 20.958479821810762
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Systematic generalization refers to the capacity to understand and generate novel combinations from known components. Despite recent progress by large language models (LLMs) across various domains, these models often fail to extend their knowledge to novel compositional scenarios, revealing notable limitations in systematic generalization. There has been an ongoing debate about whether neural networks possess the capacity for systematic generalization, with recent studies suggesting that meta-learning approaches designed for compositionality can significantly enhance this ability. However, these insights have largely been confined to linguistic problems, leaving their applicability to other tasks an open question. In this study, we extend the approach of meta-learning for compositionality to the domain of abstract spatial reasoning. To this end, we introduce $\textit{SYGAR}$, a dataset designed to evaluate the capacity of models to systematically generalize from known geometric transformations (e.g., translation, rotation) of two-dimensional objects to novel combinations of these transformations (e.g., translation+rotation). Our results show that a transformer-based encoder-decoder model, trained via meta-learning for compositionality, can systematically generalize to previously unseen transformation compositions, significantly outperforming state-of-the-art LLMs, including o3-mini, GPT-4o, and Gemini 2.0 Flash, which fail to exhibit similar systematic behavior. Our findings highlight the effectiveness of meta-learning in promoting systematicity beyond linguistic tasks, suggesting a promising direction toward more robust and generalizable models.
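To make the compositional setup concrete, here is a minimal sketch of composing two known geometric transformations of a 2D object into a novel combination, in the spirit of SYGAR; the grid representation, function names, and primitives are illustrative assumptions, not the dataset's actual format.

```python
import numpy as np

def translate(obj: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift an object's (x, y) cell coordinates by (dx, dy)."""
    return obj + np.array([dx, dy])

def rotate90(obj: np.ndarray, pivot: np.ndarray) -> np.ndarray:
    """Rotate an object's cells 90 degrees counter-clockwise around a pivot."""
    rel = obj - pivot
    return pivot + rel @ np.array([[0, 1], [-1, 0]])

# An L-shaped object given as (x, y) cell coordinates.
shape = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])

# Primitives seen in isolation during training.
translated = translate(shape, dx=3, dy=1)
rotated = rotate90(shape, pivot=np.array([0, 0]))

# Novel composition (translation+rotation): the unseen combination that a
# systematically generalizing model should infer from the primitives.
composed = rotate90(translate(shape, dx=3, dy=1), pivot=np.array([0, 0]))
```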
Related papers
- Learning to Substitute Components for Compositional Generalization [70.96410435337967]
We propose a novel compositional augmentation strategy called CompSub, which enables multi-grained composition of substantial substructures.
We also introduce the Learning Component Substitution (LCS) framework, which empowers the learning of component substitution probabilities in CompSub.
Our results demonstrate the superiority of CompSub, LCS, and LCS-ICL, with improvements of up to 66.5%, 10.3%, 1.4%, and 8.8% across different settings; a sketch of the substitution idea follows this entry.
arXiv Detail & Related papers (2025-02-28T08:30:47Z)
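A minimal sketch of the component-substitution idea, using a SCAN-style command-action pair; the data and substitution rule are illustrative assumptions, not CompSub's actual algorithm.

```python
# Swap a component consistently in both the command and its action sequence,
# producing a new training pair by recombining known components.
def substitute(pair, old, new):
    (cmd_old, act_old), (cmd_new, act_new) = old, new
    cmd, acts = pair
    return cmd.replace(cmd_old, cmd_new), acts.replace(act_old, act_new)

pair = ("jump twice", "JUMP JUMP")
augmented = substitute(pair, old=("jump", "JUMP"), new=("walk", "WALK"))
print(augmented)  # ('walk twice', 'WALK WALK')
```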
- Towards Sharper Information-theoretic Generalization Bounds for Meta-Learning [31.843499167305115]
We establish novel single-step information-theoretic bounds for meta-learning.
Our bounds exhibit substantial advantages over prior MI- and CMI-based bounds.
We provide novel theoretical insights into the generalization behavior of two classes of noisy and iterative meta-learning algorithms; the classical bound this line of work sharpens is shown below.
arXiv Detail & Related papers (2025-01-26T15:22:04Z)
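For context, the classical single-task mutual-information bound that this line of work refines (Xu and Raginsky, 2017) states that if the loss is $\sigma$-sub-Gaussian, the expected generalization gap of a hypothesis $W$ learned from a sample $S$ of size $n$ satisfies

$$\left|\,\mathbb{E}\left[L_\mu(W) - L_S(W)\right]\right| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},$$

where $L_\mu$ is the population risk, $L_S$ the empirical risk, and $I(W;S)$ the mutual information between the learned hypothesis and the training sample; the paper's sharper single-step meta-learning bounds are not reproduced here.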
- Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures.
CAP intervenes in model activations through constituent-based pooling at various model levels.
Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability; a sketch of the pooling intervention follows this entry.
arXiv Detail & Related papers (2024-10-16T18:10:50Z)
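A rough sketch of a constituent-based pooling intervention; the span format and pooling choice are illustrative assumptions about how a CAP-style intervention operates.

```python
import torch

def constituent_pool(hidden: torch.Tensor, spans) -> torch.Tensor:
    """Replace each token activation inside a constituent span with the
    span's mean activation, simulating a compositional 'merge'."""
    pooled = hidden.clone()
    for start, end in spans:  # half-open token index ranges [start, end)
        pooled[start:end] = hidden[start:end].mean(dim=0, keepdim=True)
    return pooled

# Toy example: 6 tokens, hidden size 4; one constituent covers tokens 1-3.
hidden_states = torch.randn(6, 4)
intervened = constituent_pool(hidden_states, spans=[(1, 4)])
```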
- Enhanced Transformer architecture for in-context learning of dynamical systems [0.3749861135832073]
In this paper, we enhance the original meta-modeling framework through three key innovations.
The efficacy of these modifications is demonstrated through a numerical example focusing on the Wiener-Hammerstein system class.
arXiv Detail & Related papers (2024-10-04T10:05:15Z)
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various ML domains under consistent notation, which the current literature lacks; a minimal sketch of the paradigm follows this entry.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
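A minimal retrieve-then-predict loop illustrating the paradigm REML formalizes; the retriever, corpus, and model interfaces below are hypothetical stand-ins.

```python
from typing import Callable, List

def reml_predict(query: str,
                 retrieve: Callable[[str, int], List[str]],
                 predict: Callable[[str, List[str]], str],
                 k: int = 3) -> str:
    """Condition the prediction component on k items from an
    information-access (retrieval) component."""
    context = retrieve(query, k)
    return predict(query, context)

# Usage with toy stand-ins:
corpus = ["rotations compose", "translations commute", "reflections invert"]
retrieve = lambda q, k: [d for d in corpus if any(w in d for w in q.split())][:k]
predict = lambda q, ctx: f"answer to {q!r} grounded in {ctx}"
print(reml_predict("do translations commute", retrieve, predict))
```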
- Coding for Intelligence from the Perspective of Category [66.14012258680992]
Coding targets the compression and reconstruction of data, while intelligence concerns learning and prediction.
Recent trends demonstrate the potential homogeneity of these two fields.
We propose the novel problem of Coding for Intelligence from a category-theoretic view.
arXiv Detail & Related papers (2024-07-01T07:05:44Z)
- Limits of Transformer Language Models on Learning to Compose Algorithms [77.2443883991608]
We evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks.
Our results indicate that compositional learning in state-of-the-art Transformer language models is highly sample inefficient.
arXiv Detail & Related papers (2024-02-08T16:23:29Z)
- Advancing Graph Representation Learning with Large Language Models: A Comprehensive Survey of Techniques [37.60727548905253]
The integration of Large Language Models (LLMs) with Graph Representation Learning (GRL) marks a significant evolution in analyzing complex data structures.
This collaboration harnesses the sophisticated linguistic capabilities of LLMs to improve the contextual understanding and adaptability of graph models.
Despite a growing body of research dedicated to integrating LLMs into the graph domain, a comprehensive review that deeply analyzes the core components and operations is notably lacking.
arXiv Detail & Related papers (2024-02-04T05:51:14Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach; a sketch of the contextualization step follows this entry.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
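A sketch of the triplet-to-context step described above; the prompt wording and the `llm` callable are placeholders rather than the paper's exact recipe.

```python
from typing import Callable, Tuple

def contextualize(triplet: Tuple[str, str, str], llm: Callable[[str], str]) -> str:
    """Turn a compact structural triplet into a context-rich text segment."""
    head, relation, tail = triplet
    prompt = (f"Write a short descriptive paragraph explaining the fact "
              f"({head}, {relation}, {tail}).")
    return llm(prompt)  # the segment then supervises the downstream KGC model

# Usage with a stub LLM:
stub_llm = lambda p: f"[generated description for: {p}]"
print(contextualize(("Marie Curie", "award_received", "Nobel Prize"), stub_llm))
```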
- Concept Learners for Few-Shot Learning [76.08585517480807]
We propose COMET, a meta-learning method that improves generalization ability by learning to learn along human-interpretable concept dimensions.
We evaluate our model on few-shot tasks from diverse domains, including fine-grained image classification, document categorization, and cell type annotation; a concept-wise scoring sketch follows this entry.
arXiv Detail & Related papers (2020-07-14T22:04:17Z)
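A rough sketch of concept-wise metric learning in the spirit of COMET; the gating masks, prototypes, and aggregation rule are illustrative assumptions.

```python
import torch

n_concepts, feat_dim, n_classes = 3, 8, 5
concept_masks = torch.rand(n_concepts, feat_dim)           # one learned gate per concept
prototypes = torch.randn(n_concepts, n_classes, feat_dim)  # per-concept class prototypes

def comet_score(x: torch.Tensor) -> torch.Tensor:
    """Score classes by aggregating distances to prototypes along each
    human-interpretable concept dimension."""
    scores = []
    for c in range(n_concepts):
        masked = x * concept_masks[c]               # project the input onto concept c
        dist = torch.cdist(masked, prototypes[c])   # (batch, n_classes)
        scores.append(-dist)                        # closer prototype -> higher score
    return torch.stack(scores).mean(dim=0)          # average evidence over concepts

x = torch.randn(1, feat_dim)
print(comet_score(x))  # class scores, shape (1, n_classes)
```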
- Neural Entity Linking: A Survey of Models Based on Deep Learning [82.43751915717225]
This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015.
Its goal is to systematize the design features of neural entity linking systems and compare their performance with that of classic methods on common benchmarks.
The survey touches on applications of entity linking, focusing on the recently emerged use-case of enhancing deep pre-trained masked language models.
arXiv Detail & Related papers (2020-05-31T18:02:26Z)