Learning to Substitute Components for Compositional Generalization
- URL: http://arxiv.org/abs/2502.20834v1
- Date: Fri, 28 Feb 2025 08:30:47 GMT
- Title: Learning to Substitute Components for Compositional Generalization
- Authors: Zhaoyi Li, Gangwei Jiang, Chenwang Wu, Ying Wei, Defu Lian, Enhong Chen
- Abstract summary: We propose a novel compositional augmentation strategy called CompSub, which enables multi-grained composition of substantial substructures. We also introduce the Learning Component Substitution (LCS) framework, which empowers the learning of component substitution probabilities in CompSub. Our results demonstrate the superiority of CompSub, LCS, and LCS-ICL, with improvements of up to 66.5%, 10.3%, 1.4%, and 8.8%, respectively.
- Score: 70.96410435337967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the rising prevalence of neural language models, recent empirical evidence suggests their deficiency in compositional generalization. One of the current de facto solutions to this problem is compositional data augmentation, which aims to introduce additional compositional inductive bias. However, existing handcrafted augmentation strategies offer limited improvement when systematic generalization of neural language models requires multi-grained compositional bias (i.e., not limited to either lexical or structural biases alone) or when training sentences have an imbalanced difficulty distribution. To address these challenges, we first propose a novel compositional augmentation strategy called Component Substitution (CompSub), which enables multi-grained composition of substantial substructures across the entire training set. Furthermore, we introduce the Learning Component Substitution (LCS) framework. This framework empowers the learning of component substitution probabilities in CompSub in an end-to-end manner by maximizing the loss of neural language models, thereby prioritizing challenging compositions with elusive concepts and novel contexts. We extend the key ideas of CompSub and LCS to the recently emerging in-context learning scenarios of pre-trained large language models (LLMs), proposing the LCS-ICL algorithm to enhance the few-shot compositional generalization of state-of-the-art (SOTA) LLMs. Theoretically, we provide insights into why applying our algorithms to language models can improve compositional generalization performance. Empirically, our results on four standard compositional generalization benchmarks (SCAN, COGS, GeoQuery, and COGS-QL) demonstrate the superiority of CompSub, LCS, and LCS-ICL, with improvements of up to 66.5%, 10.3%, 1.4%, and 8.8%, respectively.
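The abstract describes CompSub (swapping multi-grained components between training examples) and LCS (learning substitution probabilities end-to-end by maximizing the language model's loss) only at a high level. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation; every name in it (extract_components, compsub_augment, score_fn) is an assumption made for this example, and the bigram "components" and the hard max over candidates are crude stand-ins for the paper's multi-grained substructures and learned substitution probabilities.

```python
import random
from typing import Callable, Optional


def extract_components(sentence: str) -> list[str]:
    """Naive placeholder "component" extractor: contiguous word bigrams.

    The paper substitutes multi-grained components (from single words up to
    larger substructures); bigrams are only a stand-in for illustration.
    """
    tokens = sentence.split()
    return [" ".join(tokens[i:i + 2]) for i in range(len(tokens) - 1)]


def compsub_augment(sentence: str,
                    corpus: list[str],
                    score_fn: Optional[Callable[[str], float]] = None,
                    n_candidates: int = 8) -> str:
    """Swap one component of `sentence` for a component drawn from `corpus`.

    If `score_fn` is given (e.g., the language model's loss on a candidate),
    the hardest candidate is kept, loosely mimicking LCS's preference for
    challenging compositions; otherwise a random candidate is returned.
    """
    components = extract_components(sentence)
    if not components:
        return sentence

    candidates = []
    for _ in range(n_candidates):
        old = random.choice(components)        # component to replace
        donor = random.choice(corpus)          # sentence to borrow from
        donor_components = extract_components(donor) or [donor]
        new = random.choice(donor_components)  # replacement component
        candidates.append(sentence.replace(old, new, 1))

    if score_fn is None:
        return random.choice(candidates)
    # LCS-style selection: keep the augmentation the model finds hardest.
    return max(candidates, key=score_fn)


if __name__ == "__main__":
    corpus = ["jump twice after walk left", "look around right thrice"]
    print(compsub_augment("walk twice and jump left", corpus))
```

In the paper itself, the substitution probabilities are learned end-to-end rather than chosen by a hard argmax over sampled candidates; the selection step above only conveys the loss-maximizing intuition behind prioritizing challenging compositions.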
Related papers
- Enabling Systematic Generalization in Abstract Spatial Reasoning through Meta-Learning for Compositionality [20.958479821810762]
We extend the approach of meta-learning for compositionality to the domain of abstract spatial reasoning.
Our results show that a transformer-based encoder-decoder model, trained via meta-learning for compositionality, can systematically generalize to previously unseen transformation compositions.
arXiv Detail & Related papers (2025-04-02T07:56:39Z)
- Compositional Generalization in Spoken Language Understanding [58.609624319953156]
We study two types of compositionality: (a) novel slot combination, and (b) length generalization.
We show that our compositional SLU model significantly outperforms the state-of-the-art BERT SLU model.
arXiv Detail & Related papers (2023-12-25T21:46:06Z)
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models [68.18370230899102]
We investigate how to elicit compositional generalization capabilities in large language models (LLMs).
We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial.
We show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization.
arXiv Detail & Related papers (2023-08-01T05:54:12Z)
- ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis [54.18659323181771]
We characterize several different forms of compositional generalization that are desirable in program synthesis.
We propose ExeDec, a novel decomposition-based strategy that predicts execution subgoals to solve problems step-by-step, informed by program execution at each step.
arXiv Detail & Related papers (2023-07-26T01:07:52Z)
- Learning to Substitute Spans towards Improving Compositional Generalization [26.878616721700485]
We propose a novel compositional augmentation strategy dubbed Span Substitution (SpanSub).
We then introduce the Learning to Substitute Span (L2S2) framework, which empowers the learning of span substitution probabilities in SpanSub.
Our results on three standard compositional generalization benchmarks, including SCAN, COGS and GeoQuery, demonstrate the superiority of SpanSub, the learning framework L2S2, and their combination.
arXiv Detail & Related papers (2023-06-05T12:44:18Z)
- Prompting Language-Informed Distribution for Compositional Zero-Shot Learning [73.49852821602057]
The compositional zero-shot learning (CZSL) task aims to recognize unseen compositional visual concepts.
We propose a model based on prompting the language-informed distribution, termed PLID, for the task.
Experimental results on MIT-States, UT-Zappos, and C-GQA datasets show the superior performance of the PLID to the prior arts.
arXiv Detail & Related papers (2023-05-23T18:00:22Z)
- Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language [48.37815764394315]
We study three unsupervised representation learning algorithms on two datasets that allow directly testing compositional generalization.
We find that directly using the bottleneck representation with simple models and few labels may lead to worse generalization than using representations from layers before or after the learned representation itself.
Surprisingly, we find that increasing pressure to produce a disentangled representation produces representations with worse generalization, while representations from EL models show strong compositional generalization.
arXiv Detail & Related papers (2022-10-02T10:35:53Z)
- Improving Compositional Generalization with Latent Structure and Data Augmentation [39.24527889685699]
We present a more powerful data recombination method using a model called the Compositional Structure Learner (CSL).
CSL is a generative model with a quasi-synchronous context-free grammar backbone.
Recombined examples sampled from CSL are used as additional training data for T5; this procedure effectively transfers most of CSL's compositional bias to T5 for diagnostic tasks.
arXiv Detail & Related papers (2021-12-14T18:03:28Z)
- Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures [1.8434042562191812]
We show that pre-training leads to significant performance improvements over comparable non-pre-trained models.
We establish a new state of the art on the CFQ compositional generalization benchmark using pre-training together with an intermediate representation.
arXiv Detail & Related papers (2020-07-17T13:34:49Z)