Composition-Incremental Learning for Compositional Generalization
- URL: http://arxiv.org/abs/2511.09082v1
- Date: Thu, 13 Nov 2025 01:30:53 GMT
- Title: Composition-Incremental Learning for Compositional Generalization
- Authors: Zhen Li, Yuwei Wu, Chenchen Jing, Che Sun, Chuanhao Li, Yunde Jia,
- Abstract summary: An ideal model is supposed to gradually improve the capability of compositional generalization in an incremental manner. We develop a benchmark construction pipeline leveraging existing datasets, yielding MIT-States-CompIL and C-GQA-CompIL. We propose a pseudo-replay framework utilizing a visual synthesizer to synthesize visual representations of learned compositions and a linguistic primitive distillation mechanism to maintain aligned primitive representations.
- Score: 35.44592461934844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compositional generalization has achieved substantial progress in computer vision on pre-collected training data. Nonetheless, real-world data continually emerges, with possible compositions being nearly infinite, long-tailed, and not entirely visible. Thus, an ideal model is supposed to gradually improve the capability of compositional generalization in an incremental manner. In this paper, we explore Composition-Incremental Learning for Compositional Generalization (CompIL) in the context of the compositional zero-shot learning (CZSL) task, where models need to continually learn new compositions, intending to improve their compositional generalization capability progressively. To quantitatively evaluate CompIL, we develop a benchmark construction pipeline leveraging existing datasets, yielding MIT-States-CompIL and C-GQA-CompIL. Furthermore, we propose a pseudo-replay framework utilizing a visual synthesizer to synthesize visual representations of learned compositions and a linguistic primitive distillation mechanism to maintain aligned primitive representations across the learning process. Extensive experiments demonstrate the effectiveness of the proposed framework.
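The abstract describes the framework only at a high level. As a minimal illustrative sketch (not the paper's implementation: the linear synthesizer, feature dimensionality, and all names here are invented), a pseudo-replay session could combine synthesized features of previously learned compositions with an L2-style distillation penalty that keeps primitive embeddings from drifting:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy feature dimensionality (hypothetical)

def synthesize(attr_emb, obj_emb, W):
    """Hypothetical stand-in for the visual synthesizer: maps a composition's
    primitive embeddings (attribute + object) to a pseudo visual feature."""
    return np.tanh(W @ np.concatenate([attr_emb, obj_emb]))

def primitive_distill_loss(current, previous):
    """Toy linguistic primitive distillation: penalize drift of primitive
    embeddings from their values after the previous learning session."""
    return float(np.mean((current - previous) ** 2))

# Session 1: primitives learned so far (random toy embeddings).
attrs = {"red": rng.normal(size=DIM), "wet": rng.normal(size=DIM)}
objs = {"apple": rng.normal(size=DIM)}
W = rng.normal(size=(DIM, 2 * DIM))  # toy synthesizer weights
old_attrs = {k: v.copy() for k, v in attrs.items()}

# Session 2: replay pseudo visual features of old compositions
# alongside new data, instead of storing real old images.
replay = [synthesize(attrs["red"], objs["apple"], W)]
attrs["red"] = attrs["red"] + 0.01 * rng.normal(size=DIM)  # simulated update
loss = primitive_distill_loss(attrs["red"], old_attrs["red"])
```

The design point being illustrated is that replay data is generated from stored primitive embeddings rather than stored images, while the distillation term anchors the primitives themselves.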
Related papers
- Communication-Inspired Tokenization for Structured Image Representations [74.17163003465537]
COMmunication inspired Tokenization (COMiT) is a framework for learning structured discrete visual token sequences. Our experiments demonstrate that while semantic alignment provides grounding, attentive sequential tokenization is critical for inducing interpretable, object-centric token structure.
arXiv Detail & Related papers (2026-02-24T09:53:50Z)
- Does Data Scaling Lead to Visual Compositional Generalization? [21.242714408660508]
We find that compositional generalization is driven by data diversity, not mere data scale. We prove this structure is key to efficiency, enabling perfect generalization from few observed combinations.
arXiv Detail & Related papers (2025-07-09T17:59:03Z)
- Learning to Substitute Components for Compositional Generalization [70.96410435337967]
We propose a novel compositional augmentation strategy called CompSub, which enables multi-grained composition of substantial substructures. We also introduce the Learning Component Substitution (LCS) framework, which empowers the learning of component substitution probabilities in CompSub. Our results demonstrate the superiority of CompSub, LCS, and LCS-ICL, with improvements of up to 66.5%, 10.3%, 1.4%, and 8.8% across benchmarks.
arXiv Detail & Related papers (2025-02-28T08:30:47Z)
- Compositional Generalization from First Principles [27.243195680442533]
We investigate compositionality as a property of the data-generating process rather than the data itself.
This reformulation enables us to derive mild conditions on only the support of the training distribution and the model architecture.
Our results set the stage for a principled theoretical study of compositional generalization.
arXiv Detail & Related papers (2023-07-10T19:30:32Z)
- Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhanced Encoder (KEE) is proposed to leverage scene graph knowledge (SGK) as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z)
- Compositional Generalization Requires Compositional Parsers [69.77216620997305]
We compare sequence-to-sequence models and models guided by compositional principles on the recent COGS corpus.
We show structural generalization is a key measure of compositional generalization and requires models that are aware of complex structure.
arXiv Detail & Related papers (2022-02-24T07:36:35Z)
- Visually Grounded Concept Composition [31.981204314287282]
We learn the grounding of both primitive and all composed concepts by aligning them to images.
We show that learning to compose leads to more robust grounding results, measured in text-to-image matching accuracy.
arXiv Detail & Related papers (2021-09-29T00:38:58Z)
- Improving Compositional Generalization in Semantic Parsing [54.4720965813889]
Generalization of models to out-of-distribution (OOD) data has captured tremendous attention recently.
We investigate compositional generalization in semantic parsing, a natural test-bed for compositional generalization.
arXiv Detail & Related papers (2020-10-12T12:34:58Z)
- Compositional Generalization in Semantic Parsing: Pre-training vs. Specialized Architectures [1.8434042562191812]
We show that pre-training leads to significant improvements in performance vs. comparable non-pre-trained models.
We establish a new state of the art on the CFQ compositional generalization benchmark using pre-training together with an intermediate representation.
arXiv Detail & Related papers (2020-07-17T13:34:49Z)
- Compositional Generalization by Learning Analytical Expressions [87.15737632096378]
A memory-augmented neural model is connected with analytical expressions to achieve compositional generalization.
Experiments on the well-known benchmark SCAN demonstrate that our model achieves strong compositional generalization.
arXiv Detail & Related papers (2020-06-18T15:50:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.