Self-Attention with State-Object Weighted Combination for Compositional Zero Shot Learning
- URL: http://arxiv.org/abs/2512.18969v1
- Date: Mon, 22 Dec 2025 02:30:19 GMT
- Title: Self-Attention with State-Object Weighted Combination for Compositional Zero Shot Learning
- Authors: Cheng-Hong Chang, Pei-Hsuan Tsai
- Abstract summary: The ability to recognize both the state and the object simultaneously remains less common. One way to address this is to treat each state-object combination as a single category during training. This approach complicates data collection and training, since it requires comprehensive data for every possible combination. We propose SASOW, an enhancement of KG-SP that accounts for the weighting of states and objects while improving composition recognition accuracy.
- Score: 1.9336815376402718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object recognition has become prevalent across various industries. However, most existing applications are limited to identifying objects alone, without considering their associated states. The ability to recognize both the state and the object simultaneously remains less common. One way to address this is to treat each state-object combination as a single category during training. However, this approach complicates data collection and training, since it requires comprehensive data for every possible combination. Compositional Zero-shot Learning (CZSL) offers a viable alternative by treating states and objects as distinct categories during training, which makes it possible to identify novel compositions even when data for every conceivable combination is unavailable. The current state-of-the-art method, KG-SP, trains distinct classifiers for states and objects and uses a semantic model to evaluate the plausibility of the resulting compositions. However, KG-SP's accuracy in state and object recognition can be further improved, and it does not consider the weighting of states and objects during composition. In this study, we propose SASOW, an enhancement of KG-SP that accounts for the weighting of states and objects while improving composition recognition accuracy. First, we introduce self-attention mechanisms into the state and object classifiers, improving recognition accuracy for both. Additionally, we incorporate the weighting of states and objects during composition to generate more reasonable and accurate compositions. We validate SASOW on three established benchmark datasets. Experimental results show that, compared with the open-world CZSL (OW-CZSL) approach KG-SP, SASOW improves accuracy on unseen compositions by 2.1%, 1.7%, and 0.4% on the MIT-States, UT Zappos, and C-GQA datasets, respectively.
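The abstract describes two independent primitive classifiers whose outputs are combined using state and object weights, with implausible pairs filtered out. It does not give SASOW's actual weighting formula, so the following is only a minimal sketch under stated assumptions: the weights `w_state` and `w_obj` are hypothetical fixed hyperparameters, the composition score is a weighted sum of log-probabilities, and the feasible pairs are supplied as input. The sketch also shows why weighting matters only once some pairs are excluded: if every pair were feasible, any positive weights would select the same top state and top object.

```python
import math

def compose_scores(state_probs, object_probs, feasible, w_state=0.5, w_obj=0.5):
    """Rank feasible (state, object) pairs by a weighted sum of log-probabilities.

    state_probs / object_probs: name -> probability (> 0) from two independent classifiers.
    feasible: set of (state, object) pairs kept after feasibility filtering.
    w_state, w_obj: hypothetical fixed weights (an assumption, not SASOW's published scheme).
    """
    scored = []
    for s, p_s in state_probs.items():
        for o, p_o in object_probs.items():
            if (s, o) not in feasible:
                continue  # implausible compositions never enter the ranking
            score = w_state * math.log(p_s) + w_obj * math.log(p_o)
            scored.append(((s, o), score))
    # Highest combined score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored

# Toy classifier outputs and a toy feasibility set (illustrative data only).
sp = {"sliced": 0.6, "ripe": 0.4}
op = {"apple": 0.55, "tomato": 0.45}
feas = {("sliced", "tomato"), ("ripe", "apple")}
print(compose_scores(sp, op, feas)[0][0])                          # equal weights
print(compose_scores(sp, op, feas, w_state=0.2, w_obj=0.8)[0][0])  # object-heavy weights
```

With equal weights, ("sliced", "tomato") ranks first because the state classifier is more confident; raising the object weight to 0.8 promotes ("ripe", "apple"), since the object classifier prefers "apple".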
Related papers
- SASA: Semantic-Aware Contrastive Learning Framework with Separated Attention for Triple Classification [0.0]
Triple Classification (TC) aims to determine the validity of triples from Knowledge Graphs. SASA is a novel framework designed to enhance TC models via a separated attention mechanism and semantic-aware contrastive learning (CL). Experimental results across two benchmark datasets demonstrate that SASA significantly outperforms state-of-the-art methods.
(2026-01-19T13:19:00Z)
- Learning Primitive Relations for Compositional Zero-Shot Learning [26.35330980336384]
We propose a novel framework, learning primitive relations (LPR), designed to probabilistically capture the relationships between states and objects. LPR considers the dependencies between states and objects, enabling the model to infer the likelihood of unseen compositions.
(2025-01-24T08:10:05Z)
- Object-Centric Conformance Alignments with Synchronization (Extended Version) [57.76661079749309]
We present a new formalism that combines the ability of object-centric Petri nets to capture one-to-many relations and the one of Petri nets with identifiers to compare and synchronize objects based on their identity.
We propose a conformance checking approach for such nets based on an encoding in satisfiability modulo theories (SMT).
(2023-12-13T21:53:32Z)
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
(2023-08-08T03:24:21Z)
- Recognizing Unseen States of Unknown Objects by Leveraging Knowledge Graphs [1.182724861452868]
We propose the first Object-agnostic State Classification (OaSC) method that infers the state of a certain object without relying on the knowledge or the estimation of the object class. A series of experiments investigates the performance of the proposed method in various settings. The proposed OaSC method outperforms existing methods on all datasets and benchmarks by a large margin.
(2023-07-22T22:19:11Z)
- Generalised Co-Salient Object Detection [50.876864826216924]
We propose a new setting that relaxes an assumption in the conventional Co-Salient Object Detection (CoSOD) setting.
We call this new setting Generalised Co-Salient Object Detection (GCoSOD).
We propose a novel random sampling based Generalised CoSOD Training (GCT) strategy to distill the awareness of inter-image absence of co-salient objects into CoSOD models.
(2022-08-20T12:23:32Z)
- Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning [76.13542095170911]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions formed from states and objects seen during training.
We propose a novel Siamese Contrastive Embedding Network (SCEN) for unseen composition recognition.
Our method significantly outperforms the state-of-the-art approaches on three challenging benchmark datasets.
(2022-06-29T09:02:35Z)
- KG-SP: Knowledge Guided Simple Primitives for Open World Compositional Zero-Shot Learning [52.422873819371276]
The goal of open-world compositional zero-shot learning (OW-CZSL) is to recognize compositions of states and objects in images.
Here, we revisit a simple CZSL baseline and predict the primitives, i.e. states and objects, independently.
We estimate the feasibility of each composition through external knowledge, using this prior to remove unfeasible compositions from the output space.
Our model, Knowledge-Guided Simple Primitives (KG-SP), achieves state of the art in both OW-CZSL and pCZSL.
(2022-05-13T17:18:15Z)
- The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin.
(2022-03-10T23:35:00Z)
- Pairwise Similarity Knowledge Transfer for Weakly Supervised Object Localization [53.99850033746663]
We study the problem of learning a localization model on target classes with weakly supervised image labels.
In this work, we argue that learning only an objectness function is a weak form of knowledge transfer.
Experiments on the COCO and ILSVRC 2013 detection datasets show that the performance of the localization model improves significantly with the inclusion of pairwise similarity function.
(2020-03-18T17:53:33Z)
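The KG-SP entry describes predicting states and objects independently and using an external-knowledge prior to remove unfeasible compositions from the output space. A minimal sketch of that filtering step follows. It assumes, purely for illustration, that the prior is the cosine similarity between hypothetical state and object vectors and that a hard `threshold` decides feasibility; KG-SP's actual knowledge source and scoring are not described in this listing.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def feasible_pairs(state_vecs, object_vecs, threshold):
    """Keep (state, object) pairs whose similarity reaches the threshold.

    Assumption: feasibility = cosine similarity of hypothetical embeddings;
    the real KG-SP prior comes from external knowledge not reproduced here.
    """
    return {
        (s, o)
        for s, s_vec in state_vecs.items()
        for o, o_vec in object_vecs.items()
        if cosine(s_vec, o_vec) >= threshold
    }

# Hypothetical 2-D embeddings (toy data, not from any real knowledge base).
state_vecs = {"sliced": [1.0, 0.2], "rusty": [0.0, 1.0]}
object_vecs = {"apple": [0.9, 0.1], "nail": [0.1, 0.9]}
print(sorted(feasible_pairs(state_vecs, object_vecs, threshold=0.5)))
# → [('rusty', 'nail'), ('sliced', 'apple')]
```

Lowering the threshold admits more compositions into the output space, and raising it prunes more aggressively; a hard cutoff is the simplest choice, but a soft prior that rescales scores instead of discarding pairs would be an equally plausible design.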