Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning
- URL: http://arxiv.org/abs/2412.00121v1
- Date: Thu, 28 Nov 2024 09:50:25 GMT
- Title: Hybrid Discriminative Attribute-Object Embedding Network for Compositional Zero-Shot Learning
- Authors: Yang Liu, Xinshuo Wang, Jiale Du, Xinbo Gao, Jungong Han
- Abstract summary: A Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network is proposed to handle the complex interactions between attributes and object visual representations.
To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module.
To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module.
The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
- Score: 83.10178754323955
- Abstract: Compositional Zero-Shot Learning (CZSL) recognizes new combinations by learning from known attribute-object pairs. However, the main challenge of this task lies in the complex interactions between attributes and object visual representations, which lead to significant differences in images. In addition, the long-tail label distribution in the real world makes the recognition task more complicated. To address these problems, we propose a novel method, named Hybrid Discriminative Attribute-Object Embedding (HDA-OE) network. To increase the variability of training data, HDA-OE introduces an attribute-driven data synthesis (ADDS) module. ADDS generates new samples with diverse attribute labels by combining multiple attributes of the same object. By expanding the attribute space in the dataset, the model is encouraged to learn and distinguish subtle differences between attributes. To further improve the discriminative ability of the model, HDA-OE introduces the subclass-driven discriminative embedding (SDDE) module, which enhances the subclass discriminative ability of the encoding by embedding subclass information in a fine-grained manner, helping to capture the complex dependencies between attributes and object visual features. The proposed model has been evaluated on three benchmark datasets, and the results verify its effectiveness and reliability.
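The abstract describes ADDS only at a high level: new samples with diverse attribute labels are generated by combining multiple attributes of the same object. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes that samples are feature vectors, that two samples of the same object with different attributes are combined by mixup-style linear interpolation, and that the synthetic sample carries both attribute labels. The function name `synthesize_multi_attribute`, the mixing range, and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def synthesize_multi_attribute(samples, num_new, seed=0):
    """Hypothetical ADDS-like synthesis (an assumption, not the paper's method).

    samples: list of (feature_vector, attribute, object) tuples.
    Returns a list of (mixed_feature, frozenset_of_attributes, object).
    """
    rng = random.Random(seed)

    # Group samples by object so that only same-object samples are combined.
    by_object = defaultdict(list)
    for feat, attr, obj in samples:
        by_object[obj].append((feat, attr))

    # An object is usable only if it appears with at least two distinct attributes.
    eligible = sorted(
        obj for obj, items in by_object.items()
        if len({attr for _, attr in items}) >= 2
    )

    synthetic = []
    if not eligible:
        return synthetic

    while len(synthetic) < num_new:
        obj = rng.choice(eligible)
        (f1, a1), (f2, a2) = rng.sample(by_object[obj], 2)
        if a1 == a2:
            continue  # need two different attributes of the same object
        lam = rng.uniform(0.3, 0.7)  # assumed mixing range
        mixed = [lam * x + (1.0 - lam) * y for x, y in zip(f1, f2)]
        synthetic.append((mixed, frozenset((a1, a2)), obj))
    return synthetic


# Toy data: only "apple" has two distinct attributes, so only it is augmented.
train = [
    ([1.0, 0.0], "wet", "apple"),
    ([0.0, 1.0], "sliced", "apple"),
    ([0.5, 0.5], "wet", "cat"),
]
new_samples = synthesize_multi_attribute(train, num_new=3)
```

Under these assumptions, every synthetic sample keeps its object label and gains a two-attribute label set, which widens the attribute space seen during training in the way the abstract describes.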
Related papers
- MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning [33.12021227971062]
Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions.
We introduce the Multi-Attribute Composition dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations.
Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task.
arXiv Detail & Related papers (2024-06-18T16:24:48Z) - High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning [54.86882315023791]
We propose an innovative approach called High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning (HDAFL).
HDAFL utilizes multiple convolutional kernels to automatically learn discriminative regions highly correlated with attributes in images.
We also introduce a Transformer-based attribute discrimination encoder to enhance the discriminative capability among attributes.
arXiv Detail & Related papers (2024-04-07T13:17:47Z) - Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$^{2}$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$^{2}$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - Hierarchical Visual Primitive Experts for Compositional Zero-Shot Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attributes and objects).
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - Learning Conditional Attributes for Compositional Zero-Shot Learning [78.24309446833398]
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts.
One of the challenges is to model attributes interacting with different objects, e.g., the attribute "wet" in "wet apple" and "wet cat" is different.
We argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings.
arXiv Detail & Related papers (2023-05-29T08:04:05Z) - AttributeNet: Attribute Enhanced Vehicle Re-Identification [70.89289512099242]
We introduce AttributeNet (ANet) that jointly extracts identity-relevant features and attribute features.
We enable the interaction by distilling the ReID-helpful attribute feature and adding it into the general ReID feature to increase the discrimination power.
We validate the effectiveness of our framework on three challenging datasets.
arXiv Detail & Related papers (2021-02-07T19:51:02Z) - Hierarchical Feature Embedding for Attribute Recognition [26.79901907956084]
We propose a hierarchical feature embedding framework, which learns a fine-grained feature embedding by combining attribute and ID information.
Experiments show that our method achieves state-of-the-art results on two pedestrian attribute datasets and a facial attribute dataset.
arXiv Detail & Related papers (2020-05-23T17:52:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.