MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning
- URL: http://arxiv.org/abs/2406.12757v1
- Date: Tue, 18 Jun 2024 16:24:48 GMT
- Title: MAC: A Benchmark for Multiple Attributes Compositional Zero-Shot Learning
- Authors: Shuo Xu, Sai Wang, Xinyue Hu, Yutian Lin, Bo Du, Yu Wu,
- Abstract summary: Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen neglecting and recognize unseen attribute-object compositions.
We introduce the Multi-Attribute Composition dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations.
Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task.
- Score: 33.12021227971062
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, neglecting the fact that objects naturally exhibit multiple interrelated attributes. Real-world objects often possess multiple interrelated attributes, and current datasets' narrow attribute scope and single attribute labeling introduce annotation biases, undermining model performance and evaluation. To address these limitations, we introduce the Multi-Attribute Composition (MAC) dataset, encompassing 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations. MAC includes an average of 30.2 attributes per object and 65.4 objects per attribute, facilitating better multi-attribute composition predictions. Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task. We also develop solutions for multi-attribute compositional learning and propose the MM-encoder to disentangling the attributes and objects.
Related papers
- Hierarchical Visual Primitive Experts for Compositional Zero-Shot
Learning [52.506434446439776]
Compositional zero-shot learning (CZSL) aims to recognize compositions with prior knowledge of known primitives (attribute and object)
We propose a simple and scalable framework called Composition Transformer (CoT) to address these issues.
Our method achieves SoTA performance on several benchmarks, including MIT-States, C-GQA, and VAW-CZSL.
arXiv Detail & Related papers (2023-08-08T03:24:21Z) - Learning Conditional Attributes for Compositional Zero-Shot Learning [78.24309446833398]
Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts.
One of the challenges is to model attributes interacted with different objects, e.g., the attribute wet" in wet apple" and wet cat" is different.
We argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings.
arXiv Detail & Related papers (2023-05-29T08:04:05Z) - AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes [22.158388220889865]
We propose attribute-aware multimodal entity linking, where the input is a mention described with a text and image.
The goal is to predict the corresponding target entity from a multimodal knowledge base.
To support this research, we construct AMELI, a large-scale dataset consisting of 18,472 reviews and 35,598 products.
arXiv Detail & Related papers (2023-05-24T05:01:48Z) - Label2Label: A Language Modeling Framework for Multi-Attribute Learning [93.68058298766739]
Label2Label is the first attempt for multi-attribute prediction from the perspective of language modeling.
Inspired by the success of pre-training language models in NLP, Label2Label introduces an image-conditioned masked language model.
Our intuition is that the instance-wise attribute relations are well grasped if the neural net can infer the missing attributes based on the context and the remaining attribute hints.
arXiv Detail & Related papers (2022-07-18T15:12:33Z) - Learning to Predict Visual Attributes in the Wild [43.91237738107603]
We introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.
We propose several techniques that systematically tackle these challenges, including a base model that utilizes both low- and high-level CNN features.
Using these techniques, we achieve near 3.7 mAP and 5.7 overall F1 points improvement over the current state of the art.
arXiv Detail & Related papers (2021-06-17T17:58:02Z) - AdaTag: Multi-Attribute Value Extraction from Product Profiles with
Adaptive Decoding [55.89773725577615]
We present AdaTag, which uses adaptive decoding to handle attribute extraction.
Our experiments on a real-world e-Commerce dataset show marked improvements over previous methods.
arXiv Detail & Related papers (2021-06-04T07:54:11Z) - Compositional Fine-Grained Low-Shot Learning [58.53111180904687]
We develop a novel compositional generative model for zero- and few-shot learning to recognize fine-grained classes with a few or no training samples.
We propose a feature composition framework that learns to extract attribute features from training samples and combines them to construct fine-grained features for rare and unseen classes.
arXiv Detail & Related papers (2021-05-21T16:18:24Z) - Learning to Infer Unseen Attribute-Object Compositions [55.58107964602103]
A graph-based model is proposed that can flexibly recognize both single- and multi-attribute-object compositions.
We build a large-scale Multi-Attribute dataset with 116,099 images and 8,030 composition categories.
arXiv Detail & Related papers (2020-10-27T14:57:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.