Zero-Shot Compositional Concept Learning
- URL: http://arxiv.org/abs/2107.05176v1
- Date: Mon, 12 Jul 2021 03:31:56 GMT
- Title: Zero-Shot Compositional Concept Learning
- Authors: Guangyue Xu, Parisa Kordjamshidi, Joyce Y. Chai
- Abstract summary: We propose an episode-based cross-attention (EpiCA) network that combines the merits of a cross-attention mechanism and an episode-based training strategy.
EpiCA uses cross-attention to correlate concept and visual information and a gated pooling layer to build contextualized representations for both images and concepts.
Experiments on two widely used zero-shot compositional learning (ZSCL) benchmarks demonstrate the effectiveness of the model.
- Score: 10.108857371774977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the problem of recognizing compositional
attribute-object concepts within the zero-shot learning (ZSL) framework. We
propose an episode-based cross-attention (EpiCA) network that combines the
merits of a cross-attention mechanism and an episode-based training strategy
to recognize novel compositional concepts. First, EpiCA uses cross-attention
to correlate concept and visual information and a gated pooling layer to
build contextualized representations for both images and concepts. The updated
representations are then used for a more in-depth multi-modal relevance
calculation for concept recognition. Second, a two-phase episode training
strategy is adopted; its transductive phase in particular uses unlabeled test
examples to alleviate the low-resource learning problem. Experiments on two
widely used zero-shot compositional learning (ZSCL) benchmarks demonstrate
the effectiveness of the model compared with recent approaches in both the
conventional and generalized ZSCL settings.
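The cross-attention and gated pooling steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cross_attend`, `gated_pool`, `relevance`), the use of keys as values, the sigmoid gate, the cosine relevance score, and the random toy vectors are all assumptions chosen to keep the example small and dependency-free.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys):
    """Each query attends over all keys (scaled dot-product attention).

    Assumption: the keys also serve as values, with no learned projections.
    """
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        attended.append([sum(w * k[d] for w, k in zip(weights, keys))
                         for d in range(len(q))])
    return attended

def gated_pool(vectors, gate_w):
    """Pool a set of vectors into one, weighting each by a sigmoid gate."""
    gates = [1.0 / (1.0 + math.exp(-dot(v, gate_w))) for v in vectors]
    total = sum(gates)
    return [sum(g * v[d] for g, v in zip(gates, vectors)) / total
            for d in range(len(gate_w))]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def relevance(concept_tokens, image_regions, gate_w):
    """Score one attribute-object concept against one image (hypothetical)."""
    concept_ctx = cross_attend(concept_tokens, image_regions)  # concept -> image
    image_ctx = cross_attend(image_regions, concept_tokens)    # image -> concept
    return cosine(gated_pool(concept_ctx, gate_w),
                  gated_pool(image_ctx, gate_w))

# Toy data: 6 image-region features and 2 concept tokens (attribute, object).
rng = random.Random(0)
dim = 4
rand_vec = lambda: [rng.gauss(0, 1) for _ in range(dim)]
image_regions = [rand_vec() for _ in range(6)]
concept_tokens = [rand_vec(), rand_vec()]
gate_w = rand_vec()  # stands in for a learned gate parameter

score = relevance(concept_tokens, image_regions, gate_w)
print(f"relevance score: {score:.3f}")
```

In a zero-shot setting, such a score would be computed for every candidate attribute-object pair and the highest-scoring pair taken as the prediction; the episode-based training strategy from the abstract is not covered by this sketch.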
Related papers
- Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning [23.757252768668497]
Compositional Zero-shot Learning (CZSL) aims to identify novel compositions via known attribute-object pairs.
The primary challenge in CZSL tasks lies in the significant discrepancies introduced by the complex interaction between the visual primitives of attribute and object.
We propose a model-agnostic and Primitive-Based Adversarial training (PBadv) method to deal with this problem.
arXiv Detail & Related papers (2024-06-21T08:18:30Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match the images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA).
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- Mutual Balancing in State-Object Components for Compositional Zero-Shot Learning [0.0]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions from seen states and objects.
We propose a novel method called MUtual balancing in STate-object components (MUST) for CZSL, which provides a balancing inductive bias for the model.
Our approach significantly outperforms the state-of-the-art on MIT-States, UT-Zappos, and C-GQA when combined with the basic CZSL frameworks.
arXiv Detail & Related papers (2022-11-19T10:21:22Z)
- DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET can often achieve state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z)
- Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning [76.13542095170911]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions formed from states and objects seen during training.
We propose a novel Siamese Contrastive Embedding Network (SCEN) for unseen composition recognition.
Our method significantly outperforms the state-of-the-art approaches on three challenging benchmark datasets.
arXiv Detail & Related papers (2022-06-29T09:02:35Z)
- Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition [64.29650787243443]
We propose and analyse the use of a 2D frequency transform of the activation maps before transferring them.
This strategy enhances knowledge transferability in tasks such as scene recognition.
We publicly release the training and evaluation framework used throughout this paper at http://www.vpu.eps.uam.es/publications/DCTBasedKDForSceneRecognition.
arXiv Detail & Related papers (2022-05-04T11:05:18Z)
- Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show that our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Dual Contrastive Learning for General Face Forgery Detection [64.41970626226221]
We propose a novel face forgery detection framework, named Dual Contrastive Learning (DCL), which constructs positive and negative paired data.
To explore the essential discrepancies, Intra-Instance Contrastive Learning (Intra-ICL) is introduced to focus on the local content inconsistencies prevalent in the forged faces.
arXiv Detail & Related papers (2021-12-27T05:44:40Z)
- Ontology-based n-ball Concept Embeddings Informing Few-shot Image Classification [5.247029505708008]
ViOCE integrates symbolic knowledge in the form of $n$-ball concept embeddings into a neural network based vision architecture.
We evaluate ViOCE using the task of few-shot image classification, where it demonstrates superior performance on two standard benchmarks.
arXiv Detail & Related papers (2021-09-19T05:35:43Z)
- Cross Knowledge-based Generative Zero-Shot Learning Approach with Taxonomy Regularization [5.280368849852332]
We develop a generative network-based ZSL approach equipped with the proposed Cross Knowledge Learning (CKL) scheme and Taxonomy Regularization (TR).
CKL enables more relevant semantic features to be trained for semantic-to-visual feature embedding in ZSL.
TR significantly improves the intersections with unseen images through more generalized visual features generated by the generative network.
arXiv Detail & Related papers (2021-01-25T04:38:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.