Semantic-Aware Dual Contrastive Learning for Multi-label Image
Classification
- URL: http://arxiv.org/abs/2307.09715v4
- Date: Mon, 25 Sep 2023 04:25:32 GMT
- Authors: Leilei Ma, Dengdi Sun, Lei Wang, Haifeng Zhao and Bin Luo
- Abstract summary: We propose a novel semantic-aware dual contrastive learning framework that incorporates sample-to-sample contrastive learning.
Specifically, we leverage semantic-aware representation learning to extract category-related local discriminative features.
Our proposed method is effective and outperforms the state-of-the-art methods.
- Score: 8.387933969327852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting image semantics effectively and assigning corresponding labels to
multiple objects or attributes for natural images is challenging due to the
complex scene contents and confusing label dependencies. Recent works have
focused on modeling label relationships with graphs and understanding object
regions using class activation maps (CAM). However, these methods ignore the
complex intra- and inter-category relationships among specific semantic
features, and CAM is prone to generating noisy information. To this end, we
propose a novel semantic-aware dual contrastive learning framework that
incorporates sample-to-sample contrastive learning (SSCL) as well as
prototype-to-sample contrastive learning (PSCL). Specifically, we leverage
semantic-aware representation learning to extract category-related local
discriminative features and construct category prototypes. Then based on SSCL,
label-level visual representations of the same category are aggregated
together, and features belonging to distinct categories are separated.
Meanwhile, we construct a novel PSCL module to narrow the distance between
positive samples and category prototypes and push negative samples away from
the corresponding category prototypes. Finally, the discriminative label-level
features related to the image content are accurately captured by the joint
training of the above three parts. Experiments on five challenging large-scale
public datasets demonstrate that our proposed method is effective and
outperforms the state-of-the-art methods. Code and supplementary materials are
released on https://github.com/yu-gi-oh-leilei/SADCL.
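The two contrastive terms described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it assumes a supervised-contrastive-style loss over cosine similarities, and the function names `sscl_loss` and `pscl_loss`, the temperature `tau`, and the toy inputs are all hypothetical choices made for illustration.

```python
import math


def _cos(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def _log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sscl_loss(feats, cats, tau=0.1):
    """Sample-to-sample term (assumed form).

    feats[i] is a label-level feature and cats[i] its category. Features of
    the same category are pulled together, all other features are pushed apart.
    """
    losses = []
    for i, anchor in enumerate(feats):
        others = [j for j in range(len(feats)) if j != i]
        positives = [j for j in others if cats[j] == cats[i]]
        if not positives:
            continue  # an anchor with no same-category partner contributes nothing
        logits = {j: _cos(anchor, feats[j]) / tau for j in others}
        denom = _log_sum_exp(list(logits.values()))
        losses.append(-sum(logits[j] - denom for j in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


def pscl_loss(prototype, feats, present, tau=0.1):
    """Prototype-to-sample term for one category (assumed form).

    feats[n] is image n's label-level feature for this category and
    present[n] says whether image n actually carries the label.
    """
    positives = [n for n, flag in enumerate(present) if flag]
    if not positives:
        return 0.0
    logits = [_cos(prototype, f) / tau for f in feats]
    denom = _log_sum_exp(logits)
    return -sum(logits[n] - denom for n in positives) / len(positives)


if __name__ == "__main__":
    # Toy label-level features: two near the x-axis, two near the y-axis.
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(sscl_loss(feats, ["cat", "cat", "dog", "dog"]))  # categories match the clusters
    print(sscl_loss(feats, ["cat", "dog", "cat", "dog"]))  # categories cut across the clusters
    print(pscl_loss([1.0, 0.0], feats, [True, True, False, False]))
```

In this sketch both terms are small when same-category features (or positive samples and their prototype) already point in similar directions, and large when they do not. How the paper weights these terms against the classification loss is not specified here.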
Related papers
- Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning [23.671999163027284]
This paper proposes a novel framework for multi-label image recognition without any training data.
It uses knowledge from a pre-trained large language model to learn prompts that adapt a pre-trained vision-language model such as CLIP to multi-label classification.
Our framework presents a new way to explore the synergies between multiple pre-trained models for novel category recognition.
arXiv Detail & Related papers (2024-03-02T13:43:32Z)
- Multi-Granularity Denoising and Bidirectional Alignment for Weakly Supervised Semantic Segmentation [75.32213865436442]
We propose an end-to-end multi-granularity denoising and bidirectional alignment (MDBA) model to alleviate the noisy label and multi-class generalization issues.
The MDBA model reaches an mIoU of 69.5% and 70.2% on the validation and test sets of the PASCAL VOC 2012 dataset, respectively.
arXiv Detail & Related papers (2023-05-09T03:33:43Z)
- Learning Disentangled Label Representations for Multi-label Classification [39.97251974500034]
One-shared-Feature-for-Multiple-Labels (OFML) is not conducive to learning discriminative label features.
We introduce the One-specific-Feature-for-One-Label (OFOL) mechanism and propose a novel disentangled label feature learning framework.
We achieve state-of-the-art performance on eight datasets.
arXiv Detail & Related papers (2022-12-02T21:49:34Z)
- Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms on all known-label proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel-wise and spatial-wise attention matrices that guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed SARB framework outperforms current leading competitors.
arXiv Detail & Related papers (2022-03-04T07:56:16Z)
- Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z)
- Semantic Disentangling Generalized Zero-Shot Learning [50.259058462272435]
Generalized Zero-Shot Learning (GZSL) aims to recognize images from both seen and unseen categories.
In this paper, we propose a novel feature disentangling approach based on an encoder-decoder architecture.
The proposed model aims to distill quality semantic-consistent representations that capture intrinsic features of seen images.
arXiv Detail & Related papers (2021-01-20T05:46:21Z)
- Deep Active Learning for Joint Classification & Segmentation with Weak Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, by simply using random sample selection, the proposed approach can significantly outperform state-of-the-art CAMs and AL methods.
arXiv Detail & Related papers (2020-10-10T03:25:54Z)
- Zero-Shot Recognition through Image-Guided Semantic Classification [9.291055558504588]
We present a new embedding-based framework for zero-shot learning (ZSL).
Motivated by the binary relevance method for multi-label classification, we propose to inversely learn the mapping between an image and a semantic classifier.
IGSC is conceptually simple and can be realized by a slight enhancement of an existing deep architecture for classification.
arXiv Detail & Related papers (2020-07-23T06:22:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.