Learning Disentangled Label Representations for Multi-label
Classification
- URL: http://arxiv.org/abs/2212.01461v1
- Date: Fri, 2 Dec 2022 21:49:34 GMT
- Title: Learning Disentangled Label Representations for Multi-label
Classification
- Authors: Jian Jia, Fei He, Naiyu Gao, Xiaotang Chen, Kaiqi Huang
- Abstract summary: One-shared-Feature-for-Multiple-Labels (OFML) is not conducive to learning discriminative label features.
We introduce the One-specific-Feature-for-One-Label (OFOL) mechanism and propose a novel disentangled label feature learning framework.
We achieve state-of-the-art performance on eight datasets.
- Score: 39.97251974500034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although various methods have been proposed for multi-label classification,
most approaches still follow the feature learning mechanism of the single-label
(multi-class) classification, namely, learning a shared image feature to
classify multiple labels. However, we find that this
One-shared-Feature-for-Multiple-Labels (OFML) mechanism is not conducive to
learning discriminative label features and makes the model non-robust. For
the first time, we mathematically prove the inferiority of the OFML
mechanism: when minimizing the cross-entropy loss, the optimal learned image
feature cannot maintain high similarities with multiple classifiers
simultaneously. To address the limitations of the OFML
mechanism, we introduce the One-specific-Feature-for-One-Label (OFOL) mechanism
and propose a novel disentangled label feature learning (DLFL) framework to
learn a disentangled representation for each label. The specificity of the
framework lies in a feature disentangle module, which contains learnable
semantic queries and a Semantic Spatial Cross-Attention (SSCA) module.
Specifically, learnable semantic queries maintain semantic consistency between
different images of the same label. The SSCA module localizes the label-related
spatial regions and aggregates located region features into the corresponding
label feature to achieve feature disentanglement. We achieve state-of-the-art
performance on eight datasets across three tasks, i.e., multi-label classification,
pedestrian attribute recognition, and continual multi-label learning.
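As an illustrative sketch only (not the authors' implementation, which is not given here), the OFOL idea behind the SSCA module can be written as a minimal cross-attention in plain Python: each learnable label query scores the spatial feature vectors, and the softmax-weighted regions are pooled into one feature per label. All names (`ssca`, `queries`, `spatial_feats`) are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ssca(queries, spatial_feats):
    # Toy Semantic Spatial Cross-Attention: each label query attends over
    # the spatial feature vectors and aggregates the located regions into
    # one label-specific feature (one-specific-feature-for-one-label).
    dim = len(spatial_feats[0])
    label_feats = []
    for q in queries:
        # Attention weight: similarity of this label's query to each region.
        weights = softmax([sum(qi * fi for qi, fi in zip(q, f))
                           for f in spatial_feats])
        # Weighted aggregation of regions into the label feature.
        pooled = [sum(w * f[i] for w, f in zip(weights, spatial_feats))
                  for i in range(dim)]
        label_feats.append(pooled)
    return label_feats
```

With two near-orthogonal queries, each label feature pools almost exclusively from its own spatial region instead of all labels sharing one global feature, which is the disentanglement the abstract describes.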
Related papers
- TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary
Multi-Label Classification of CLIP Without Training [29.431698321195814]
Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification.
CLIP shows poor performance on multi-label datasets because the global feature tends to be dominated by the most prominent class.
We propose a local-to-global framework to obtain image tags.
arXiv Detail & Related papers (2023-12-20T08:15:40Z)
- CARAT: Contrastive Feature Reconstruction and Aggregation for
Multi-Modal Multi-Label Emotion Recognition [18.75994345925282]
Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities.
The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data.
This paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task.
arXiv Detail & Related papers (2023-12-15T20:58:05Z)
- Semantic-Aware Dual Contrastive Learning for Multi-label Image
Classification [8.387933969327852]
We propose a novel semantic-aware dual contrastive learning framework that incorporates sample-to-sample contrastive learning.
Specifically, we leverage semantic-aware representation learning to extract category-related local discriminative features.
Our proposed method is effective and outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-07-19T01:57:31Z)
- Reliable Representations Learning for Incomplete Multi-View Partial Multi-Label Classification [78.15629210659516]
In this paper, we propose an incomplete multi-view partial multi-label classification network named RANK.
We break through the view-level weights inherent in existing methods and propose a quality-aware sub-network to dynamically assign quality scores to each view of each sample.
Our model is not only able to handle complete multi-view multi-label datasets, but also works on datasets with missing instances and labels.
arXiv Detail & Related papers (2023-03-30T03:09:25Z)
- Semantic-Aware Representation Blending for Multi-Label Image Recognition
with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors.
arXiv Detail & Related papers (2022-03-04T07:56:16Z)
- Structured Semantic Transfer for Multi-Label Recognition with Partial
Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-21T02:15:01Z)
- Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z)
- Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels.
Our method starts by assigning each person image with a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction.
To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL).
arXiv Detail & Related papers (2020-04-20T12:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.