Combining Metric Learning and Attention Heads For Accurate and Efficient
Multilabel Image Classification
- URL: http://arxiv.org/abs/2209.06585v1
- Date: Wed, 14 Sep 2022 12:06:47 GMT
- Title: Combining Metric Learning and Attention Heads For Accurate and Efficient
Multilabel Image Classification
- Authors: Kirill Prokofiev and Vladislav Sovrasov
- Abstract summary: We revisit two popular approaches to multilabel classification: transformer-based heads and labels relations information graph processing branches.
Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with the proper training strategy graph-based methods can demonstrate just a small accuracy drop.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-label image classification allows predicting a set of labels from a
given image. Unlike multiclass classification, where only one label per image
is assigned, such setup is applicable for a broader range of applications. In
this work we revisit two popular approaches to multilabel classification:
transformer-based heads and labels relations information graph processing
branches. Although transformer-based heads are considered to achieve better
results than graph-based branches, we argue that with the proper training
strategy graph-based methods can demonstrate just a small accuracy drop, while
spending less computational resources on inference. In our training strategy,
instead of Asymmetric Loss (ASL), which is the de-facto standard for multilabel
classification, we introduce its modification acting in the angle space. It
implicitly learns a proxy feature vector on the unit hypersphere for each
class, providing a better discrimination ability, than binary cross entropy
loss does on unnormalized features. With the proposed loss and training
strategy, we obtain SOTA results among single modality methods on widespread
multilabel classification benchmarks such as MS-COCO, PASCAL-VOC, NUS-Wide and
Visual Genome 500. Source code of our method is available as a part of the
OpenVINO Training Extensions
https://github.com/openvinotoolkit/deep-object-reid/tree/multilabel
Related papers
- Multi-label Cluster Discrimination for Visual Representation Learning [27.552024985952166]
We propose a novel Multi-Label Cluster Discrimination method named MLCD to enhance representation learning.
Our method achieves state-of-the-art performance on multiple downstream tasks including linear probe, zero-shot classification, and image-text retrieval.
arXiv Detail & Related papers (2024-07-24T14:54:16Z) - UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework which trains the dual encoder and classifier in together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z) - TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary
Multi-Label Classification of CLIP Without Training [29.431698321195814]
Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification.
CLIP shows poor performance on multi-label datasets because the global feature tends to be dominated by the most prominent class.
We propose a local-to-global framework to obtain image tags.
arXiv Detail & Related papers (2023-12-20T08:15:40Z) - Graph Attention Transformer Network for Multi-Label Image Classification [50.0297353509294]
We propose a general framework for multi-label image classification that can effectively mine complex inter-label relationships.
Our proposed methods can achieve state-of-the-art performance on three datasets.
arXiv Detail & Related papers (2022-03-08T12:39:05Z) - PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes this ratio during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z) - All Labels Are Not Created Equal: Enhancing Semi-supervision via Label
Grouping and Co-training [32.45488147013166]
Pseudo-labeling is a key component in semi-supervised learning (SSL)
We propose SemCo, a method which leverages label semantics and co-training to address this problem.
We show that our method achieves state-of-the-art performance across various SSL tasks including 5.6% accuracy improvement on Mini-ImageNet dataset with 1000 labeled examples.
arXiv Detail & Related papers (2021-04-12T07:33:16Z) - Generative Multi-Label Zero-Shot Learning [136.17594611722285]
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training.
Our work is the first to tackle the problem of multi-label feature in the (generalized) zero-shot setting.
Our cross-level fusion-based generative approach outperforms the state-of-the-art on all three datasets.
arXiv Detail & Related papers (2021-01-27T18:56:46Z) - Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z) - Knowledge-Guided Multi-Label Few-Shot Learning for General Image
Recognition [75.44233392355711]
KGGR framework exploits prior knowledge of statistical label correlations with deep neural networks.
It first builds a structured knowledge graph to correlate different labels based on statistical label co-occurrence.
Then, it introduces the label semantics to guide learning semantic-specific features.
It exploits a graph propagation network to explore graph node interactions.
arXiv Detail & Related papers (2020-09-20T15:05:29Z) - Multi-label Zero-shot Classification by Learning to Transfer from
External Knowledge [36.04579549557464]
Multi-label zero-shot classification aims to predict multiple unseen class labels for an input image.
This paper introduces a novel multi-label zero-shot classification framework by learning to transfer from external knowledge.
arXiv Detail & Related papers (2020-07-30T17:26:46Z) - Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels.
Our method starts by assigning each person image with a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction.
To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL)
arXiv Detail & Related papers (2020-04-20T12:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.