Query2Label: A Simple Transformer Way to Multi-Label Classification
- URL: http://arxiv.org/abs/2107.10834v1
- Date: Thu, 22 Jul 2021 17:49:25 GMT
- Title: Query2Label: A Simple Transformer Way to Multi-Label Classification
- Authors: Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu
- Abstract summary: This paper presents a simple and effective approach to solving the multi-label classification problem.
The proposed approach leverages Transformer decoders to query the existence of a class label.
Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective.
- Score: 37.206922180245265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a simple and effective approach to solving the
multi-label classification problem. The proposed approach leverages Transformer
decoders to query the existence of a class label. The use of Transformer is
rooted in the need of extracting local discriminative features adaptively for
different labels, which is a strongly desired property due to the existence of
multiple objects in one image. The built-in cross-attention module in the
Transformer decoder offers an effective way to use label embeddings as queries
to probe and pool class-related features from a feature map computed by a
vision backbone for subsequent binary classifications. Compared with prior
works, the new framework is simple, using standard Transformers and vision
backbones, and effective, consistently outperforming all previous works on five
multi-label classification data sets, including MS-COCO, PASCAL VOC, NUS-WIDE,
and Visual Genome. Particularly, we establish $91.3\%$ mAP on MS-COCO. We hope
its compact structure, simple implementation, and superior performance serve as
a strong baseline for multi-label classification tasks and future studies. The
code will be available soon at https://github.com/SlongLiu/query2labels.
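The core mechanism described above can be sketched in a few lines: each label owns a learned query vector that cross-attends over the flattened backbone feature map, pooling a label-specific feature that feeds a per-class binary classifier. Below is a minimal single-head NumPy sketch of that attention pooling, not the paper's implementation (Query2Label uses a standard multi-layer, multi-head Transformer decoder); all names and array sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_labels(label_queries, feature_map, class_weights, class_biases):
    """Single-head cross-attention pooling, one query per label.

    label_queries: (C, d) learned label embeddings (the decoder queries)
    feature_map:   (HW, d) flattened spatial features from a vision backbone
    class_weights: (C, d), class_biases: (C,) per-class binary classifiers
    Returns (C,) binary logits, one per label.
    """
    d = label_queries.shape[1]
    # each label query attends over all spatial locations
    attn = softmax(label_queries @ feature_map.T / np.sqrt(d), axis=1)  # (C, HW)
    pooled = attn @ feature_map                                         # (C, d)
    # independent binary classifier per label on its pooled feature
    return (pooled * class_weights).sum(axis=1) + class_biases          # (C,)

rng = np.random.default_rng(0)
C, HW, d = 80, 49, 256                  # e.g. 80 labels over a 7x7 feature map
logits = query_labels(rng.normal(size=(C, d)), rng.normal(size=(HW, d)),
                      rng.normal(size=(C, d)), np.zeros(C))
print(logits.shape)                      # (80,)
```

Each logit is then trained with a binary loss, so adding or removing labels only changes the number of query rows, which is what makes the head simple to reuse across datasets.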
Related papers
- UniDEC: Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework that trains the dual encoder and classifier together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z)
- MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation [90.73815426893034]
We propose a transformer-based framework that aims to enhance weakly supervised semantic segmentation.
We introduce a Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with the patch tokens.
A Contrastive-Class-Token (CCT) module is proposed to enhance the learning of discriminative class tokens.
arXiv Detail & Related papers (2023-08-06T03:30:20Z)
- Retrieval-augmented Multi-label Text Classification [20.100081284294973]
Multi-label text classification is a challenging task in settings of large label sets.
Retrieval augmentation aims to improve the sample efficiency of classification models.
We evaluate this approach on four datasets from the legal and biomedical domains.
arXiv Detail & Related papers (2023-05-22T14:16:23Z)
- Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification [0.0]
We revisit two popular approaches to multilabel classification: transformer-based heads and graph-processing branches that model label relations.
Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that, with a proper training strategy, graph-based methods suffer only a small accuracy drop.
arXiv Detail & Related papers (2022-09-14T12:06:47Z)
- Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on Aligned Visual-Textual Features [14.334304670606633]
We propose a novel algorithm, Aligned Dual moDality ClaSsifier (ADDS), which includes a Dual-Modal decoder (DM-decoder) with alignment between visual and textual features.
Extensive experiments conducted on several standard benchmarks, NUS-WIDE, ImageNet-1k, ImageNet-21k, and MS-COCO, demonstrate that our approach significantly outperforms previous methods.
arXiv Detail & Related papers (2022-08-19T22:45:07Z)
- Large Loss Matters in Weakly Supervised Multi-Label Classification [50.262533546999045]
We first regard unobserved labels as negative labels, casting the weakly supervised multi-label (WSML) task into noisy multi-label classification.
We propose novel methods for WSML that reject or correct large-loss samples to prevent the model from memorizing noisy labels.
Our methodology works well in practice, validating that handling large losses properly matters in weakly supervised multi-label classification.
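As a rough illustration of the large-loss idea (not the authors' exact method), one can treat unobserved labels as negatives and then drop the largest per-label losses among those assumed negatives, since a confident "wrong" prediction on an assumed negative often signals a missing annotation rather than a model error. Function names and the rejection rate below are hypothetical.

```python
import numpy as np

def bce(probs, targets, eps=1e-7):
    # elementwise binary cross-entropy
    p = np.clip(probs, eps, 1 - eps)
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p))

def rejected_loss(probs, targets, reject_rate=0.1):
    """Mean BCE after rejecting the largest losses among assumed negatives.

    targets uses the assume-negative convention: unobserved labels are 0.
    A fraction `reject_rate` of the negative entries with the largest loss
    is excluded, on the assumption they are likely missing positives.
    """
    losses = bce(probs, targets)
    neg = (targets == 0)
    k = int(np.ceil(reject_rate * neg.sum()))
    if k > 0:
        # -inf for positives so only negatives can be selected for rejection
        neg_losses = np.where(neg, losses, -np.inf).ravel()
        reject_idx = np.argsort(neg_losses)[-k:]   # k largest negative losses
        keep = np.ones(losses.size, dtype=bool)
        keep[reject_idx] = False
        return losses.ravel()[keep].mean()
    return losses.mean()
```

For example, with one observed positive and two assumed negatives where the model strongly predicts one of the negatives, the rejected loss ignores that suspicious entry and is lower than the plain mean.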
arXiv Detail & Related papers (2022-06-08T08:30:24Z) - Multi-class Token Transformer for Weakly Supervised Semantic
Segmentation [94.78965643354285]
We propose a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS).
Inspired by the fact that the attended regions of the single class token in a standard vision transformer can form a class-agnostic localization map, we investigate whether the transformer model can also capture class-specific attention for more discriminative object localization.
The proposed framework is shown to fully complement the Class Activation Mapping (CAM) method, leading to remarkably superior WSSS results on the PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2022-03-06T07:18:23Z) - General Multi-label Image Classification with Transformers [30.58248625606648]
We propose the Classification Transformer (C-Tran) to exploit the complex dependencies among visual features and labels.
A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels.
Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome.
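The ternary label-state idea can be illustrated schematically: each label's ground truth is encoded as known-positive, known-negative, or unknown, and during training a random subset of known states is hidden for the model to reconstruct. This sketch only shows the encoding and masking step; names and the mask rate are illustrative, not C-Tran's actual interface.

```python
import numpy as np

def encode_label_states(labels, mask_rate=0.5, rng=None):
    """Ternary label-state encoding for label mask training.

    labels: (C,) 0/1 ground truth.
    Returns states in {+1: known positive, -1: known negative, 0: unknown}
    and the boolean mask of hidden positions the model must predict.
    """
    if rng is None:
        rng = np.random.default_rng()
    states = np.where(labels == 1, 1, -1)          # fully observed encoding
    hidden = rng.random(labels.shape) < mask_rate  # randomly hide some states
    states = np.where(hidden, 0, states)           # hidden labels become "unknown"
    return states, hidden

rng = np.random.default_rng(0)
labels = np.array([1, 0, 0, 1, 1])
states, hidden = encode_label_states(labels, mask_rate=0.4, rng=rng)
```

At inference, passing all states as 0 (unknown) asks the model to predict every label; partially known states let it condition predictions on the labels it already knows.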
arXiv Detail & Related papers (2020-11-27T23:20:35Z)
- LabelEnc: A New Intermediate Supervision Method for Object Detection [78.74368141062797]
We propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems.
The key idea is to introduce a novel label encoding function, mapping the ground-truth labels into latent embedding.
Experiments show our method improves a variety of detection systems by around 2% on COCO dataset.
arXiv Detail & Related papers (2020-07-07T08:55:05Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.