Query2Label: A Simple Transformer Way to Multi-Label Classification
- URL: http://arxiv.org/abs/2107.10834v1
- Date: Thu, 22 Jul 2021 17:49:25 GMT
- Title: Query2Label: A Simple Transformer Way to Multi-Label Classification
- Authors: Shilong Liu, Lei Zhang, Xiao Yang, Hang Su, Jun Zhu
- Abstract summary: This paper presents a simple and effective approach to solving the multi-label classification problem.
The proposed approach leverages Transformer decoders to query the existence of a class label.
Compared with prior works, the new framework is simple, using standard Transformers and vision backbones, and effective.
- Score: 37.206922180245265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a simple and effective approach to solving the
multi-label classification problem. The proposed approach leverages Transformer
decoders to query the existence of a class label. The use of Transformer is
rooted in the need of extracting local discriminative features adaptively for
different labels, which is a strongly desired property due to the existence of
multiple objects in one image. The built-in cross-attention module in the
Transformer decoder offers an effective way to use label embeddings as queries
to probe and pool class-related features from a feature map computed by a
vision backbone for subsequent binary classifications. Compared with prior
works, the new framework is simple, using standard Transformers and vision
backbones, and effective, consistently outperforming all previous works on five
multi-label classification data sets, including MS-COCO, PASCAL VOC, NUS-WIDE,
and Visual Genome. Particularly, we establish $91.3\%$ mAP on MS-COCO. We hope
its compact structure, simple implementation, and superior performance serve as
a strong baseline for multi-label classification tasks and future studies. The
code will be available soon at https://github.com/SlongLiu/query2labels.
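The core mechanism described above can be sketched in a few lines: each label owns a learned query vector that cross-attends over the flattened backbone feature map, pooling a label-specific feature that feeds a per-class binary classifier. Below is a minimal single-head NumPy sketch of that attention pooling, not the paper's implementation (Query2Label uses a standard multi-layer, multi-head Transformer decoder); all names and array sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_labels(label_queries, feature_map, class_weights, class_biases):
    """Single-head cross-attention pooling, one query per label.

    label_queries: (C, d) learned label embeddings (the decoder queries)
    feature_map:   (HW, d) flattened spatial features from a vision backbone
    class_weights: (C, d), class_biases: (C,) per-class binary classifiers
    Returns (C,) binary logits, one per label.
    """
    d = label_queries.shape[1]
    # each label query attends over all spatial locations
    attn = softmax(label_queries @ feature_map.T / np.sqrt(d), axis=1)  # (C, HW)
    pooled = attn @ feature_map                                         # (C, d)
    # independent binary classifier per label on its pooled feature
    return (pooled * class_weights).sum(axis=1) + class_biases          # (C,)

rng = np.random.default_rng(0)
C, HW, d = 80, 49, 256                  # e.g. 80 labels over a 7x7 feature map
logits = query_labels(rng.normal(size=(C, d)), rng.normal(size=(HW, d)),
                      rng.normal(size=(C, d)), np.zeros(C))
print(logits.shape)                      # (80,)
```

Each logit is then trained with a binary loss, so adding or removing labels only changes the number of query rows, which is what makes the head simple to reuse across datasets.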
Related papers
- UniDEC: Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework that trains the dual encoder and classifier together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z)
- MCTformer+: Multi-Class Token Transformer for Weakly Supervised Semantic Segmentation [90.73815426893034]
We propose a transformer-based framework that aims to enhance weakly supervised semantic segmentation.
We introduce a Multi-Class Token transformer, which incorporates multiple class tokens to enable class-aware interactions with the patch tokens.
A Contrastive-Class-Token (CCT) module is proposed to enhance the learning of discriminative class tokens.
arXiv Detail & Related papers (2023-08-06T03:30:20Z)
- Retrieval-augmented Multi-label Text Classification [20.100081284294973]
Multi-label text classification is a challenging task in settings of large label sets.
Retrieval augmentation aims to improve the sample efficiency of classification models.
We evaluate this approach on four datasets from the legal and biomedical domains.
arXiv Detail & Related papers (2023-05-22T14:16:23Z)
- Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification [0.0]
We revisit two popular approaches to multilabel classification: transformer-based heads and graph-processing branches that model label relations.
Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that, with a proper training strategy, graph-based methods suffer only a small accuracy drop.
arXiv Detail & Related papers (2022-09-14T12:06:47Z)
- Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on Aligned Visual-Textual Features [14.334304670606633]
We propose a novel algorithm, Aligned Dual moDality ClaSsifier (ADDS), which includes a Dual-Modal decoder (DM-decoder) with alignment between visual and textual features.
Extensive experiments conducted on several standard benchmarks, NUS-WIDE, ImageNet-1k, ImageNet-21k, and MS-COCO, demonstrate that our approach significantly outperforms previous methods.
arXiv Detail & Related papers (2022-08-19T22:45:07Z)
- Large Loss Matters in Weakly Supervised Multi-Label Classification [50.262533546999045]
We first regard unobserved labels as negative labels, casting the weakly supervised multi-label (WSML) task into noisy multi-label classification.
We propose novel methods for WSML that reject or correct large-loss samples to prevent the model from memorizing noisy labels.
Our methodology works well in practice, validating that handling large losses properly matters in weakly supervised multi-label classification.
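As a rough illustration of the large-loss idea (not the authors' exact method), one can treat unobserved labels as negatives and then drop the largest per-label losses among those assumed negatives, since a confident "wrong" prediction on an assumed negative often signals a missing annotation rather than a model error. Function names and the rejection rate below are hypothetical.

```python
import numpy as np

def bce(probs, targets, eps=1e-7):
    # elementwise binary cross-entropy
    p = np.clip(probs, eps, 1 - eps)
    return -(targets * np.log(p) + (1 - targets) * np.log(1 - p))

def rejected_loss(probs, targets, reject_rate=0.1):
    """Mean BCE after rejecting the largest losses among assumed negatives.

    targets uses the assume-negative convention: unobserved labels are 0.
    A fraction `reject_rate` of the negative entries with the largest loss
    is excluded, on the assumption they are likely missing positives.
    """
    losses = bce(probs, targets)
    neg = (targets == 0)
    k = int(np.ceil(reject_rate * neg.sum()))
    if k > 0:
        # -inf for positives so only negatives can be selected for rejection
        neg_losses = np.where(neg, losses, -np.inf).ravel()
        reject_idx = np.argsort(neg_losses)[-k:]   # k largest negative losses
        keep = np.ones(losses.size, dtype=bool)
        keep[reject_idx] = False
        return losses.ravel()[keep].mean()
    return losses.mean()
```

For example, with one observed positive and two assumed negatives where the model strongly predicts one of the negatives, the rejected loss ignores that suspicious entry and is lower than the plain mean.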
arXiv Detail & Related papers (2022-06-08T08:30:24Z) - Multi-class Token Transformer for Weakly Supervised Semantic
Segmentation [94.78965643354285]
We propose a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS).
Inspired by the fact that the attended regions of the single class token in a standard vision transformer can form a class-agnostic localization map, we investigate whether the transformer model can also capture class-specific attention for more discriminative object localization.
The proposed framework is shown to fully complement the Class Activation Mapping (CAM) method, leading to remarkably superior WSSS results on the PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2022-03-06T07:18:23Z) - General Multi-label Image Classification with Transformers [30.58248625606648]
We propose the Classification Transformer (C-Tran) to exploit the complex dependencies among visual features and labels.
A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels.
Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome.
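The ternary label-state idea can be illustrated schematically: each label's ground truth is encoded as known-positive, known-negative, or unknown, and during training a random subset of known states is hidden for the model to reconstruct. This sketch only shows the encoding and masking step; names and the mask rate are illustrative, not C-Tran's actual interface.

```python
import numpy as np

def encode_label_states(labels, mask_rate=0.5, rng=None):
    """Ternary label-state encoding for label mask training.

    labels: (C,) 0/1 ground truth.
    Returns states in {+1: known positive, -1: known negative, 0: unknown}
    and the boolean mask of hidden positions the model must predict.
    """
    if rng is None:
        rng = np.random.default_rng()
    states = np.where(labels == 1, 1, -1)          # fully observed encoding
    hidden = rng.random(labels.shape) < mask_rate  # randomly hide some states
    states = np.where(hidden, 0, states)           # hidden labels become "unknown"
    return states, hidden

rng = np.random.default_rng(0)
labels = np.array([1, 0, 0, 1, 1])
states, hidden = encode_label_states(labels, mask_rate=0.4, rng=rng)
```

At inference, passing all states as 0 (unknown) asks the model to predict every label; partially known states let it condition predictions on the labels it already knows.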
arXiv Detail & Related papers (2020-11-27T23:20:35Z)
- LabelEnc: A New Intermediate Supervision Method for Object Detection [78.74368141062797]
We propose a new intermediate supervision method, named LabelEnc, to boost the training of object detection systems.
The key idea is to introduce a novel label encoding function, mapping the ground-truth labels into latent embedding.
Experiments show our method improves a variety of detection systems by around 2% on COCO dataset.
arXiv Detail & Related papers (2020-07-07T08:55:05Z)
- Interaction Matching for Long-Tail Multi-Label Classification [57.262792333593644]
We present an elegant and effective approach for addressing limitations in existing multi-label classification models.
By performing soft n-gram interaction matching, we match labels with natural language descriptions.
arXiv Detail & Related papers (2020-05-18T15:27:55Z)
This list is automatically generated from the titles and abstracts of the papers indexed on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.