General Multi-label Image Classification with Transformers
- URL: http://arxiv.org/abs/2011.14027v1
- Date: Fri, 27 Nov 2020 23:20:35 GMT
- Title: General Multi-label Image Classification with Transformers
- Authors: Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi
- Abstract summary: We propose the Classification Transformer (C-Tran) to exploit the complex dependencies among visual features and labels.
A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels.
Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-label image classification is the task of predicting a set of labels
corresponding to objects, attributes or other entities present in an image. In
this work we propose the Classification Transformer (C-Tran), a general
framework for multi-label image classification that leverages Transformers to
exploit the complex dependencies among visual features and labels. Our approach
consists of a Transformer encoder trained to predict a set of target labels
given an input set of masked labels, and visual features from a convolutional
neural network. A key ingredient of our method is a label mask training
objective that uses a ternary encoding scheme to represent the state of the
labels as positive, negative, or unknown during training. Our model shows
state-of-the-art performance on challenging datasets such as COCO and Visual
Genome. Moreover, because our model explicitly represents the uncertainty of
labels during training, it is more general by allowing us to produce improved
results for images with partial or extra label annotations during inference. We
demonstrate this additional capability in the COCO, Visual Genome, News500, and
CUB image datasets.
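The label mask training objective and the partial-label inference described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all assumptions of this sketch and do not come from the abstract: the integer state codes, the hypothetical function names, the 25% lower bound on the fraction of labels hidden per image, computing the loss only on the hidden labels, and keeping annotated labels fixed at inference. The Transformer encoder and CNN features that would produce the per-label scores `probs` are omitted.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical integer codes for the three (ternary) label states.
UNKNOWN, NEGATIVE, POSITIVE = 0, 1, 2


def sample_label_mask(
    targets: Sequence[int],
    min_frac: float = 0.25,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Training: hide a random subset of one image's labels.

    targets[i] is 1 if label i is present, 0 if absent. Returns (states, masked):
    hidden labels get UNKNOWN, the rest reveal POSITIVE or NEGATIVE.
    """
    rng = rng or random.Random()
    n = len(targets)
    # Assumed masking range: hide between ceil(min_frac * n) and n labels.
    k = rng.randint(max(1, math.ceil(min_frac * n)), n)
    masked = sorted(rng.sample(range(n), k))
    hidden = set(masked)
    states = [
        UNKNOWN if i in hidden else (POSITIVE if t == 1 else NEGATIVE)
        for i, t in enumerate(targets)
    ]
    return states, masked


def masked_bce(
    probs: Sequence[float], targets: Sequence[int], masked: Sequence[int]
) -> float:
    """Mean binary cross-entropy over the hidden labels only (an assumption)."""
    eps = 1e-7
    total = 0.0
    for i in masked:
        p = min(max(probs[i], eps), 1.0 - eps)  # clamp so log() stays finite
        total -= targets[i] * math.log(p) + (1 - targets[i]) * math.log(1.0 - p)
    return total / len(masked)


def states_from_partial(num_labels: int, known: Dict[int, bool]) -> List[int]:
    """Inference: reveal annotated labels; everything else stays UNKNOWN."""
    states = [UNKNOWN] * num_labels
    for idx, present in known.items():
        states[idx] = POSITIVE if present else NEGATIVE
    return states


def merge_predictions(
    probs: Sequence[float], states: Sequence[int], threshold: float = 0.5
) -> List[bool]:
    """Keep annotated labels as given; threshold the scores of unknown ones."""
    return [
        (p >= threshold) if s == UNKNOWN else (s == POSITIVE)
        for p, s in zip(probs, states)
    ]


if __name__ == "__main__":
    targets = [1, 0, 0, 1, 0]
    states, masked = sample_label_mask(targets, rng=random.Random(0))
    # A real model would score every label from image features + states;
    # uniform 0.5 placeholders make each BCE term equal to ln 2.
    print(states, masked, masked_bce([0.5] * 5, targets, masked))

    partial = states_from_partial(4, {0: True, 2: False})
    print(partial)  # [2, 0, 1, 0]
    print(merge_predictions([0.1, 0.7, 0.9, 0.4], partial))
```

One reason to score only the hidden labels, in the spirit of masked language modeling, is that the revealed states are already model inputs. Scoring them would reward copying the input rather than inferring labels from visual features and label dependencies.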
Related papers
- Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation (2023-07-07)
We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT).
Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions.
Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings.
- Semantic-Aware Graph Matching Mechanism for Multi-Label Image Recognition (2023-04-21)
Multi-label image recognition aims to predict the set of labels present in an image.
In this paper, we treat each image as a bag of instances and formulate multi-label image recognition as an instance-label matching selection problem.
We propose an innovative Semantic-aware Graph Matching framework for Multi-Label image recognition (ML-SGM).
- Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels (2022-05-26)
We propose a dual-perspective semantic-aware representation blending (DSRB) framework that blends multi-granularity category-specific semantic representations across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms on all label-proportion settings.
- Graph Attention Transformer Network for Multi-Label Image Classification (2022-03-08)
We propose a general framework for multi-label image classification that can effectively mine complex inter-label relationships.
Our proposed methods achieve state-of-the-art performance on three datasets.
- Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels (2022-03-04)
We propose to blend category-specific representations across different images to transfer information from known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed SARB framework outperforms current leading competitors.
- Structured Semantic Transfer for Multi-Label Recognition with Partial Labels (2021-12-21)
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome, and Pascal VOC datasets show that the proposed SST framework outperforms current state-of-the-art algorithms.
- A Weakly Supervised Convolutional Network for Change Segmentation and Classification (2020-11-06)
We present W-CDNet, a novel weakly supervised change detection network that can be trained with image-level semantic labels.
W-CDNet can be trained on two different types of datasets: those containing only changed image pairs, or a mixture of changed and unchanged image pairs.