Spatial Consistency Loss for Training Multi-Label Classifiers from
Single-Label Annotations
- URL: http://arxiv.org/abs/2203.06127v1
- Date: Fri, 11 Mar 2022 17:54:20 GMT
- Title: Spatial Consistency Loss for Training Multi-Label Classifiers from
Single-Label Annotations
- Authors: Thomas Verelst, Paul K. Rubenstein, Marcin Eichner, Tinne Tuytelaars,
Maxim Berman
- Abstract summary: Multi-label classification is more applicable "in the wild" than single-label classification.
We show that adding a consistency loss is a simple yet effective method to train multi-label classifiers in a weakly supervised setting.
We also demonstrate improved multi-label classification mAP on ImageNet-1K using the ReaL multi-label validation set.
- Score: 39.69823105183408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As natural images usually contain multiple objects, multi-label image
classification is more applicable "in the wild" than single-label
classification. However, exhaustively annotating images with every object of
interest is costly and time-consuming. We aim to train multi-label classifiers
from single-label annotations only. We show that adding a consistency loss,
ensuring that the predictions of the network are consistent over consecutive
training epochs, is a simple yet effective method to train multi-label
classifiers in a weakly supervised setting. We further extend this approach
spatially, by ensuring consistency of the spatial feature maps produced over
consecutive training epochs, maintaining per-class running-average heatmaps for
each training image. We show that this spatial consistency loss further
improves the multi-label mAP of the classifiers. In addition, we show that this
method overcomes shortcomings of the "crop" data-augmentation by recovering
correct supervision signal even when most of the single ground truth object is
cropped out of the input image by the data augmentation. We demonstrate gains
of the consistency and spatial consistency losses over the binary cross-entropy
baseline, and over competing methods, on MS-COCO and Pascal VOC. We also
demonstrate improved multi-label classification mAP on ImageNet-1K using the
ReaL multi-label validation set.
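The consistency loss described in the abstract can be sketched as follows. This is a minimal illustration, assuming an MSE penalty against an exponential running average of per-image predictions; the paper's exact loss form and momentum schedule may differ, and all names here are hypothetical:

```python
import numpy as np

class ConsistencyLoss:
    """Sketch: penalize deviation of current predictions from a
    per-image running average kept across epochs. The spatial variant
    would store per-class heatmaps instead of scalar probabilities."""

    def __init__(self, num_images, num_classes, momentum=0.9):
        self.momentum = momentum
        # running-average prediction per training image and class
        self.running = np.zeros((num_images, num_classes))

    def __call__(self, image_ids, probs):
        # loss: mean squared deviation from the stored running average
        target = self.running[image_ids]
        loss = np.mean((probs - target) ** 2)
        # update the running averages with the new predictions
        self.running[image_ids] = (
            self.momentum * target + (1.0 - self.momentum) * probs
        )
        return loss
```

In practice this term would be added to the single-label binary cross-entropy loss, so unlabeled-but-present classes receive a stable supervision signal across epochs.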
Related papers
- CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification [23.392746466420128]
This paper presents a CLIP-based unsupervised learning method for annotation-free multi-label image classification.
We take full advantage of the powerful CLIP model and propose a novel approach to extend CLIP for multi-label predictions based on global-local image-text similarity aggregation.
Our method outperforms state-of-the-art unsupervised methods on MS-COCO, PASCAL VOC 2007, PASCAL VOC 2012, and NUS datasets.
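The "global-local image-text similarity aggregation" idea can be sketched as below. The function name, the max-over-patches pooling, and the blending weight are assumptions for illustration, not CDUL's exact formulation:

```python
import numpy as np

def global_local_scores(global_emb, patch_embs, text_embs, alpha=0.5):
    """Sketch: score each class label by combining the similarity of
    its text embedding to the global image embedding and to the
    best-matching local patch embedding (CLIP-style, L2-normalized)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    g = normalize(global_emb)            # (d,) global image embedding
    p = normalize(patch_embs)            # (n_patches, d) local embeddings
    t = normalize(text_embs)             # (n_classes, d) label embeddings
    global_sim = t @ g                   # (n_classes,) global similarity
    local_sim = (t @ p.T).max(axis=1)    # best-matching patch per class
    return alpha * global_sim + (1 - alpha) * local_sim
```

Max-pooling over patches lets small objects that barely affect the global embedding still contribute to their class score.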
arXiv Detail & Related papers (2023-07-31T13:12:02Z)
- Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation [58.03255076119459]
We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT).
Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions.
Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings.
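Turning token correlations into a dense prediction can be sketched as follows; this is a hypothetical helper, not the paper's actual head, and the cosine-similarity-plus-softmax form is an assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_correlation_mask(patch_tokens, prototype, temperature=0.1):
    """Sketch: score each ViT patch token by its cosine correlation
    with a class prototype vector, then normalize attention-style to
    obtain a soft foreground mask over patches."""
    p = patch_tokens / np.linalg.norm(patch_tokens, axis=-1, keepdims=True)
    q = prototype / np.linalg.norm(prototype)
    sim = p @ q                         # (n_patches,) cosine correlations
    return softmax(sim / temperature)   # soft mask over patches
```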
arXiv Detail & Related papers (2023-07-07T06:16:43Z)
- Label Structure Preserving Contrastive Embedding for Multi-Label Learning with Missing Labels [30.79809627981242]
We introduce a label correction mechanism to identify missing labels, then define a unique contrastive loss for multi-label image classification with missing labels (CLML).
Different from existing multi-label CL losses, CLML also preserves low-rank global and local label dependencies in the latent representation space.
The proposed strategy improves the classification performance of the ResNet-101 model by margins of 1.2%, 1.6%, and 1.3% on three standard datasets, respectively.
arXiv Detail & Related papers (2022-09-03T02:44:07Z)
- PLMCL: Partial-Label Momentum Curriculum Learning for Multi-Label Image Classification [25.451065364433028]
Multi-label image classification aims to predict all possible labels in an image.
Existing works on partial-label learning focus on the case where each training image is annotated with only a subset of its labels.
This paper proposes a new partial-label setting in which only a subset of the training images are labeled, each with only one positive label, while the rest of the training images remain unlabeled.
arXiv Detail & Related papers (2022-08-22T01:23:08Z)
- Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
- Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images [0.2538209532048867]
We propose a semi-weakly supervised contrastive learning framework for representation learning using semi-weakly annotated images.
We empirically validate the transfer learning performance of SWCL on seven public retinal fundus datasets.
arXiv Detail & Related papers (2021-08-04T15:50:09Z)
- Rank-Consistency Deep Hashing for Scalable Multi-Label Image Search [90.30623718137244]
We propose a novel deep hashing method for scalable multi-label image search.
A new rank-consistency objective is applied to align the similarity orders from two spaces.
A powerful loss function is designed to penalize the samples whose semantic similarity and hamming distance are mismatched.
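A rank-consistency objective of the kind summarized above can be sketched as follows. The triplet-hinge form and the `margin` parameter are assumptions for illustration, not the paper's exact loss:

```python
import numpy as np

def hamming_dist(a, b):
    # number of differing bits between two binary codes
    return np.count_nonzero(a != b)

def rank_consistency_loss(codes, sem_sim, margin=1.0):
    """Sketch: for each anchor i, if item j is semantically more
    similar to i than item k is, j's Hamming distance to i should be
    smaller than k's by at least `margin`; violations are penalized."""
    n = len(codes)
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if sem_sim[i, j] > sem_sim[i, k]:   # j should rank closer
                    d_j = hamming_dist(codes[i], codes[j])
                    d_k = hamming_dist(codes[i], codes[k])
                    loss += max(0.0, d_j - d_k + margin)
                    count += 1
    return loss / max(count, 1)
```

The loss is zero exactly when the ordering induced by Hamming distances agrees (with margin) with the ordering induced by semantic similarity, which is what "aligning the similarity orders from two spaces" requires.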
arXiv Detail & Related papers (2021-02-02T13:46:58Z)
- Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels [34.13899937264952]
ImageNet has been arguably the most popular image classification benchmark, but it is also the one with a significant level of label noise.
Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark.
We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied.
arXiv Detail & Related papers (2021-01-13T11:55:58Z)
- Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
- Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification [64.37745443119942]
This paper jointly enforces visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.
Experimental results on three large-scale ReID datasets demonstrate the superiority of the proposed method in both purely unsupervised and unsupervised domain adaptive ReID tasks.
arXiv Detail & Related papers (2020-07-21T14:31:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.