Discriminative Sampling of Proposals in Self-Supervised Transformers for
Weakly Supervised Object Localization
- URL: http://arxiv.org/abs/2209.09209v1
- Date: Fri, 9 Sep 2022 18:33:23 GMT
- Title: Discriminative Sampling of Proposals in Self-Supervised Transformers for
Weakly Supervised Object Localization
- Authors: Shakeeb Murtaza, Soufiane Belharbi, Marco Pedersoli, Aydin Sarraf,
Eric Granger
- Abstract summary: Self-supervised vision transformers can generate accurate localization maps of the objects in an image.
We propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a weakly-supervised object localization model.
- Score: 10.542859578763068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised vision transformers can generate accurate localization maps
of the objects in an image. However, since they decompose the scene into
multiple maps containing various objects, and they do not rely on any explicit
supervisory signal, they cannot distinguish the object of interest from
other objects, as required in weakly-supervised object localization (WSOL). To
address this issue, we propose leveraging the multiple maps generated by the
different transformer heads to acquire pseudo-labels for training a WSOL model.
In particular, a new Discriminative Proposals Sampling (DiPS) method is
introduced that relies on a pretrained CNN classifier to identify
discriminative regions. Then, foreground and background pixels are sampled from
these regions in order to train a WSOL model for generating activation maps
that can accurately localize objects belonging to a specific class. Empirical
results on the challenging CUB, OpenImages, and ILSVRC benchmark datasets
indicate that our proposed approach can outperform state-of-the-art methods over a
wide range of threshold values. DiPS provides class activation maps with a
better coverage of foreground object regions w.r.t. the background.
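
The two steps the abstract describes (picking a discriminative map with a classifier, then sampling foreground and background pixels from it) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the function names, the `class_score` callable standing in for the pretrained CNN classifier, the mean-threshold binarization, and the `pool_frac` parameter are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def select_discriminative_map(image, head_maps, class_score):
    """Pick the head map whose masked image the classifier scores highest.

    image:       float array of shape (H, W, C)
    head_maps:   float array of shape (K, H, W), one map per transformer head
    class_score: callable(image) -> float; hypothetical stand-in for a
                 pretrained classifier's score on the target class
    """
    scores = []
    for m in head_maps:
        # Assumed binarization rule: keep pixels above the map's mean.
        mask = (m > m.mean()).astype(image.dtype)
        scores.append(float(class_score(image * mask[..., None])))
    return int(np.argmax(scores)), scores

def sample_pseudo_labels(act_map, n_fg, n_bg, rng, pool_frac=0.1):
    """Sample pixel-level pseudo-labels from one activation map.

    Foreground seeds are drawn from the top `pool_frac` of activations and
    background seeds from the bottom `pool_frac` (keep pool_frac <= 0.5 so
    the pools do not overlap). Returns an (H, W) integer map with
    1 = foreground, 0 = background, -1 = unlabeled.
    """
    flat = act_map.ravel()
    order = np.argsort(flat)                      # ascending activation
    pool = max(1, int(pool_frac * flat.size))
    bg_pool, fg_pool = order[:pool], order[-pool:]
    labels = np.full(flat.shape, -1, dtype=np.int64)
    labels[rng.choice(bg_pool, size=min(n_bg, pool), replace=False)] = 0
    labels[rng.choice(fg_pool, size=min(n_fg, pool), replace=False)] = 1
    return labels.reshape(act_map.shape)

# Toy example: the "object" is the bright top-left 4x4 block.
image = np.zeros((8, 8, 3))
image[:4, :4] = 1.0
head_maps = np.zeros((3, 8, 8))
head_maps[0, 4:, 4:] = 1.0   # head attending to the background corner
head_maps[1, :4, :4] = 1.0   # head attending to the object
head_maps[2, :, :2] = 1.0    # head attending to a strip that partly overlaps

toy_classifier = lambda x: x.sum()   # toy score, not a real CNN
best, scores = select_discriminative_map(image, head_maps, toy_classifier)
labels = sample_pseudo_labels(head_maps[best], n_fg=5, n_bg=5,
                              rng=np.random.default_rng(0))
print(best, scores)   # the object-aligned head (index 1) wins in this toy case
print(labels)
```

In a real pipeline the sampled label map would serve as sparse supervision for the localization network, with unlabeled pixels (-1) ignored in the loss; that training step is outside this sketch.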
Related papers
- Multiscale Vision Transformer With Deep Clustering-Guided Refinement for
Weakly Supervised Object Localization [4.300577895958228]
This work addresses the task of weakly-supervised object localization.
It comprises multiple object localization transformers that extract patch embeddings across various scales.
We introduce a deep clustering-guided refinement method that further enhances localization accuracy.
arXiv Detail & Related papers (2023-12-15T07:46:44Z)
- DiPS: Discriminative Pseudo-Label Sampling with Self-Supervised
Transformers for Weakly Supervised Object Localization [13.412674368913747]
Discriminative Pseudo-label Sampling (DiPS) is introduced to leverage class-agnostic maps for weakly-supervised object localization.
DiPS relies on a pre-trained classifier to identify the most discriminative regions of each attention map.
It provides a rich pool of diverse and discriminative proposals to cover different parts of the object.
arXiv Detail & Related papers (2023-10-09T22:52:43Z)
- Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z)
- Rethinking the Localization in Weakly Supervised Object Localization [51.29084037301646]
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace single-class regression (SCR) with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background.
arXiv Detail & Related papers (2023-08-11T14:38:51Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- MOST: Multiple Object localization with Self-supervised Transformers for
object discovery [97.47075050779085]
We present Multiple Object localization with Self-supervised Transformers (MOST).
MOST uses features of transformers trained using self-supervised learning to localize multiple objects in real world images.
We show that MOST can be used for self-supervised pre-training of object detectors, and that it yields consistent improvements on fully supervised and semi-supervised object detection and on unsupervised region proposal generation.
arXiv Detail & Related papers (2023-04-11T17:57:27Z)
- Constrained Sampling for Class-Agnostic Weakly Supervised Object
Localization [10.542859578763068]
Self-supervised vision transformers can generate accurate localization maps of the objects in an image.
We propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a weakly-supervised object localization model.
arXiv Detail & Related papers (2022-09-09T19:58:38Z)
- Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly
Supervised Object Detection [54.24966006457756]
We propose a weakly supervised object detection (WSOD) framework called the Spatial Likelihood Voting with Self-knowledge Distillation Network (SLV-SD Net).
SLV-SD Net converges region proposal localization without bounding box annotations.
Experiments on the PASCAL VOC 2007/2012 and MS-COCO datasets demonstrate the excellent performance of SLV-SD Net.
arXiv Detail & Related papers (2022-04-14T11:56:19Z)
- Rethinking Localization Map: Towards Accurate Object Perception with
Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision.
In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.