Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly
Supervised Object Detection
- URL: http://arxiv.org/abs/2204.06899v1
- Date: Thu, 14 Apr 2022 11:56:19 GMT
- Title: Spatial Likelihood Voting with Self-Knowledge Distillation for Weakly
Supervised Object Detection
- Authors: Ze Chen, Zhihang Fu, Jianqiang Huang, Mingyuan Tao, Rongxin Jiang,
Xiang Tian, Yaowu Chen and Xian-sheng Hua
- Abstract summary: We propose a WSOD framework called the Spatial Likelihood Voting with Self-knowledge Distillation Network (SLV-SD Net)
SLV-SD Net drives region proposal localization to converge without bounding box annotations.
Experiments on the PASCAL VOC 2007/2012 and MS-COCO datasets demonstrate the excellent performance of SLV-SD Net.
- Score: 54.24966006457756
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly supervised object detection (WSOD), which is an effective way to train
an object detection model using only image-level annotations, has attracted
considerable attention from researchers. However, most of the existing methods,
which are based on multiple instance learning (MIL), tend to localize instances
to the discriminative parts of salient objects instead of the entire content of
all objects. In this paper, we propose a WSOD framework called the Spatial
Likelihood Voting with Self-knowledge Distillation Network (SLV-SD Net). In
this framework, we introduce a spatial likelihood voting (SLV) module that
drives region proposal localization toward convergence without bounding box
annotations. Specifically, in every training iteration, all the region
proposals in a given image act as voters, voting for the likelihood of each
category across the spatial dimensions. After dilating the areas with large
likelihood values, the voting results are regularized into bounding boxes,
which are then used for the final classification and localization. Based on SLV, we
further propose a self-knowledge distillation (SD) module to refine the feature
representations of the given image. The likelihood maps generated by the SLV
module are used to supervise the feature learning of the backbone network,
encouraging the network to attend to wider and more diverse areas of the image.
Extensive experiments on the PASCAL VOC 2007/2012 and MS-COCO datasets
demonstrate the excellent performance of SLV-SD Net. In addition, SLV-SD Net
produces new state-of-the-art results on these benchmarks.
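To make the voting and distillation steps described above concrete, the following is a minimal sketch of the general idea in PyTorch, not the authors' implementation. The function names, the dilation kernel size, the 0.5 threshold, and the way a spatial attention map is derived from backbone features are illustrative assumptions.

```python
# A hedged sketch of spatial likelihood voting (SLV) and the self-distillation
# signal. Shapes, kernel size, and threshold are assumed for illustration.
import torch
import torch.nn.functional as F


def spatial_likelihood_voting(proposals, scores, image_size, dilate=7, thresh=0.5):
    """Accumulate per-proposal class scores into per-class likelihood maps and
    turn high-likelihood regions into pseudo bounding boxes.

    proposals:  (R, 4) boxes as (x1, y1, x2, y2) in pixel coordinates
    scores:     (R, C) per-proposal classification scores (e.g. from a MIL head)
    image_size: (H, W) of the input image
    """
    H, W = image_size
    R, C = scores.shape
    likelihood = torch.zeros(C, H, W)

    # Every proposal "votes" its class scores into the spatial area it covers.
    for r in range(R):
        x1, y1, x2, y2 = proposals[r].long().tolist()
        likelihood[:, y1:y2, x1:x2] += scores[r].view(C, 1, 1)

    # Normalize each class map to [0, 1] so one threshold works across classes.
    likelihood = likelihood / likelihood.amax(dim=(1, 2), keepdim=True).clamp(min=1e-6)

    # Dilate the high-likelihood area (max pooling acts as a simple dilation),
    # then take the tight box around the remaining region as the pseudo box.
    dilated = F.max_pool2d(likelihood.unsqueeze(0), dilate, stride=1, padding=dilate // 2)[0]
    pseudo_boxes = {}
    for c in range(C):
        ys, xs = torch.nonzero(dilated[c] > thresh, as_tuple=True)
        if len(xs) > 0:
            pseudo_boxes[c] = (xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item())
    return likelihood, pseudo_boxes


def self_distillation_loss(backbone_feat, likelihood, present_classes):
    """Sketch of the self-knowledge distillation signal: the likelihood maps of
    the image-level classes supervise a spatial attention map predicted from
    the backbone features (the attention construction here is an assumption)."""
    # backbone_feat: (1, D, h, w); collapse channels into a single attention map.
    attn = backbone_feat.mean(dim=1, keepdim=True).sigmoid()           # (1, 1, h, w)
    target = likelihood[present_classes].amax(dim=0, keepdim=True)     # (1, H, W)
    target = F.interpolate(target.unsqueeze(0), size=attn.shape[-2:],
                           mode="bilinear", align_corners=False)       # (1, 1, h, w)
    return F.binary_cross_entropy(attn, target)
```

In the paper's pipeline the pseudo boxes would supervise the detection branches and the likelihood maps the backbone; the sketch is only meant to show that data flow.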
Related papers
- Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two surprising experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions (a generic sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - MOST: Multiple Object localization with Self-supervised Transformers for
object discovery [97.47075050779085]
We present Multiple Object localization with Self-supervised Transformers (MOST).
MOST uses features from transformers trained with self-supervised learning to localize multiple objects in real-world images.
We show MOST can be used for self-supervised pre-training of object detectors, and it yields consistent improvements on fully and semi-supervised object detection and on unsupervised region proposal generation.
arXiv Detail & Related papers (2023-04-11T17:57:27Z) - De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z) - Constrained Sampling for Class-Agnostic Weakly Supervised Object
Localization [10.542859578763068]
Self-supervised vision transformers can generate accurate localization maps of the objects in an image.
We propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a weakly-supervised object localization model.
arXiv Detail & Related papers (2022-09-09T19:58:38Z) - Discriminative Sampling of Proposals in Self-Supervised Transformers for
Weakly Supervised Object Localization [10.542859578763068]
Self-supervised vision transformers can generate accurate localization maps of the objects in an image.
We propose leveraging the multiple maps generated by the different transformer heads to acquire pseudo-labels for training a weakly-supervised object localization model.
arXiv Detail & Related papers (2022-09-09T18:33:23Z) - Discovery-and-Selection: Towards Optimal Multiple Instance Learning for
Weakly Supervised Object Detection [86.86602297364826]
We propose a discovery-and-selection approach fused with multiple instance learning (DS-MIL).
Our proposed DS-MIL approach can consistently improve the baselines, reporting state-of-the-art performance.
arXiv Detail & Related papers (2021-10-18T07:06:57Z) - SLV: Spatial Likelihood Voting for Weakly Supervised Object Detection [31.421794727209935]
We propose a spatial likelihood voting (SLV) module to drive the proposal localization process toward convergence.
In every training iteration, all region proposals in a given image play the role of voters, voting for the likelihood of each category in the spatial dimensions.
After dilating the areas with large likelihood values, the voting results are regularized into bounding boxes, which are then used for the final classification and localization.
arXiv Detail & Related papers (2020-06-23T10:24:13Z) - Weakly-supervised Object Localization for Few-shot Learning and
Fine-grained Few-shot Learning [0.5156484100374058]
Few-shot learning aims to learn novel visual categories from very few samples.
We propose a Self-Attention Based Complementary Module (SAC Module) to perform weakly-supervised object localization.
We also produce activated masks for selecting discriminative deep descriptors for few-shot classification.
arXiv Detail & Related papers (2020-03-02T14:07:05Z)
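The "Weakly-supervised Contrastive Learning for Unsupervised Object Discovery" entry above mentions localizing object regions with PCA on high-level semantic features. The sketch below illustrates that general idea only; the dense-feature source, the sign heuristic, and the zero threshold are assumptions rather than that paper's actual pipeline.

```python
# A hedged sketch of PCA-based localization over dense features.
import torch


def pca_localize(feats, thresh=0.0):
    """feats: (D, h, w) dense features from a (self-supervised) backbone.
    Project each spatial feature onto the first principal component and
    threshold the projection to obtain a coarse foreground box."""
    D, h, w = feats.shape
    x = feats.reshape(D, -1).T                       # (h*w, D) spatial features
    x = x - x.mean(dim=0, keepdim=True)              # center before PCA
    # First principal direction via SVD of the centered feature matrix.
    _, _, vh = torch.linalg.svd(x, full_matrices=False)
    proj = (x @ vh[0]).reshape(h, w)
    # Sign heuristic (assumption): treat the smaller region as foreground.
    if (proj > 0).float().mean() > 0.5:
        proj = -proj
    ys, xs = torch.nonzero(proj > thresh, as_tuple=True)
    if len(xs) == 0:
        return None
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()
```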