Shallow Feature Matters for Weakly Supervised Object Localization
- URL: http://arxiv.org/abs/2108.00873v1
- Date: Mon, 2 Aug 2021 13:16:48 GMT
- Title: Shallow Feature Matters for Weakly Supervised Object Localization
- Authors: Jun Wei, Qin Wang, Zhen Li, Sheng Wang, S.Kevin Zhou, Shuguang Cui
- Abstract summary: Weakly supervised object localization (WSOL) aims to localize objects by only utilizing image-level labels.
Previous CAM-based methods did not take full advantage of the shallow features, despite their importance for WSOL.
In this paper, we propose a simple but effective Shallow feature-aware Pseudo supervised Object Localization (SPOL) model for accurate WSOL.
- Score: 35.478997006168484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly supervised object localization (WSOL) aims to localize objects by only
utilizing image-level labels. Class activation maps (CAMs) are the commonly
used features to achieve WSOL. However, previous CAM-based methods did not take
full advantage of shallow features, despite their importance for WSOL, because
shallow features are easily buried in background noise under conventional
fusion. In this paper, we propose a simple but effective Shallow
feature-aware Pseudo supervised Object Localization (SPOL) model for accurate
WSOL, which makes full use of the low-level features embedded in shallow layers.
In practice, our SPOL model first generates the CAMs through a novel
element-wise multiplication of shallow and deep feature maps, which filters the
background noise and generates sharper boundaries robustly. In addition, we
propose a general class-agnostic segmentation model that obtains an accurate
object mask using only the initial CAMs as pseudo labels, without any extra
annotation. Finally, a bounding box extractor is applied to the object mask to
locate the target. Experiments show that SPOL outperforms the state of the art
on both the CUB-200 and ImageNet-1K benchmarks, achieving 93.44% and 67.15%
Top-5 localization accuracy, respectively (improvements of 3.93% and 2.13%).
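The fusion and box-extraction steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the matching channel counts, nearest-neighbour upsampling of the deep map, the linear classifier weights, the 0.5 threshold, and the helper names (`upsample_nearest`, `fused_cam`, `mask_to_box`) are all assumptions. The class-agnostic segmentation model that SPOL trains on CAM pseudo labels is omitted; here the box is taken directly from the thresholded CAM.

```python
import numpy as np

# Hypothetical helpers illustrating shallow x deep CAM fusion; not the SPOL code.


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, h, w) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fused_cam(shallow: np.ndarray, deep: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """CAM computed from the element-wise product of shallow and deep features.

    shallow: (C, H, W) early-layer features (assumed to have the same C as deep)
    deep:    (C, H // s, W // s) late-layer features
    weights: (C,) classifier weights for the target class (assumed linear head)
    """
    factor = shallow.shape[1] // deep.shape[1]
    # Where the deep map is ~0 the product is ~0 (background suppressed);
    # where it is high, the finer shallow response decides the boundary.
    fused = shallow * upsample_nearest(deep, factor)
    cam = np.tensordot(weights, fused, axes=1)  # (H, W) class-weighted channel sum
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def mask_to_box(mask: np.ndarray):
    """Tightest (x0, y0, x1, y1) box around all foreground pixels, or None."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy example: the shallow map carries a weak (0.3) response everywhere plus a
# sharp object at rows 2-5, cols 3-6; the deep map is coarse (stride 2) and
# over-covers the object horizontally.
shallow = np.full((2, 8, 8), 0.3)
shallow[:, 2:6, 3:7] = 1.0
deep = np.zeros((2, 4, 4))
deep[:, 1:3, 1:4] = 1.0
weights = np.array([0.5, 0.5])

threshold = 0.5  # assumed CAM threshold
fused_box = mask_to_box(fused_cam(shallow, deep, weights) >= threshold)
deep_only_box = mask_to_box(fused_cam(np.ones_like(shallow), deep, weights) >= threshold)
print("fused box:", fused_box)          # (3, 2, 6, 5)
print("deep-only box:", deep_only_box)  # (2, 2, 7, 5)
```

In this toy case the deep-only CAM spills one column past the object on each side, while the product with the shallow map trims it back to the object extent. Real SPOL feature maps, resolutions, and thresholds will differ.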
Related papers
- Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
(arXiv, 2023-09-22T15:44:10Z)
- Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
(arXiv, 2023-08-17T11:30:46Z)
- Rethinking the Localization in Weakly Supervised Object Localization [51.29084037301646]
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace single-class regression (SCR), which can localize only one object per image, with a binary-class detector (BCD) that localizes multiple objects and is trained to discriminate foreground from background.
(arXiv, 2023-08-11T14:38:51Z)
- ImpDet: Exploring Implicit Fields for 3D Object Detection [74.63774221984725]
We introduce a new perspective that views bounding box regression as an implicit function.
This leads to our proposed framework, termed Implicit Detection or ImpDet.
Our ImpDet assigns specific values to points in different local 3D spaces, so that high-quality boundaries can be generated.
(arXiv, 2022-03-31T17:52:12Z)
- Weakly Supervised Object Localization as Domain Adaption [19.854125742336688]
Weakly supervised object localization (WSOL) focuses on localizing objects only with the supervision of image-level classification labels.
Most previous WSOL methods follow the classification activation map (CAM) that localizes objects based on the classification structure with the multi-instance learning (MIL) mechanism.
This work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects.
(arXiv, 2022-03-03T13:50:22Z)
- Background-aware Classification Activation Map for Weakly Supervised Object Localization [14.646874544729426]
We propose a background-aware classification activation map (B-CAM) to simultaneously learn localization scores of both object and background.
Our B-CAM can be trained in an end-to-end manner based on a proposed stagger classification loss.
Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages and VOC2012 datasets.
(arXiv, 2021-12-29T03:12:09Z)
- Oriented Feature Alignment for Fine-grained Object Recognition in High-Resolution Satellite Imagery [1.0635248457021498]
We analyze the key issues of fine-grained object recognition, and use an oriented feature alignment network (OFA-Net) to achieve high-performance object recognition.
OFA-Net achieves accurate object localization through a rotated bounding boxes refinement module.
A single model of our method achieved an mAP of 46.51% in the GaoFen competition and won 3rd place in the ISPRS benchmark with an mAP of 43.73%.
(arXiv, 2021-10-13T10:48:11Z)
- Scale Normalized Image Pyramids with AutoFocus for Object Detection [75.71320993452372]
A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales.
We propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects.
The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.
(arXiv, 2021-02-10T18:57:53Z)
- Rethinking the Route Towards Weakly Supervised Object Localization [28.90792512056726]
We show that weakly supervised object localization should be divided into two parts: class-agnostic object localization and object classification.
For class-agnostic object localization, we should use class-agnostic methods to generate noisy pseudo annotations and then perform bounding box regression on them without class labels.
Our Pseudo Supervised Object Localization (PSOL) models transfer well across different datasets without fine-tuning.
(arXiv, 2020-02-26T08:54:20Z)
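The Top-5 localization accuracy quoted in the SPOL abstract, together with the split between class-agnostic localization and classification described in the PSOL entry above, can be made concrete with the criterion commonly used on CUB-200 and ImageNet: a sample counts as correct when the ground-truth class is among the top-k predicted classes and the predicted box reaches IoU >= 0.5 with a ground-truth box. The sketch below assumes continuous (x0, y0, x1, y1) box coordinates and a single class-agnostic box per image; the function names are hypothetical and are not taken from any of the listed papers' code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), continuous coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def topk_loc_correct(scores: Sequence[float], box: Box, gt_class: int,
                     gt_boxes: Sequence[Box], k: int = 5, thr: float = 0.5) -> bool:
    """Correct iff gt_class is in the top-k scores AND the class-agnostic box
    overlaps some ground-truth box with IoU >= thr."""
    topk = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]
    return gt_class in topk and any(iou(box, g) >= thr for g in gt_boxes)


def topk_loc_accuracy(samples: List[tuple], k: int = 5) -> float:
    """Fraction of (scores, box, gt_class, gt_boxes) samples counted as correct."""
    hits = sum(topk_loc_correct(s, b, c, g, k) for s, b, c, g in samples)
    return hits / len(samples)


# Toy data: scores come from a classifier, the box from a separate
# class-agnostic localizer (PSOL-style decomposition).
samples = [
    ([0.1, 0.7, 0.2], (10, 10, 50, 50), 1, [(12, 12, 52, 52)]),  # right class, good box
    ([0.6, 0.3, 0.1], (10, 10, 50, 50), 2, [(12, 12, 52, 52)]),  # gt class ranked 3rd
    ([0.5, 0.2, 0.3], (0, 0, 20, 20), 0, [(30, 30, 60, 60)]),    # right class, box misses
]
print(topk_loc_accuracy(samples, k=1))  # 1 of 3 correct
print(topk_loc_accuracy(samples, k=3))  # 2 of 3 correct
```

Because the box is class-agnostic, moving from Top-1 to Top-k only relaxes the class condition; the box test is unchanged. Some protocols also report GT-known localization, which drops the class condition entirely; that variant is not shown.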