Rethinking the Localization in Weakly Supervised Object Localization
- URL: http://arxiv.org/abs/2308.06161v1
- Date: Fri, 11 Aug 2023 14:38:51 GMT
- Title: Rethinking the Localization in Weakly Supervised Object Localization
- Authors: Rui Xu, Yong Luo, Han Hu, Bo Du, Jialie Shen, Yonggang Wen
- Abstract summary: Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace single-class regression (SCR) with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background.
- Score: 51.29084037301646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Weakly supervised object localization (WSOL) is one of the most popular and
challenging tasks in computer vision. This task is to localize the objects in
the images given only the image-level supervision. Recently, dividing WSOL into
two parts (class-agnostic object localization and object classification) has
become the state-of-the-art pipeline for this task. However, existing solutions
under this pipeline usually suffer from the following drawbacks: 1) they are
not flexible since they can only localize one object for each image due to the
adopted single-class regression (SCR) for localization; 2) the generated pseudo
bounding boxes may be noisy, but the negative impact of such noise is not well
addressed. To remedy these drawbacks, we first propose to replace SCR with a
binary-class detector (BCD) for localizing multiple objects, where the detector
is trained by discriminating the foreground and background. Then we design a
weighted entropy (WE) loss using the unlabeled data to reduce the negative
impact of noisy bounding boxes. Extensive experiments on the popular
CUB-200-2011 and ImageNet-1K datasets demonstrate the effectiveness of our
method.
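The abstract names a weighted entropy (WE) loss but does not give its formula. The sketch below is only an illustration of the general idea, under assumptions that are not taken from the paper: each unlabeled box has a predicted foreground probability, each box carries a hypothetical reliability weight, and the loss is the weighted mean of the per-box binary entropies. The function names and the weighting scheme are invented for this example.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a foreground/background prediction with P(foreground) = p."""
    # Clamp to avoid log(0) for fully confident predictions.
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def weighted_entropy_loss(fg_probs, weights):
    """Weighted mean of per-box binary entropies.

    ASSUMPTION: this generic form is a stand-in, not the paper's WE loss.
    `weights` is a hypothetical per-box reliability score; a box with
    weight 0 contributes nothing to the loss.
    """
    if len(fg_probs) != len(weights):
        raise ValueError("fg_probs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        return 0.0
    weighted_sum = sum(w * binary_entropy(p) for p, w in zip(fg_probs, weights))
    return weighted_sum / total_weight


# Two uncertain boxes and one confident box, with the last box down-weighted.
loss = weighted_entropy_loss([0.5, 0.6, 0.99], [1.0, 1.0, 0.2])
print(f"weighted entropy: {loss:.4f}")
```

Minimizing an entropy term of this kind pushes predictions on unlabeled data toward confident foreground or background decisions, and the weights limit how much any single, possibly noisy, box can influence training.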
Related papers
- Improving Weakly-Supervised Object Localization Using Adversarial Erasing and Pseudo Label [7.400926717561454]
This paper investigates a framework for weakly-supervised object localization.
It aims to train a neural network capable of predicting both the object class and its location using only images and their image-level class labels.
arXiv Detail & Related papers (2024-04-15T06:02:09Z)
- Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization [31.039698757869974]
Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision.
Previous CNN-based methods suffer from partial activation issues, concentrating on the object's discriminative part instead of the entire entity scope.
We propose a novel Semantic-Constraint Matching Network (SCMN) via a transformer to converge on the divergent activation.
arXiv Detail & Related papers (2023-09-04T03:20:31Z)
- Spatial-Aware Token for Weakly Supervised Object Localization [137.0570026552845]
We propose a task-specific spatial-aware token to condition localization in a weakly supervised manner.
Experiments show that the proposed SAT achieves state-of-the-art performance on both CUB-200 and ImageNet, with 98.45% and 73.13% GT-known Loc, respectively.
arXiv Detail & Related papers (2023-03-18T15:38:17Z)
- Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples.
We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric.
Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
- Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging [117.73967303377381]
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability.
Our approach is based on a discriminative learning loss formulation that takes into account both object and background information.
Our proposed approach, CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS.
arXiv Detail & Related papers (2022-04-22T17:53:27Z)
- Anti-Adversarially Manipulated Attributions for Weakly Supervised Semantic Segmentation and Object Localization [31.69344455448125]
We present an attribution map of an image that is manipulated to increase the classification score produced by a classifier before the final softmax or sigmoid layer.
This manipulation is realized in an anti-adversarial manner, so that the original image is perturbed along pixel gradients in directions opposite to those used in an adversarial attack.
In addition, we introduce a new regularization procedure that inhibits the incorrect attribution of regions unrelated to the target object and the excessive concentration of attributions on a small region of the target object.
arXiv Detail & Related papers (2022-04-11T06:18:02Z)
- Background-aware Classification Activation Map for Weakly Supervised Object Localization [14.646874544729426]
We propose a background-aware classification activation map (B-CAM) to simultaneously learn localization scores of both object and background.
Our B-CAM can be trained in an end-to-end manner based on a proposed stagger classification loss.
Experiments show that our B-CAM outperforms one-stage WSOL methods on the CUB-200, OpenImages and VOC2012 datasets.
arXiv Detail & Related papers (2021-12-29T03:12:09Z)
- SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing [131.04304632759033]
Small and cluttered objects are common in the real world and are challenging to detect.
In this paper, we first innovatively introduce the idea of denoising to object detection.
Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects.
arXiv Detail & Related papers (2020-04-28T06:03:54Z)
- Solving Missing-Annotation Object Detection with Background Recalibration Loss [49.42997894751021]
This paper focuses on a novel and challenging detection scenario: A majority of true objects/instances is unlabeled in the datasets.
Previous art has proposed to use soft sampling to re-weight the gradients of RoIs based on the overlaps with positive instances, while their method is mainly based on the two-stage detector.
In this paper, we introduce a superior solution called Background Recalibration Loss (BRL) that can automatically re-calibrate the loss signals according to the pre-defined IoU threshold and input image.
arXiv Detail & Related papers (2020-02-12T23:11:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.