Improving Weakly-supervised Object Localization via Causal Intervention
- URL: http://arxiv.org/abs/2104.10351v1
- Date: Wed, 21 Apr 2021 04:44:33 GMT
- Title: Improving Weakly-supervised Object Localization via Causal Intervention
- Authors: Feifei Shao, Yawei Luo, Li Zhang, Lu Ye, Siliang Tang, Yi Yang, Jun Xiao
- Abstract summary: Recently emerged weakly supervised object localization (WSOL) methods can learn to localize an object in an image using only image-level labels.
Previous works endeavor to perceive the integral objects from the small and sparse discriminative attention map, yet ignore the co-occurrence confounder.
Our proposed method, dubbed CI-CAM, explores the causalities among images, contexts, and categories to eliminate the biased co-occurrence in the class activation maps.
- Score: 41.272141902638275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently emerged weakly supervised object localization (WSOL) methods
can learn to localize an object in an image using only image-level labels.
Previous works endeavor to perceive the integral objects from the small and
sparse discriminative attention map, yet ignore the co-occurrence confounder
(e.g., bird and sky), which makes it hard for model inspection (e.g., CAM) to
distinguish between the object and its context. In this paper, we make an early
attempt to tackle this challenge via causal intervention (CI). Our proposed
method, dubbed CI-CAM, explores the causalities among images, contexts, and
categories to eliminate the biased co-occurrence in the class activation maps,
thus improving the accuracy of object localization. Extensive experiments on
several benchmarks demonstrate the effectiveness of CI-CAM in learning
clear object boundaries from confounding contexts. In particular, on
CUB-200-2011, which suffers severely from the co-occurrence confounder, CI-CAM
significantly outperforms the traditional CAM-based baseline (58.39% vs 52.4%
in top-1 localization accuracy). In more general scenarios such as
ImageNet, CI-CAM also performs on par with the state of the art.
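For readers unfamiliar with the baseline and metric named in the abstract, here is a minimal sketch of a plain class activation map (CAM) and the usual top-1 localization check. It is not the CI-CAM method, whose details the abstract does not give. All function names, the 0.2 threshold, and the box-from-all-cells simplification (many pipelines keep only the largest connected region) are assumptions made for illustration; the IoU >= 0.5 criterion is the commonly used convention.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Plain CAM: weight the last conv feature maps by one class's FC weights.

    features:   (C, H, W) feature maps before global average pooling
    fc_weights: (num_classes, C) weights of the final linear classifier
    Returns an (H, W) map rescaled to [0, 1] (all zeros if the map is flat).
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def cam_to_bbox(cam, threshold=0.2):
    """Tightest inclusive box (x0, y0, x1, y1) around cells with cam >= threshold.

    Assumption: every above-threshold cell is used, not just the largest
    connected component.
    """
    ys, xs = np.nonzero(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def iou(a, b):
    """Intersection over union of two inclusive integer boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def top1_loc_correct(pred_class, true_class, pred_box, gt_box, iou_threshold=0.5):
    """A sample counts as correct when the top-1 class is right and the
    predicted box overlaps the ground-truth box with IoU >= iou_threshold."""
    if pred_box is None or pred_class != true_class:
        return False
    return iou(pred_box, gt_box) >= iou_threshold
```

Top-1 localization accuracy is then the fraction of test images for which `top1_loc_correct` returns `True`. A co-occurring context such as sky can inflate the CAM outside the object, which enlarges the box from `cam_to_bbox` and lowers the IoU.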
Related papers
- HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping [29.678756772610797]
Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision.
Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features.
To address these problems, we introduce Hierarchical mErging framework via contrAstive grouPing (HEAP).
arXiv Detail & Related papers (2023-12-29T06:46:37Z)
- Rethinking the Localization in Weakly Supervised Object Localization [51.29084037301646]
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background.
arXiv Detail & Related papers (2023-08-11T14:38:51Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- A Closer Look at the Explainability of Contrastive Language-Image Pre-training [16.10032166963232]
Contrastive language-image pre-training (CLIP) is a powerful vision-language model that has shown great benefits for various tasks.
We have identified some issues with its explainability, which undermine its credibility and limit the capacity for related tasks.
We propose the CLIP Surgery for reliable CAM, a method that allows surgery-like modifications to the inference architecture and features.
arXiv Detail & Related papers (2023-04-12T07:16:55Z)
- Knowledge-guided Causal Intervention for Weakly-supervised Object Localization [32.99508048913356]
KG-CI-CAM is a knowledge-guided causal intervention method.
We tackle the co-occurrence context confounder problem via causal intervention.
We introduce a multi-source knowledge guidance framework to strike a balance between absorbing classification knowledge and localization knowledge.
arXiv Detail & Related papers (2023-01-03T12:02:19Z)
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Anti-Adversarially Manipulated Attributions for Weakly Supervised Semantic Segmentation and Object Localization [31.69344455448125]
We present an attribution map of an image that is manipulated to increase the classification score produced by a classifier before the final softmax or sigmoid layer.
This manipulation is realized in an anti-adversarial manner, so that the original image is perturbed along pixel gradients in directions opposite to those used in an adversarial attack.
In addition, we introduce a new regularization procedure that inhibits the incorrect attribution of regions unrelated to the target object and the excessive concentration of attributions on a small region of the target object.
arXiv Detail & Related papers (2022-04-11T06:18:02Z)
- Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization [39.63778214094173]
Weakly supervised object localization aims to find a target object region in a given image with only weak supervision, such as image-level labels.
We find the gap between classification and localization in terms of the misalignment of the directions between an input feature and a class-specific weight.
We propose a method to align feature directions with a class-specific weight to bridge the gap.
arXiv Detail & Related papers (2022-04-01T05:49:22Z)
- TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization [112.46381729542658]
Weakly supervised object localization (WSOL) is a challenging problem when given image category labels.
We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in visual transformer for long-range dependency extraction.
arXiv Detail & Related papers (2021-03-27T09:43:16Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.