Knowledge-guided Causal Intervention for Weakly-supervised Object
Localization
- URL: http://arxiv.org/abs/2301.01060v2
- Date: Tue, 12 Mar 2024 06:55:20 GMT
- Authors: Feifei Shao, Yawei Luo, Fei Gao, Yi Yang, Jun Xiao
- Abstract summary: KG-CI-CAM is a knowledge-guided causal intervention method.
We tackle the co-occurrence context confounder problem via causal intervention.
We introduce a multi-source knowledge guidance framework to strike a balance between absorbing classification knowledge and localization knowledge.
- Score: 32.99508048913356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous weakly-supervised object localization (WSOL) methods aim to expand
activation map discriminative areas to cover the whole objects, yet neglect two
inherent challenges when relying solely on image-level labels. First, the
"entangled context" issue arises from object-context co-occurrence (e.g., fish
and water), making it hard for model inspection to distinguish object boundaries
clearly. Second, the "C-L dilemma" issue results from the information decay
caused by the pooling layers, which struggle to retain both the semantic
information for precise classification and the essential details for accurate
localization, leading to a trade-off in performance. In this paper, we propose
a knowledge-guided causal intervention method, dubbed KG-CI-CAM, to address
these two under-explored issues in one go. More specifically, we tackle the
co-occurrence context confounder problem via causal intervention, which
explores the causalities among image features, contexts, and categories to
eliminate the biased object-context entanglement in the class activation maps.
Based on the disentangled object feature, we introduce a multi-source knowledge
guidance framework to strike a balance between absorbing classification
knowledge and localization knowledge during model training. Extensive
experiments conducted on several benchmark datasets demonstrate the
effectiveness of KG-CI-CAM in learning distinct object boundaries amidst
confounding contexts and mitigating the dilemma between classification and
localization performance.
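The class activation maps mentioned in the abstract can be illustrated with a minimal sketch. This is the standard CAM computation (a per-class weighted sum of feature maps, followed by thresholding to get a box), not the KG-CI-CAM method itself; the function names, the toy 3x3 feature maps, the weights, and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]   # [row][col]
FeatureMaps = List[Grid]   # [channel][row][col]


def class_activation_map(features: FeatureMaps, class_weights: List[float]) -> Grid:
    """Weighted sum of channel maps using one class's classifier weights."""
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for channel, weight in zip(features, class_weights):
        for r in range(height):
            for c in range(width):
                cam[r][c] += weight * channel[r][c]
    return cam


def normalize(cam: Grid) -> Grid:
    """Min-max scale the map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:
        return [[0.0] * len(row) for row in cam]
    return [[(v - lo) / span for v in row] for row in cam]


def bounding_box(cam: Grid, threshold: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Return (row_min, col_min, row_max, col_max) over cells >= threshold, or None."""
    cells = [(r, c) for r, row in enumerate(cam) for c, v in enumerate(row) if v >= threshold]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


# Toy input (hypothetical): channel 0 fires on the object, channel 1 on the
# co-occurring context. The context channel still gets a nonzero weight,
# which is how context can leak into the activation map.
object_channel = [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
context_channel = [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
weights = [1.0, 0.2]

cam = normalize(class_activation_map([object_channel, context_channel], weights))
print(bounding_box(cam))  # (1, 1, 2, 2)
```

In this toy case the context weight is small, so thresholding still isolates the object. With a larger context weight the box would grow to include context cells, which is the kind of entanglement the abstract describes.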
Related papers
- Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation [28.233690786378393]
We propose a Knowledge Transfer with Simulated Inter-Image Erasing (KTSE) approach for weakly supervised semantic segmentation.
arXiv Detail & Related papers (2024-07-03T02:54:33Z)
- Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection [101.15777242546649]
Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories.
Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection.
We present a novel OVD framework, termed LBP, that learns background prompts to harness explored implicit background knowledge.
arXiv Detail & Related papers (2024-06-01T17:32:26Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection [37.99031842449251]
Video anomaly detection under weak supervision presents significant challenges.
We present a weakly supervised anomaly detection framework that focuses on efficient context modeling and enhanced semantic discriminability.
Our approach significantly improves the detection accuracy of certain anomaly sub-classes, underscoring its practical value and efficacy.
arXiv Detail & Related papers (2023-06-26T06:45:16Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- Reason from Context with Self-supervised Learning [15.16197896174348]
We propose a new Self-supervised method with external memories for Context Reasoning (SeCo).
In both tasks, SeCo outperformed all state-of-the-art (SOTA) SSL methods by a significant margin.
Our results demonstrate that SeCo exhibits human-like behaviors.
arXiv Detail & Related papers (2022-11-23T10:02:05Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Weakly-Supervised Video Object Grounding via Causal Intervention [82.68192973503119]
We target the task of weakly-supervised video object grounding (WSVOG), where only video-sentence annotations are available during model learning.
It aims to localize objects described in the sentence to visual regions in the video, which is a fundamental capability needed in pattern analysis and machine learning.
arXiv Detail & Related papers (2021-12-01T13:13:03Z)
- Improving Weakly-supervised Object Localization via Causal Intervention [41.272141902638275]
Recently emerged weakly supervised object localization (WSOL) methods can learn to localize an object in the image only using image-level labels.
Previous works endeavor to perceive the interval objects from the small and sparse discriminative attention map, yet ignore the co-occurrence confounder.
Our proposed method, dubbed CI-CAM, explores the causalities among images, contexts, and categories to eliminate the biased co-occurrence in the class activation maps.
arXiv Detail & Related papers (2021-04-21T04:44:33Z)
- Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization [71.79436685992128]
We propose a two-stage approach, termed structure-preserving activation (SPA), towards fully leveraging the structure information incorporated in convolutional features for WSOL.
In the first stage, a restricted activation module (RAM) is designed to alleviate the structure-missing issue caused by the classification network.
In the second stage, we propose a post-processing approach, termed the self-correlation map generating (SCG) module, to obtain structure-preserving localization maps.
arXiv Detail & Related papers (2021-03-08T03:04:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.