CoLo-CAM: Class Activation Mapping for Object Co-Localization in
Weakly-Labeled Unconstrained Videos
- URL: http://arxiv.org/abs/2303.09044v4
- Date: Wed, 28 Feb 2024 13:53:28 GMT
- Title: CoLo-CAM: Class Activation Mapping for Object Co-Localization in
Weakly-Labeled Unconstrained Videos
- Authors: Soufiane Belharbi, Shakeeb Murtaza, Marco Pedersoli, Ismail Ben Ayed,
Luke McCaffrey, Eric Granger
- Abstract summary: The Co-Localization-CAM (CoLo-CAM) method exploits spatiotemporal information in activation maps during training without constraining an object's position.
Co-Localization improves localization performance because the joint learning creates direct communication among pixels across all image locations.
- Score: 23.447026400051772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging spatiotemporal information in videos is critical for weakly
supervised video object localization (WSVOL) tasks. However, state-of-the-art
methods only rely on visual and motion cues, while discarding discriminative
information, making them susceptible to inaccurate localizations. Recently,
discriminative models have been explored for WSVOL tasks using a temporal class
activation mapping (CAM) method. Although their results are promising, objects
are assumed to have limited movement from frame to frame, leading to
degradation in performance for relatively long-term dependencies. This paper
proposes a novel CAM method for WSVOL that exploits spatiotemporal information
in activation maps during training without constraining an object's position.
Its training relies on Co-Localization, hence, the name CoLo-CAM. Given a
sequence of frames, localization is jointly learned based on color cues
extracted across the corresponding maps, by assuming that an object has similar
color in consecutive frames. CAM activations are constrained to respond
similarly over pixels with similar colors, achieving co-localization. This
improves localization performance because the joint learning creates direct
communication among pixels across all image locations and over all frames,
allowing for transfer, aggregation, and correction of localizations.
Co-localization is integrated into training by minimizing the color term of a
conditional random field (CRF) loss over a sequence of frames/CAMs. Extensive
experiments on two challenging YouTube-Objects datasets of unconstrained videos
show the merits of our CoLo-CAM method, and its robustness to long-term
dependencies, leading to new state-of-the-art performance for the WSVOL task.
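The co-localization idea described in the abstract (CAM activations should agree over pixels with similar colors, across all frames of a sequence) can be illustrated with a small sketch. This is not the authors' implementation: the function name `color_crf_loss`, the Gaussian color kernel, the bandwidth `sigma`, the two-class relaxed Potts form, and the normalization by the number of pixel pairs are all assumptions chosen for illustration, and the dense pairwise matrix is only practical for tiny inputs.

```python
import numpy as np

def color_crf_loss(frames, cams, sigma=0.1):
    """Color-only pairwise penalty over ALL pixels of ALL frames (hypothetical sketch).

    frames: (T, H, W, 3) float array with RGB values in [0, 1]
    cams:   (T, H, W) float array of foreground probabilities in [0, 1]
    """
    colors = frames.reshape(-1, 3)   # every pixel of every frame, (N, 3)
    fg = cams.reshape(-1)            # matching foreground scores, (N,)
    bg = 1.0 - fg

    # Gaussian affinity on color only (no spatial term), so pixels in
    # different frames and at different positions can interact.
    diff = colors[:, None, :] - colors[None, :, :]
    affinity = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

    # Relaxed two-class Potts penalty: similar-colored pixel pairs are
    # penalized when one is scored foreground and the other background.
    penalty = fg @ affinity @ bg + bg @ affinity @ fg
    return penalty / fg.size ** 2    # assumed normalization by pair count


def make_frame(top, left):
    """Toy 4x4 frame: a 2x2 red object on a blue background."""
    frame = np.zeros((4, 4, 3))
    frame[..., 2] = 1.0
    frame[top:top + 2, left:left + 2] = [1.0, 0.0, 0.0]
    return frame


# The object moves from the top-left corner to the bottom-right corner.
frames = np.stack([make_frame(0, 0), make_frame(2, 2)])

# CAMs that follow the red object in both frames.
consistent = (frames[..., 0] > 0.5).astype(float)

# CAMs that stay at the top-left corner, missing the object in frame 2.
inconsistent = consistent.copy()
inconsistent[1] = 0.0
inconsistent[1, 0:2, 0:2] = 1.0

loss_consistent = color_crf_loss(frames, consistent)
loss_inconsistent = color_crf_loss(frames, inconsistent)
print(loss_consistent, loss_inconsistent)
```

In this toy setting the maps that track the object's color across frames receive a much lower penalty than the maps that stay at a fixed position, which is the behavior the abstract attributes to training without constraining an object's position.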
Related papers
- Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos [12.762698438702854]
State-of-the-art WSVOL methods rely on class activation mapping (CAM).
Our TrCAM-V method allows training a localization network by sampling pseudo-pixels on the fly from these regions.
During inference, the model can process individual frames for real-time localization applications.
arXiv Detail & Related papers (2024-07-08T15:08:41Z)
- Weakly-Supervised Temporal Action Localization with Bidirectional
Semantic Consistency Constraint [83.36913240873236]
Weakly Supervised Temporal Action Localization (WTAL) aims to classify actions and localize their temporal boundaries in a video.
We propose a simple yet efficient method, named bidirectional semantic consistency constraint (Bi-SCC), to discriminate the positive actions from co-scene actions.
Experimental results show that our approach outperforms the state-of-the-art methods on THUMOS14 and ActivityNet.
arXiv Detail & Related papers (2023-04-25T07:20:33Z)
- Attention-based Class Activation Diffusion for Weakly-Supervised
Semantic Segmentation [98.306533433627]
Extracting class activation maps (CAM) is a key step for weakly-supervised semantic segmentation (WSSS).
This paper proposes a new method, dubbed AD-CAM, that couples CAM and the attention matrix through probabilistic diffusion.
Experiments show that AD-CAM as pseudo labels can yield stronger WSSS models than the state-of-the-art variants of CAM.
arXiv Detail & Related papers (2022-11-20T10:06:32Z)
- TCAM: Temporal Class Activation Maps for Object Localization in
Weakly-Labeled Unconstrained Videos [22.271760669551817]
Weakly supervised video object localization (WSVOL) allows locating objects in videos using only global video tags, such as the object class.
In this paper, we leverage the successful class activation mapping (CAM) methods designed for WSOL on still images.
A new Temporal CAM (TCAM) method is introduced to train a discriminant deep learning (DL) model to exploit spatiotemporal information in videos.
arXiv Detail & Related papers (2022-08-30T21:20:34Z)
- CREAM: Weakly Supervised Object Localization via Class RE-Activation
Mapping [18.67907876709536]
Class RE-Activation Mapping (CREAM) is a clustering-based approach to boost the activation values of the integral object regions.
CREAM achieves state-of-the-art performance on the CUB, ILSVRC and OpenImages benchmark datasets.
arXiv Detail & Related papers (2022-05-27T11:57:41Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene
Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised
Correspondence Learning [74.03651142051656]
We develop LIIR, a locality-aware inter- and intra-video reconstruction framework.
We exploit cross-video affinities as extra negative samples within a unified inter- and intra-video reconstruction scheme.
arXiv Detail & Related papers (2022-03-27T15:46:42Z)
- F-CAM: Full Resolution CAM via Guided Parametric Upscaling [20.609010268320013]
Class Activation Mapping (CAM) methods have recently gained much attention for weakly-supervised object localization (WSOL) tasks.
CAM methods are typically integrated within off-the-shelf CNN backbones, such as ResNet50.
We introduce a generic method for parametric upscaling of CAMs that allows constructing accurate full resolution CAMs.
arXiv Detail & Related papers (2021-09-15T04:45:20Z)
- TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised
Object Localization [112.46381729542658]
Weakly supervised object localization (WSOL) is a challenging problem when only image category labels are given.
We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in visual transformers for long-range dependency extraction.
arXiv Detail & Related papers (2021-03-27T09:43:16Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.