Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization
- URL: http://arxiv.org/abs/2309.01331v1
- Date: Mon, 4 Sep 2023 03:20:31 GMT
- Title: Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization
- Authors: Yiwen Cao, Yukun Su, Wenjun Wang, Yanxia Liu and Qingyao Wu
- Abstract summary: Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision.
Previous CNN-based methods suffer from partial activation issues, concentrating on the object's discriminative part instead of the entire entity scope.
We propose a novel Semantic-Constraint Matching Network (SCMN) via a transformer to converge on the divergent activation.
- Score: 31.039698757869974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised object localization (WSOL) strives to learn to localize
objects with only image-level supervision. Due to the local receptive fields
generated by convolution operations, previous CNN-based methods suffer from
partial activation issues, concentrating on the object's discriminative part
instead of the entire entity scope. Benefiting from the capability of the
self-attention mechanism to acquire long-range feature dependencies, Vision
Transformer has been recently applied to alleviate the local activation
drawbacks. However, since the transformer lacks the inductive localization bias that is inherent in CNNs, it may cause a divergent activation problem, resulting in an uncertain distinction between foreground and background. In this work, we propose a novel Semantic-Constraint Matching Network (SCMN) via
a transformer to converge on the divergent activation. Specifically, we first
propose a local patch shuffle strategy to construct the image pairs, disrupting
local patches while guaranteeing global consistency. The paired images, which spatially contain a common object, are then fed into the Siamese network
encoder. We further design a semantic-constraint matching module, which aims to
mine the co-object part by matching the coarse class activation maps (CAMs)
extracted from the pair images, thus implicitly guiding and calibrating the
transformer network to alleviate the divergent activation. Extensive
experimental results on the two challenging benchmarks CUB-200-2011 and ILSVRC show that our method achieves new state-of-the-art performance and outperforms previous methods by a large margin.
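The abstract names two concrete mechanisms: a local patch shuffle that builds the training pairs and a semantic-constraint matching module that aligns the coarse CAMs of a pair. The PyTorch sketch below illustrates one plausible reading of those ideas; the window-based shuffle, the sigmoid/L1 agreement term, and the function names `local_patch_shuffle` and `cam_matching_loss` are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of the two ideas described in the
# abstract: a local patch shuffle that perturbs an image locally while keeping
# its global layout, and a matching term that encourages the coarse CAMs of
# the original/shuffled pair to agree on the common object.
import torch
import torch.nn.functional as F


def local_patch_shuffle(img: torch.Tensor, patch: int = 32, window: int = 2) -> torch.Tensor:
    """Shuffle patches only inside small local windows (here 2x2 patches),
    disrupting local structure while roughly preserving global consistency."""
    c, h, w = img.shape
    assert h % (patch * window) == 0 and w % (patch * window) == 0
    # (C, H, W) -> (gh, gw, C, patch, patch) grid of non-overlapping patches
    grid = img.unfold(1, patch, patch).unfold(2, patch, patch)
    gh, gw = grid.shape[1], grid.shape[2]
    grid = grid.permute(1, 2, 0, 3, 4).contiguous()
    out = grid.clone()
    for i in range(0, gh, window):
        for j in range(0, gw, window):
            block = grid[i:i + window, j:j + window].reshape(window * window, c, patch, patch)
            perm = torch.randperm(window * window)
            out[i:i + window, j:j + window] = block[perm].reshape(window, window, c, patch, patch)
    # stitch the shuffled patch grid back into an image
    return out.permute(2, 0, 3, 1, 4).reshape(c, gh * patch, gw * patch)


def cam_matching_loss(cam_a: torch.Tensor, cam_b: torch.Tensor) -> torch.Tensor:
    """A plain L1 agreement term between the coarse CAMs of the two views,
    standing in for the paper's semantic-constraint matching module."""
    return F.l1_loss(torch.sigmoid(cam_a), torch.sigmoid(cam_b))


# Usage: build a training pair, then match the CAMs a Siamese encoder would
# produce for each view (random tensors stand in for real CAMs here).
img = torch.rand(3, 256, 256)
pair = local_patch_shuffle(img)
cam_a, cam_b = torch.rand(1, 8, 8), torch.rand(1, 8, 8)
loss = cam_matching_loss(cam_a, cam_b)
```

Other agreement terms (a symmetric KL, or matching restricted to mined co-object regions) would be equally reasonable stand-ins; the point is only that both views are pushed to highlight the same regions.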
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Multiscale Vision Transformer With Deep Clustering-Guided Refinement for
Weakly Supervised Object Localization [4.300577895958228]
This work addresses the task of weakly-supervised object localization.
The proposed approach comprises multiple object localization transformers that extract patch embeddings across various scales.
We introduce a deep clustering-guided refinement method that further enhances localization accuracy.
arXiv Detail & Related papers (2023-12-15T07:46:44Z) - Dual-Augmented Transformer Network for Weakly Supervised Semantic
Segmentation [4.02487511510606]
Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task, which aims to segment out objects using only class-level labels.
Traditional methods adopt the CNN-based network and utilize the class activation map (CAM) strategy to discover the object regions.
An alternative is to explore vision transformers (ViT) to encode the image to acquire the global semantic information.
We propose a dual network with both CNN-based and transformer networks for mutually complementary learning.
arXiv Detail & Related papers (2023-09-30T08:41:11Z) - Background Activation Suppression for Weakly Supervised Object
Localization and Semantic Segmentation [84.62067728093358]
Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels.
A new paradigm has emerged that generates a foreground prediction map to achieve pixel-level localization.
This paper presents two astonishing experimental observations on the object localization learning process.
arXiv Detail & Related papers (2023-09-22T15:44:10Z) - Rethinking the Localization in Weakly Supervised Object Localization [51.29084037301646]
Weakly supervised object localization (WSOL) is one of the most popular and challenging tasks in computer vision.
Recently, dividing WSOL into two parts (class-agnostic object localization and object classification) has become the state-of-the-art pipeline for this task.
We propose to replace SCR with a binary-class detector (BCD) for localizing multiple objects, where the detector is trained by discriminating the foreground and background.
arXiv Detail & Related papers (2023-08-11T14:38:51Z) - Spatial-Aware Token for Weakly Supervised Object Localization [137.0570026552845]
We propose a task-specific spatial-aware token to condition localization in a weakly supervised manner.
Experiments show that the proposed SAT achieves state-of-the-art performance on both CUB-200 and ImageNet, with 98.45% and 73.13% GT-known Loc., respectively.
arXiv Detail & Related papers (2023-03-18T15:38:17Z) - Dual Progressive Transformations for Weakly Supervised Semantic
Segmentation [23.68115323096787]
Weakly supervised semantic segmentation (WSSS) is a challenging task in computer vision.
We propose a Convolutional Neural Networks Refined Transformer (CRT) to mine globally complete and locally accurate class activation maps.
Our proposed CRT achieves new state-of-the-art performance on the weakly supervised semantic segmentation task.
arXiv Detail & Related papers (2022-09-30T03:42:52Z) - Weakly Supervised Object Localization via Transformer with Implicit
Spatial Calibration [20.322494442959762]
Weakly Supervised Object Localization (WSOL) has attracted much attention because of its low annotation cost in real applications.
We introduce a simple yet effective Spatial Calibration Module (SCM) for accurate WSOL, incorporating semantic similarities of patch tokens and their spatial relationships into a unified diffusion model.
SCM is designed as an external module of the transformer and can be removed during inference to reduce the computation cost.
arXiv Detail & Related papers (2022-07-21T12:37:15Z) - LCTR: On Awakening the Local Continuity of Transformer for Weakly
Supervised Object Localization [38.376238216214524]
Weakly supervised object localization (WSOL) aims to learn an object localizer solely from image-level labels.
We propose a novel framework built upon the transformer, termed LCTR, which aims to enhance the local perception capability of global features.
arXiv Detail & Related papers (2021-12-10T01:48:40Z) - TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised
Object Localization [112.46381729542658]
Weakly supervised object localization (WSOL) is a challenging problem when only image category labels are given.
We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in visual transformer for long-range dependency extraction.
arXiv Detail & Related papers (2021-03-27T09:43:16Z) - Contradictory Structure Learning for Semi-supervised Domain Adaptation [67.89665267469053]
Current adversarial adaptation methods attempt to align the cross-domain features.
Two challenges remain unsolved: 1) the conditional distribution mismatch and 2) the bias of the decision boundary towards the source domain.
We propose a novel framework for semi-supervised domain adaptation by unifying the learning of opposite structures.
arXiv Detail & Related papers (2020-02-06T22:58:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.