USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised
Semantic Segmentation
- URL: http://arxiv.org/abs/2303.07806v2
- Date: Thu, 31 Aug 2023 13:00:55 GMT
- Title: USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised
Semantic Segmentation
- Authors: Zelin Peng, Guanchun Wang, Lingxi Xie, Dongsheng Jiang, Wei Shen, Qi
Tian
- Abstract summary: We propose a Unified optimization paradigm for Seed Area GEneration (USAGE) for both types of networks.
Experimental results show that USAGE consistently improves seed area generation for both CNNs and Transformers.
- Score: 90.08744714206233
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Seed area generation is usually the starting point of weakly supervised
semantic segmentation (WSSS). Computing the Class Activation Map (CAM) from a
multi-label classification network is the de facto paradigm for seed area
generation, but CAMs generated from Convolutional Neural Networks (CNNs) and
Transformers tend to be under- and over-activated, respectively, so
strategies that refine CAMs for CNNs are usually inappropriate for
Transformers, and vice versa. In this paper, we propose a Unified optimization
paradigm for Seed Area GEneration (USAGE) for both types of networks, in which
the objective function consists of two terms. The first is a
generation loss, which controls the shape of seed areas through a temperature
parameter set by a deterministic principle for each type of network.
The second is a regularization loss, which enforces consistency between the
seed areas generated from different views by self-adaptive network adjustment,
in order to overturn false activations in seed areas. Experimental
results show that USAGE consistently improves seed area generation for both
CNNs and Transformers by large margins, e.g., outperforming state-of-the-art
methods by 4.1% mIoU on PASCAL VOC. Moreover, based on the USAGE-generated
seed areas on Transformers, we achieve state-of-the-art WSSS results on both
PASCAL VOC and MS COCO.
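The abstract names the two loss terms but does not give their formulas. The following is a minimal sketch of what such an objective could look like, not the authors' implementation. It assumes that the temperature acts through log-mean-exp pooling of the CAM (which moves between average pooling and max pooling as the temperature grows) and that the second "view" is a horizontal flip of the image. All function names, the pooling choice, the L1 consistency measure, and the weight `lam` are hypothetical.

```python
import numpy as np

def temperature_pool(cam, temperature):
    """Pool a CAM of shape (C, H, W) into C class logits.

    Assumed form: (1/t) * log(mean(exp(t * cam))), with t > 0.
    Small t approaches average pooling, large t approaches max pooling.
    """
    c = cam.shape[0]
    scaled = cam.reshape(c, -1) * temperature
    peak = scaled.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    log_mean_exp = peak[:, 0] + np.log(np.exp(scaled - peak).mean(axis=1))
    return log_mean_exp / temperature

def generation_loss(cam, labels, temperature):
    """Multi-label binary cross-entropy on the temperature-pooled logits."""
    logits = temperature_pool(cam, temperature)
    # Stable BCE-with-logits: max(x, 0) - x*y + log(1 + exp(-|x|))
    per_class = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return float(per_class.mean())

def consistency_loss(cam, cam_of_flipped_image):
    """L1 distance between two views after undoing the horizontal flip (assumed view pair)."""
    realigned = cam_of_flipped_image[:, :, ::-1]
    return float(np.abs(cam - realigned).mean())

def total_loss(cam, cam_of_flipped_image, labels, temperature, lam=1.0):
    """Generation term plus a weighted regularization term; lam is a hypothetical weight."""
    return generation_loss(cam, labels, temperature) + lam * consistency_loss(cam, cam_of_flipped_image)
```

In this sketch a small temperature spreads the classification gradient over many locations, while a large one concentrates it on the strongest responses. Which setting suits CNNs and which suits Transformers is decided by the paper's principle, which the abstract does not detail.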
Related papers
- Dual-Augmented Transformer Network for Weakly Supervised Semantic
Segmentation [4.02487511510606]
Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task, which aims to segment objects using only class-level labels.
Traditional methods adopt the CNN-based network and utilize the class activation map (CAM) strategy to discover the object regions.
An alternative is to explore vision transformers (ViT) to encode the image to acquire the global semantic information.
We propose a dual network with both CNN-based and transformer networks for mutually complementary learning.
arXiv Detail & Related papers (2023-09-30T08:41:11Z)
- Semantic-Constraint Matching Transformer for Weakly Supervised Object
Localization [31.039698757869974]
Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision.
Previous CNN-based methods suffer from partial activation issues, concentrating on the object's discriminative part instead of the entire entity scope.
We propose a novel Semantic-Constraint Matching Network (SCMN) via a transformer to converge on the divergent activation.
arXiv Detail & Related papers (2023-09-04T03:20:31Z)
- Generalized One-shot Domain Adaption of Generative Adversarial Networks [72.84435077616135]
The adaption of Generative Adversarial Network (GAN) aims to transfer a pre-trained GAN to a given domain with limited training data.
We consider that the adaptation from source domain to target domain can be decoupled into two parts: the transfer of global style like texture and color, and the emergence of new entities that do not belong to the source domain.
Our core objective is to constrain the gap between the internal distributions of the reference and syntheses by sliced Wasserstein distance.
arXiv Detail & Related papers (2022-09-08T09:24:44Z)
- UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer
via Hierarchical Mask Calibration [49.16591283724376]
We design UniDAformer, a unified domain adaptive panoptic segmentation transformer that is simple but can achieve domain adaptive instance segmentation and semantic segmentation simultaneously within a single network.
UniDAformer introduces Hierarchical Mask Calibration (HMC), which rectifies inaccurate predictions at the level of regions, superpixels and annotated pixels via online self-training on the fly.
It has three unique features: 1) it enables unified domain adaptive panoptic adaptation; 2) it mitigates false predictions and improves domain adaptive panoptic segmentation effectively; 3) it is end-to-end trainable with a much simpler training and inference pipeline.
arXiv Detail & Related papers (2022-06-30T07:32:23Z)
- A Unified Architecture of Semantic Segmentation and Hierarchical
Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that on forward pass the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z)
- Domain Adaptive Semantic Segmentation with Regional Contrastive
Consistency Regularization [19.279884432843822]
We propose a novel and fully end-to-end trainable approach, called regional contrastive consistency regularization (RCCR) for domain adaptive semantic segmentation.
Our core idea is to pull the similar regional features extracted from the same location of different images to be closer, and meanwhile push the features from the different locations of the two images to be separated.
arXiv Detail & Related papers (2021-10-11T11:45:00Z)
- HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning [74.76431541169342]
Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones.
We propose a novel hierarchical semantic-visual adaptation (HSVA) framework to align semantic and visual domains.
Experiments on four benchmark datasets demonstrate HSVA achieves superior performance on both conventional and generalized ZSL.
arXiv Detail & Related papers (2021-09-30T14:27:50Z)
- Gated Path Selection Network for Semantic Segmentation [72.44994579325822]
We develop a novel network named Gated Path Selection Network (GPSNet), which aims to learn adaptive receptive fields.
In GPSNet, we first design a two-dimensional multi-scale network - SuperNet, which densely incorporates features from growing receptive fields.
To dynamically select desirable semantic context, a gate prediction module is further introduced.
arXiv Detail & Related papers (2020-01-19T12:32:17Z)