Dual-Augmented Transformer Network for Weakly Supervised Semantic
Segmentation
- URL: http://arxiv.org/abs/2310.00307v1
- Date: Sat, 30 Sep 2023 08:41:11 GMT
- Title: Dual-Augmented Transformer Network for Weakly Supervised Semantic
Segmentation
- Authors: Jingliang Deng, Zonghan Li
- Abstract summary: Weakly supervised semantic segmentation (WSSS) is a fundamental computer vision task that aims to segment objects using only class-level labels.
Traditional methods adopt the CNN-based network and utilize the class activation map (CAM) strategy to discover the object regions.
An alternative is to explore vision transformers (ViT) to encode the image to acquire the global semantic information.
We propose a dual network with both CNN-based and transformer networks for mutually complementary learning.
- Score: 4.02487511510606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised semantic segmentation (WSSS) is a fundamental
computer vision task that aims to segment objects using only class-level labels. The
traditional methods adopt the CNN-based network and utilize the class
activation map (CAM) strategy to discover the object regions. However, such
methods only focus on the most discriminative region of the object, resulting
in incomplete segmentation. An alternative is to explore vision transformers
(ViT) to encode the image and acquire global semantic information. Yet, the
lack of inductive bias toward objects is a weakness of ViT. In this paper, we
explore the dual-augmented transformer network with self-regularization
constraints for WSSS. Specifically, we propose a dual network with both
CNN-based and transformer networks for mutually complementary learning, where
both networks augment the final output for enhancement. Extensive
evaluations on the challenging PASCAL VOC 2012 benchmark demonstrate the
effectiveness of our method, which outperforms previous state-of-the-art methods.
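The CAM strategy discussed in the abstract has a simple closed form: the activation map for class c is the channel-wise weighted sum M_c(x, y) = Σ_k w_k^c · f_k(x, y), where f_k are the final convolutional feature maps and w^c are the classifier weights for class c. Below is a minimal NumPy sketch of this computation; the function name, array shapes, and the toy inputs are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a class activation map (CAM) from final-layer conv
    feature maps and the weights of a linear classification head.

    features:   (C, H, W) feature maps from the last conv layer
    fc_weights: (num_classes, C) classifier weights
    class_idx:  index of the target class
    """
    # Weighted sum over channels: M_c(x, y) = sum_k w_k^c * f_k(x, y)
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # (H, W)
    cam = np.maximum(cam, 0)          # keep only positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1]
    return cam

# Toy example with random features (shapes typical of a 14x14 conv grid)
rng = np.random.default_rng(0)
feats = rng.standard_normal((512, 14, 14))
weights = rng.standard_normal((20, 512))
cam = class_activation_map(feats, weights, class_idx=3)
assert cam.shape == (14, 14)
```

Thresholding such a map is what yields the seed object regions; the partial-activation problem the abstract describes arises because the highest CAM values concentrate on the most discriminative parts of the object.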
Related papers
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- Semantic-Constraint Matching Transformer for Weakly Supervised Object
Localization [31.039698757869974]
Weakly supervised object localization (WSOL) strives to learn to localize objects with only image-level supervision.
Previous CNN-based methods suffer from partial activation issues, concentrating on the object's discriminative part instead of the entire entity scope.
We propose a novel Semantic-Constraint Matching Network (SCMN) via a transformer to converge on the divergent activation.
arXiv Detail & Related papers (2023-09-04T03:20:31Z)
- USAGE: A Unified Seed Area Generation Paradigm for Weakly Supervised
Semantic Segmentation [90.08744714206233]
We propose a Unified optimization paradigm for Seed Area GEneration (USAGE) for both types of networks.
Experimental results show that USAGE consistently improves seed area generation for both CNNs and Transformers.
arXiv Detail & Related papers (2023-03-14T11:25:02Z)
- Representation Separation for Semantic Segmentation with Vision
Transformers [11.431694321563322]
Vision transformers (ViTs) encoding an image as a sequence of patches bring new paradigms for semantic segmentation.
We present an efficient framework of representation separation in local-patch level and global-region level for semantic segmentation with ViTs.
arXiv Detail & Related papers (2022-12-28T09:54:52Z)
- Dual Progressive Transformations for Weakly Supervised Semantic
Segmentation [23.68115323096787]
Weakly supervised semantic segmentation (WSSS) is a challenging task in computer vision.
We propose a Convolutional Neural Networks Refined Transformer (CRT) to mine globally complete and locally accurate class activation maps.
Our proposed CRT achieves new state-of-the-art performance on the weakly supervised semantic segmentation task.
arXiv Detail & Related papers (2022-09-30T03:42:52Z)
- WegFormer: Transformers for Weakly Supervised Semantic Segmentation [32.3201557200616]
This work introduces Transformer to build a simple and effective WSSS framework, termed WegFormer.
Unlike existing CNN-based methods, WegFormer uses Vision Transformer as a classifier to produce high-quality pseudo segmentation masks.
WegFormer achieves state-of-the-art 70.5% mIoU on the PASCAL VOC dataset, significantly outperforming the previous best method.
arXiv Detail & Related papers (2022-03-16T06:50:31Z)
- A Unified Architecture of Semantic Segmentation and Hierarchical
Generative Adversarial Networks for Expression Manipulation [52.911307452212256]
We develop a unified architecture of semantic segmentation and hierarchical GANs.
A unique advantage of our framework is that on forward pass the semantic segmentation network conditions the generative model.
We evaluate our method on two challenging facial expression translation benchmarks, AffectNet and RaFD, and a semantic segmentation benchmark, CelebAMask-HQ.
arXiv Detail & Related papers (2021-12-08T22:06:31Z)
- Efficient Hybrid Transformer: Learning Global-Local Context for Urban
Scene Segmentation [11.237929167356725]
We propose an efficient hybrid Transformer (EHT) for semantic segmentation of urban scene images.
EHT takes advantage of CNNs and Transformer, learning global-local context to strengthen the feature representation.
The proposed EHT achieves a 67.0% mIoU on the UAVid test set and outperforms other lightweight models significantly.
arXiv Detail & Related papers (2021-09-18T13:55:38Z)
- Context Decoupling Augmentation for Weakly Supervised Semantic
Segmentation [53.49821324597837]
Weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years.
We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which the objects appear.
To validate the effectiveness of the proposed method, extensive experiments on PASCAL VOC 2012 dataset with several alternative network architectures demonstrate that CDA can boost various popular WSSS methods to the new state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-03-02T15:05:09Z)
- A Transductive Multi-Head Model for Cross-Domain Few-Shot Learning [72.30054522048553]
We present a new method, Transductive Multi-Head Few-Shot learning (TMHFS), to address the Cross-Domain Few-Shot Learning challenge.
The proposed methods greatly outperform the strong baseline, fine-tuning, on four different target domains.
arXiv Detail & Related papers (2020-06-08T02:39:59Z)
- Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.