Find it if You Can: End-to-End Adversarial Erasing for Weakly-Supervised
Semantic Segmentation
- URL: http://arxiv.org/abs/2011.04626v1
- Date: Mon, 9 Nov 2020 18:35:35 GMT
- Authors: Erik Stammes, Tom F.H. Runia, Michael Hofmann, Mohsen Ghafoorian
- Abstract summary: We propose a novel formulation of adversarial erasing of the attention maps.
The proposed solution does not require saliency masks; instead, it uses a regularization loss to prevent the attention maps from spreading to less discriminative object regions.
Our experiments on the Pascal VOC dataset demonstrate that our adversarial approach increases segmentation performance by 2.1 mIoU compared to our baseline and by 1.0 mIoU compared to previous adversarial erasing approaches.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation is a task that traditionally requires a large dataset
of pixel-level ground truth labels, which is time-consuming and expensive to
obtain. Recent advancements in the weakly-supervised setting show that
reasonable performance can be obtained by using only image-level labels.
Classification is often used as a proxy task to train a deep neural network
from which attention maps are extracted. However, the classification task needs
only the minimum evidence to make predictions, hence it focuses on the most
discriminative object regions. To overcome this problem, we propose a novel
formulation of adversarial erasing of the attention maps. In contrast to
previous adversarial erasing methods, we optimize two networks with opposing
loss functions, which eliminates the need for suboptimal strategies such as
multiple training steps that complicate the training process, or a
weight-sharing policy between networks operating on different distributions,
which may hurt performance. The proposed
solution does not require saliency masks; instead, it uses a regularization loss
to prevent the attention maps from spreading to less discriminative object
regions. Our experiments on the Pascal VOC dataset demonstrate that our
adversarial approach increases segmentation performance by 2.1 mIoU compared to
our baseline and by 1.0 mIoU compared to previous adversarial erasing
approaches.
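The core mechanism described above, erasing the most discriminative regions so that attention is forced onto the remaining evidence, while a regularization term keeps the maps from spreading indiscriminately, can be illustrated with a minimal, framework-free sketch. All function names, the threshold value, and the toy data below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of adversarial erasing on a 2-D attention map.
# The 0.5 threshold and function names are illustrative choices.

def erase_discriminative(image, attention, threshold=0.5):
    """Zero out image pixels where attention exceeds the threshold,
    forcing a second network to find evidence elsewhere."""
    return [
        [0.0 if a > threshold else px for px, a in zip(img_row, att_row)]
        for img_row, att_row in zip(image, attention)
    ]

def attention_regularization(attention):
    """Mean attention mass; penalizing it discourages the map from
    spreading onto less discriminative (often background) regions."""
    values = [a for row in attention for a in row]
    return sum(values) / len(values)

image = [[0.9, 0.8], [0.2, 0.1]]
attention = [[0.7, 0.6], [0.3, 0.1]]

erased = erase_discriminative(image, attention)
print(erased)  # [[0.0, 0.0], [0.2, 0.1]] -- high-attention pixels removed
reg_loss = attention_regularization(attention)
```

In the paper's end-to-end setting, the second network would be trained on `erased` with a loss opposing the first network's, while the regularization term above is what removes the need for saliency masks.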
Related papers
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Open-Vocabulary Semantic Segmentation with Decoupled One-Pass Network [26.97153244517095]
We propose a network that only needs a single pass through the visual-language model for each input image.
We first propose a novel network adaptation approach, termed patch severance, to restrict the harmful interference between the patch embeddings in the pre-trained visual encoder.
We then propose classification anchor learning to encourage the network to spatially focus on more discriminative features for classification.
arXiv Detail & Related papers (2023-04-03T17:59:21Z)
- Flip Learning: Erase to Segment [65.84901344260277]
Weakly-supervised segmentation (WSS) can help reduce time-consuming and cumbersome manual annotation.
We propose a novel and general WSS framework called Flip Learning, which only needs the box annotation.
Our proposed approach achieves competitive performance and shows great potential to narrow the gap between fully-supervised and weakly-supervised learning.
arXiv Detail & Related papers (2021-08-02T09:56:10Z)
- Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks.
Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations.
The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
arXiv Detail & Related papers (2021-07-25T11:39:58Z)
- Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation [18.666365568765098]
We propose a simple yet effective approach that standardizes the max logits in order to align the different distributions and reflect the relative meanings of max logits within each predicted class.
Our method achieves a new state-of-the-art performance on the publicly available Fishyscapes Lost & Found leaderboard with a large margin.
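The standardization step this entry describes is simple enough to sketch: each pixel's max logit is normalized by the mean and standard deviation of max logits within its predicted class, making scores comparable across classes. The function name and toy data below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of standardized max logits (SML) for
# out-of-distribution scoring in segmentation.
from statistics import mean, pstdev

def standardized_max_logits(logits):
    """logits: one list of class logits per pixel.
    Returns a per-pixel standardized max-logit score."""
    max_logits = [max(l) for l in logits]
    pred_class = [l.index(max(l)) for l in logits]
    scores = []
    for ml, c in zip(max_logits, pred_class):
        # Statistics are computed within the pixel's predicted class.
        cls_vals = [m for m, p in zip(max_logits, pred_class) if p == c]
        mu, sigma = mean(cls_vals), pstdev(cls_vals)
        scores.append((ml - mu) / sigma if sigma > 0 else 0.0)
    return scores

scores = standardized_max_logits([[2.0, 0.1], [4.0, 0.3], [0.2, 5.0]])
print(scores)  # [-1.0, 1.0, 0.0]
```

In practice the class-wise statistics would be estimated over a whole training or validation set rather than a single image, and low scores would flag unexpected obstacles.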
arXiv Detail & Related papers (2021-07-23T14:25:02Z)
- Unsupervised Image Segmentation by Mutual Information Maximization and Adversarial Regularization [7.165364364478119]
We propose a novel fully unsupervised semantic segmentation method, called Information Maximization and Adversarial Regularization (InMARS).
Inspired by human perception, which parses a scene into perceptual groups, our proposed approach first partitions an input image into meaningful regions (also known as superpixels).
Next, it utilizes Mutual-Information-Maximization followed by an adversarial training strategy to cluster these regions into semantically meaningful classes.
Our experiments demonstrate that our method achieves the state-of-the-art performance on two commonly used unsupervised semantic segmentation datasets.
arXiv Detail & Related papers (2021-07-01T18:36:27Z)
- Unsupervised Semantic Segmentation by Contrasting Object Mask Proposals [78.12377360145078]
We introduce a novel two-step framework that adopts a predetermined prior in a contrastive optimization objective to learn pixel embeddings.
This marks a large deviation from existing works that relied on proxy tasks or end-to-end clustering.
In particular, when fine-tuning the learned representations using just 1% of labeled examples on PASCAL, we outperform supervised ImageNet pre-training by 7.1% mIoU.
arXiv Detail & Related papers (2021-02-11T18:54:47Z)
- Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation [169.82760468633236]
We propose to build the pixel-level cycle association between source and target pixel pairs.
Our method can be trained end-to-end in one stage and introduces no additional parameters.
arXiv Detail & Related papers (2020-10-31T00:11:36Z)
- Mixup-CAM: Weakly-supervised Semantic Segmentation via Uncertainty Regularization [73.03956876752868]
We propose a principled and end-to-end train-able framework to allow the network to pay attention to other parts of the object.
Specifically, we introduce the mixup data augmentation scheme into the classification network and design two uncertainty regularization terms to better interact with the mixup strategy.
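The mixup scheme that Mixup-CAM plugs into the classification network forms convex combinations of input pairs and their labels, x = λ·x1 + (1−λ)·x2. A minimal sketch follows; the fixed λ and the toy vectors are illustrative assumptions (in practice λ is drawn from a Beta distribution per batch).

```python
# Hypothetical sketch of mixup data augmentation on flat feature
# vectors with one-hot labels. lam is fixed here for reproducibility.

def mixup(x1, y1, x2, y2, lam=0.7):
    """Return the convex combination of two examples and their labels."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

x, y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
# x and y are both approximately [0.7, 0.3]: the mixed sample carries
# soft labels, which is what the uncertainty regularizers act on.
```

The soft targets produced this way are what the paper's two uncertainty regularization terms interact with, encouraging attention beyond the most discriminative parts.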
arXiv Detail & Related papers (2020-08-03T21:19:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.