Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training
of Image Segmentation Models
- URL: http://arxiv.org/abs/2207.03335v1
- Date: Mon, 4 Jul 2022 13:02:32 GMT
- Title: Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training
of Image Segmentation Models
- Authors: Xuhong Li, Haoyi Xiong, Yi Liu, Dingfu Zhou, Zeyu Chen, Yaqing Wang,
Dejing Dou
- Abstract summary: We propose a method to enable the end-to-end pre-training for image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While fine-tuning pre-trained networks has become a popular way to train
image segmentation models, such backbone networks for image segmentation are
frequently pre-trained using image classification source datasets, e.g.,
ImageNet. Though image classification datasets could provide the backbone
networks with rich visual features and discriminative ability, they are
incapable of fully pre-training the target model (i.e., backbone+segmentation
modules) in an end-to-end manner. The segmentation modules are left to random
initialization in the fine-tuning process due to the lack of segmentation
labels in classification datasets. In our work, we propose a method that
leverages Pseudo Semantic Segmentation Labels (PSSL), to enable the end-to-end
pre-training for image segmentation models based on classification datasets.
PSSL was inspired by the observation that the explanation results of
classification models, obtained through explanation algorithms such as CAM,
SmoothGrad and LIME, would be close to the pixel clusters of visual objects.
Specifically, PSSL is obtained for each image by interpreting the
classification results and aggregating an ensemble of explanations queried from
multiple classifiers to lower the bias caused by single models. With PSSL for
every image of ImageNet, the proposed method leverages a weighted segmentation
learning procedure to pre-train the segmentation network en masse. Experiment
results show that, with ImageNet accompanied by PSSL as the source dataset, the
proposed end-to-end pre-training strategy successfully boosts the performance
of various segmentation models, i.e., PSPNet-ResNet50, DeepLabV3-ResNet50, and
OCRNet-HRNetW18, on a number of segmentation tasks, such as CamVid, VOC-A,
VOC-C, ADE20K, and CityScapes, with significant improvements. The source code
is available at https://github.com/PaddlePaddle/PaddleSeg.
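The abstract describes PSSL as an aggregation of explanation maps (e.g., CAM, SmoothGrad, LIME) queried from multiple classifiers, later consumed by a weighted segmentation loss. A minimal sketch of that aggregation step, assuming each explanation is a per-pixel saliency array; the function name, threshold, and weighting scheme are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def pseudo_segmentation_label(saliency_maps, threshold=0.5):
    """Aggregate an ensemble of per-pixel explanation maps (e.g., CAM,
    SmoothGrad, LIME outputs from several classifiers) into a pseudo
    segmentation label plus a per-pixel confidence weight.

    Hypothetical sketch: the real PSSL construction in the paper may
    differ in normalization, thresholding, and weighting details.
    """
    normed = []
    for m in saliency_maps:
        # Normalize each map to [0, 1] so different explanation
        # algorithms contribute on a comparable scale.
        m = np.asarray(m, dtype=np.float64)
        rng = m.max() - m.min()
        normed.append((m - m.min()) / rng if rng > 0 else np.zeros_like(m))
    # Averaging across the ensemble lowers the bias of any single model,
    # as the abstract motivates.
    soft_label = np.mean(normed, axis=0)
    # Binary foreground/background pseudo mask, plus a confidence weight
    # that a weighted segmentation loss could use during pre-training:
    # pixels far from the decision threshold count more.
    mask = (soft_label >= threshold).astype(np.int64)
    weight = np.abs(soft_label - threshold) * 2.0
    return mask, weight
```

During pre-training, `weight` could scale a per-pixel cross-entropy term so that ambiguous pseudo-labeled pixels contribute less, which is one plausible reading of the "weighted segmentation learning procedure" the abstract mentions.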
Related papers
- UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation [64.01742988773745]
An increasing privacy concern exists regarding training large-scale image segmentation models on unauthorized private data.
We exploit the concept of unlearnable examples to make images unusable to model training by generating and adding unlearnable noise into the original images.
We empirically verify the effectiveness of UnSeg across 6 mainstream image segmentation tasks, 10 widely used datasets, and 7 different network architectures.
arXiv Detail & Related papers (2024-10-13T16:34:46Z)
- SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation [26.079055078561986]
We propose a CLIP-based model named SegCLIP for the topic of open-vocabulary segmentation.
The main idea is to gather patches with learnable centers to semantic regions through training on text-image pairs.
Experimental results show that our model achieves comparable or superior segmentation accuracy.
arXiv Detail & Related papers (2022-11-27T12:38:52Z)
- Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to learn an image-level representation better.
The local features matching contrastive module is designed to learn representations of local regions, which is beneficial for semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
- Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed as Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
In order to better model the relationship among images and classes from different datasets, we extend the pixel level embeddings via cross dataset mixing.
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
- Half-Real Half-Fake Distillation for Class-Incremental Semantic Segmentation [84.1985497426083]
Convolutional neural networks are ill-equipped for incremental learning, where new classes become available but the initial training data is not retained.
We try to address this issue by "inverting" the trained segmentation network to synthesize input images starting from random noise.
arXiv Detail & Related papers (2021-04-02T03:47:16Z)
- Recursive Training for Zero-Shot Semantic Segmentation [26.89352005206994]
We propose a training scheme to supervise the retraining of a semantic segmentation model for a zero-shot setting.
We show that our proposed model achieves state-of-the-art performance on the Pascal-VOC 2012 dataset and Pascal-Context dataset.
arXiv Detail & Related papers (2021-02-26T23:44:16Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.