Multi-dataset Pretraining: A Unified Model for Semantic Segmentation
- URL: http://arxiv.org/abs/2106.04121v1
- Date: Tue, 8 Jun 2021 06:13:11 GMT
- Title: Multi-dataset Pretraining: A Unified Model for Semantic Segmentation
- Authors: Bowen Shi, Xiaopeng Zhang, Haohang Xu, Wenrui Dai, Junni Zou, Hongkai
Xiong, Qi Tian
- Abstract summary: We propose a unified framework, termed Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
To better model the relationship among images and classes from different datasets, we extend the pixel-level embeddings via cross-dataset mixing.
- Score: 97.61605021985062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting annotated data for semantic segmentation is time-consuming and
hard to scale up. In this paper, we propose, for the first time, a unified
framework, termed Multi-Dataset Pretraining (MDP), to take full advantage of the
fragmented annotations of different datasets. The highlight is that
annotations from different domains can be efficiently reused and consistently
boost performance for each specific domain. This is achieved by first
pretraining the network via the proposed pixel-to-prototype contrastive loss
over multiple datasets regardless of their taxonomy labels, followed by
fine-tuning the pretrained model on each specific dataset as usual. To
better model the relationships among images and classes from different datasets,
we extend the pixel-level embeddings via cross-dataset mixing and propose a
pixel-to-class sparse coding strategy that explicitly models pixel-class
similarity over the manifold embedding space. In this way, we are able to
increase intra-class compactness and inter-class separability, as well as to
account for inter-class similarity across different datasets for better
transferability. Experiments conducted on several benchmarks demonstrate its
superior performance. Notably, MDP consistently outperforms models pretrained
on ImageNet by a considerable margin, while using less than 10% of the
samples for pretraining.
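The paper's exact loss formulation is not reproduced in this summary; the following is a minimal sketch of a pixel-to-prototype contrastive objective in the InfoNCE style, where each sampled pixel embedding is pulled toward its class prototype and pushed away from all other prototypes. The function name, tensor shapes, and temperature value are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def pixel_to_prototype_loss(pixel_emb, pixel_labels, prototypes, temperature=0.1):
    """Sketch of a pixel-to-prototype contrastive loss (assumed form).

    pixel_emb:    (N, D) pixel embeddings sampled from the feature map
    pixel_labels: (N,)   class index of each sampled pixel
    prototypes:   (C, D) one prototype per class (e.g., a running mean
                         of same-class pixel embeddings)
    """
    pixel_emb = F.normalize(pixel_emb, dim=1)     # unit-length embeddings
    prototypes = F.normalize(prototypes, dim=1)   # unit-length prototypes
    # (N, C) temperature-scaled cosine similarities; cross-entropy over
    # prototypes implements the InfoNCE objective.
    logits = pixel_emb @ prototypes.t() / temperature
    return F.cross_entropy(logits, pixel_labels)

# Toy example: 8 sampled pixels, 16-dim embeddings, 4 classes.
emb = torch.randn(8, 16)
protos = torch.randn(4, 16)
labels = torch.randint(0, 4, (8,))
loss = pixel_to_prototype_loss(emb, labels, protos)
```

Because the loss is computed against prototypes rather than all other pixels, it scales with the number of classes instead of the number of sampled pixels, which is one plausible reason a prototype formulation suits pretraining over multiple datasets.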
Related papers
- DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion
Models [61.906934570771256]
We present a generic dataset generation model that can produce diverse synthetic images and perception annotations.
Our method builds upon the pre-trained diffusion model and extends text-guided image synthesis to perception data generation.
We show that the rich latent code of the diffusion model can be effectively decoded as accurate perception annotations using a decoder module.
arXiv Detail & Related papers (2023-08-11T14:38:11Z) - Class-level Multiple Distributions Representation are Necessary for
Semantic Segmentation [9.796689408601775]
We introduce, for the first time, the use of multiple distributions to describe intra-class variations.
We also propose a class multiple distributions consistency strategy to construct discriminative multiple distribution representations of embedded pixels.
Our approach can be seamlessly integrated into popular segmentation frameworks FCN/PSPNet/CCNet and achieve 5.61%/1.75%/0.75% mIoU improvements on ADE20K.
arXiv Detail & Related papers (2023-03-14T16:10:36Z) - Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training
of Image Segmentation Models [54.49581189337848]
We propose a method to enable the end-to-end pre-training for image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z) - Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting.
This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class.
The resulting merged semantic segmentation dataset of over 2 million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
arXiv Detail & Related papers (2022-02-04T07:19:09Z) - MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z) - Semi-Supervised Semantic Segmentation with Pixel-Level Contrastive
Learning from a Class-wise Memory Bank [5.967279020820772]
We propose a novel representation learning module based on contrastive learning.
This module forces the segmentation network to yield similar pixel-level feature representations for same-class samples.
In an end-to-end training, the features from both labeled and unlabeled data are optimized to be similar to same-class samples from the memory bank.
arXiv Detail & Related papers (2021-04-27T18:19:33Z) - Efficient Full Image Interactive Segmentation by Leveraging Within-image
Appearance Similarity [39.17599924322882]
We propose a new approach to interactive full-image semantic segmentation.
We leverage a key observation: propagation from labeled to unlabeled pixels does not necessarily require class-specific knowledge.
We build on this observation and propose an approach capable of jointly propagating pixel labels from multiple classes.
arXiv Detail & Related papers (2020-07-16T08:21:59Z) - Selecting Relevant Features from a Multi-domain Representation for
Few-shot Classification [91.67977602992657]
We propose a new strategy based on feature selection, which is both simpler and more effective than previous feature adaptation approaches.
We show that a simple non-parametric classifier built on top of such features produces high accuracy and generalizes to domains never seen during training.
arXiv Detail & Related papers (2020-03-20T15:44:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.