MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
- URL: http://arxiv.org/abs/2112.13762v1
- Date: Mon, 27 Dec 2021 16:16:35 GMT
- Title: MSeg: A Composite Dataset for Multi-domain Semantic Segmentation
- Authors: John Lambert, Zhuang Liu, Ozan Sener, James Hays, Vladlen Koltun
- Abstract summary: We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
- Score: 100.17755160696939
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present MSeg, a composite dataset that unifies semantic segmentation
datasets from different domains. A naive merge of the constituent datasets
yields poor performance due to inconsistent taxonomies and annotation
practices. We reconcile the taxonomies and bring the pixel-level annotations
into alignment by relabeling more than 220,000 object masks in more than 80,000
images, requiring more than 1.34 years of collective annotator effort. The
resulting composite dataset enables training a single semantic segmentation
model that functions effectively across domains and generalizes to datasets
that were not seen during training. We adopt zero-shot cross-dataset transfer
as a benchmark to systematically evaluate a model's robustness and show that
MSeg training yields substantially more robust models in comparison to training
on individual datasets or naive mixing of datasets without the presented
contributions. A model trained on MSeg ranks first on the WildDash-v1
leaderboard for robust semantic segmentation, with no exposure to WildDash data
during training. We evaluate our models in the 2020 Robust Vision Challenge
(RVC) as an extreme generalization experiment. MSeg training sets include only
three of the seven datasets in the RVC; more importantly, the evaluation
taxonomy of RVC is different and more detailed. Surprisingly, our model shows
competitive performance and ranks second. To evaluate how close we are to the
grand aim of robust, efficient, and complete scene understanding, we go beyond
semantic segmentation by training instance segmentation and panoptic
segmentation models using our dataset. Moreover, we also evaluate various
engineering design decisions and metrics, including resolution and
computational efficiency. Although our models are far from this grand aim, our
comprehensive evaluation is crucial for progress. We share all the models and
code with the community.
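The mechanics described in the abstract can be illustrated with simple lookup tables: constituent masks are relabeled into the unified taxonomy for training, and unified-space predictions are projected onto an unseen dataset's taxonomy for zero-shot cross-dataset evaluation. The following is a minimal Python sketch under toy assumptions; the class list and the `DATASET_TO_UNIFIED` / `UNIFIED_TO_TARGET` tables are hypothetical illustrations, not the actual MSeg taxonomy or API.

```python
import numpy as np

# Toy unified taxonomy; the real MSeg taxonomy has on the order of 200 classes.
UNIFIED = ["road", "sidewalk", "person", "rider", "car", "sky", "vegetation"]

# Hypothetical lookup tables: native label id -> unified label id.
# 255 marks classes with no counterpart in the unified taxonomy; such
# pixels would be ignored during training.
DATASET_TO_UNIFIED = {
    "cityscapes_toy": np.array([0, 1, 5, 6, 2, 3, 4], dtype=np.int64),
    "ade20k_toy": np.array([5, 0, 4, 2, 255], dtype=np.int64),
}

def to_unified(label_map: np.ndarray, dataset: str) -> np.ndarray:
    """Relabel a ground-truth mask from its native taxonomy into the
    unified taxonomy with a vectorized table lookup."""
    return DATASET_TO_UNIFIED[dataset][label_map]

# Zero-shot cross-dataset transfer: the model predicts in the unified
# taxonomy, and predictions are projected onto the held-out dataset's own
# taxonomy before scoring (again via a hypothetical table).
UNIFIED_TO_TARGET = np.array([0, 0, 1, 1, 2, 255, 3], dtype=np.int64)

def zero_shot_miou(pred_unified: np.ndarray, gt_target: np.ndarray,
                   num_target_classes: int) -> float:
    """Mean IoU of unified-space predictions against a target dataset's
    ground truth, after projection into the target taxonomy. Pixels that
    project to 255 simply never match any target class."""
    pred_target = UNIFIED_TO_TARGET[pred_unified]
    ious = []
    for c in range(num_target_classes):
        inter = np.sum((pred_target == c) & (gt_target == c))
        union = np.sum((pred_target == c) | (gt_target == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0

# Smoke test on random masks.
rng = np.random.default_rng(0)
native_gt = rng.integers(0, len(UNIFIED), size=(4, 4))
print(to_unified(native_gt, "cityscapes_toy"))
pred = rng.integers(0, len(UNIFIED), size=(4, 4))
print(zero_shot_miou(pred, rng.integers(0, 4, size=(4, 4)), 4))
```

Table lookups only capture classes that map one-to-one; the 1.34 years of annotator effort cited in the abstract went into masks whose native labels are ambiguous under the unified taxonomy and had to be manually relabeled.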
Related papers
- Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs [48.406728896785296]
We propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks.
Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation.
arXiv Detail & Related papers (2024-07-15T08:42:10Z) - TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation [48.75470418596875]
Training on large-scale datasets can boost the performance of video instance segmentation, but VIS datasets are hard to scale up due to high labeling cost.
What we have instead are numerous isolated field-specific datasets, so it is appealing to jointly train models across an aggregation of datasets to enhance data volume and diversity.
We conduct extensive evaluations on four popular and challenging benchmarks, including YouTube-VIS 2019, YouTube-VIS 2021, OVIS, and UVO.
Our model shows significant improvement over the baseline solutions, and sets new state-of-the-art records on all benchmarks.
arXiv Detail & Related papers (2023-12-11T18:50:09Z) - Semantic-SAM: Segment and Recognize Anything at Any Granularity [83.64686655044765]
We introduce Semantic-SAM, a universal image segmentation model that can segment and recognize anything at any desired granularity.
We consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts.
For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels.
arXiv Detail & Related papers (2023-07-10T17:59:40Z) - What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation [2.7036595757881323]
We build a benchmark for Multi-domain Evaluation of Semantic Segmentation (MESS).
MESS allows a holistic analysis of performance across a wide range of domain-specific datasets.
We evaluate eight recently published models on the proposed MESS benchmark and analyze the performance characteristics of zero-shot transfer models.
arXiv Detail & Related papers (2023-06-27T14:47:43Z) - Text2Seg: Remote Sensing Image Semantic Segmentation via Text-Guided Visual Foundation Models [7.452422412106768]
We propose a novel method named Text2Seg for remote sensing semantic segmentation.
It overcomes the dependency on extensive annotations by employing an automatic prompt generation process.
We show that Text2Seg significantly improves zero-shot prediction performance compared to the vanilla SAM model.
arXiv Detail & Related papers (2023-04-20T18:39:41Z) - An Empirical Study on Multi-Domain Robust Semantic Segmentation [42.79166534691889]
We train a unified model that is expected to perform well across domains on several popular segmentation datasets.
Our solution ranks 2nd on the RVC 2022 semantic segmentation task, using a training set only 1/3 the size of the one used by the 1st-place model.
arXiv Detail & Related papers (2022-12-08T12:04:01Z) - Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
To better model the relationships among images and classes from different datasets, we extend the pixel-level embeddings via cross-dataset mixing (a hedged sketch of a pixel-to-prototype contrastive loss follows this list).
arXiv Detail & Related papers (2021-06-08T06:13:11Z) - Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation [101.26235068460551]
Models pretrained with self-supervised objectives on large text corpora achieve state-of-the-art performance on English text summarization tasks.
Models are typically fine-tuned on hundreds of thousands of data points, an infeasible requirement when applying summarization to new, niche domains.
We introduce a novel and generalizable method, called WikiTransfer, for fine-tuning pretrained models for summarization in an unsupervised, dataset-specific manner.
arXiv Detail & Related papers (2020-10-24T08:36:49Z)
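The pixel-to-prototype contrastive loss mentioned in the Multi-dataset Pretraining entry above can be read as an InfoNCE objective between pixel embeddings and per-class prototypes. The sketch below is an assumed reconstruction, not that paper's implementation; the function name, the temperature `tau`, and the toy tensors are illustrative.

```python
import torch
import torch.nn.functional as F

def pixel_to_prototype_loss(pixel_emb: torch.Tensor,
                            labels: torch.Tensor,
                            prototypes: torch.Tensor,
                            tau: float = 0.1,
                            ignore_index: int = 255) -> torch.Tensor:
    """InfoNCE-style loss pulling each pixel embedding toward its class
    prototype and away from the prototypes of all other classes.

    pixel_emb:  (N, D) pixel embeddings sampled across datasets
    labels:     (N,)   class ids in the shared label space
    prototypes: (C, D) one learnable (or EMA) prototype per class
    """
    valid = labels != ignore_index
    z = F.normalize(pixel_emb[valid], dim=1)   # (M, D) unit-norm pixels
    p = F.normalize(prototypes, dim=1)         # (C, D) unit-norm prototypes
    logits = z @ p.t() / tau                   # (M, C) scaled cosine sims
    return F.cross_entropy(logits, labels[valid])

# Toy usage: 6 pixels with 8-dim embeddings, 5 shared classes.
emb = torch.randn(6, 8, requires_grad=True)
lbl = torch.tensor([0, 2, 2, 4, 255, 1])
protos = torch.randn(5, 8, requires_grad=True)
loss = pixel_to_prototype_loss(emb, lbl, protos)
loss.backward()
print(float(loss))
```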