Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
- URL: http://arxiv.org/abs/2012.07489v2
- Date: Thu, 8 Apr 2021 16:38:34 GMT
- Title: Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU
- Authors: Shipra Jain, Danda Pani Paudel, Martin Danelljan, Luc Van Gool
- Abstract summary: We propose a novel training methodology to train and scale the existing semantic segmentation models.
We demonstrate a clear benefit of our approach on a dataset with 1284 classes, bootstrapped from LVIS and COCO annotations, with three times better mIoU than the DeeplabV3+ model.
- Score: 87.48110331544885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The state-of-the-art object detection and image classification methods can
perform impressively on more than 9k and 10k classes, respectively. In
contrast, the number of classes in semantic segmentation datasets is relatively
limited. This is not surprising when the restrictions caused by the lack of
labeled data and high computation demand for segmentation are considered. In
this paper, we propose a novel training methodology to train and scale the
existing semantic segmentation models for a large number of semantic classes
without increasing the memory overhead. In our embedding-based scalable
segmentation approach, we reduce the space complexity of the segmentation
model's output from O(C) to O(1), propose an approximation method for
ground-truth class probability, and use it to compute cross-entropy loss. The
proposed approach is general and can be adopted by any state-of-the-art
segmentation model to gracefully scale it for any number of semantic classes
with only one GPU. Our approach achieves similar, and in some cases, even
better mIoU for the Cityscapes, Pascal VOC, ADE20k, and COCO-Stuff10k datasets when
adopted to the DeeplabV3+ model with different backbones. We demonstrate a clear
benefit of our approach on a dataset with 1284 classes, bootstrapped from LVIS
and COCO annotations, with three times better mIoU than the DeeplabV3+ model.
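Since the abstract only sketches the mechanism, below is a minimal PyTorch sketch of what an embedding-based head with O(1) output and an approximated cross-entropy could look like. The embedding dimension, the sampled-negative approximation of the softmax normalizer, and all names are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of an embedding-based segmentation head with O(1) output channels.
# Not the paper's exact formulation: the embedding dimension and the sampled-negative
# approximation of the cross-entropy normalizer are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingSegHead(nn.Module):
    def __init__(self, in_channels: int, embed_dim: int = 64, num_classes: int = 1284):
        super().__init__()
        # Output has embed_dim channels regardless of num_classes (O(1) in C),
        # instead of a num_classes-channel logit map (O(C)).
        self.project = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        # Small table of per-class embeddings; only a subset is scored per step.
        self.class_embed = nn.Embedding(num_classes, embed_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.project(features)  # (B, embed_dim, H, W)

def sampled_cross_entropy(pixel_embed, labels, class_embed, num_negatives=256):
    """Approximate per-pixel cross-entropy over the classes present in the batch
    plus a random sample of negatives, instead of all C classes."""
    b, d, h, w = pixel_embed.shape
    x = pixel_embed.permute(0, 2, 3, 1).reshape(-1, d)      # (N, d) with N = B*H*W
    y = labels.reshape(-1)                                  # (N,) ground-truth class ids
    neg = torch.randint(0, class_embed.num_embeddings, (num_negatives,), device=x.device)
    cols = torch.cat([y.unique(), neg]).unique()            # classes scored this step
    logits = x @ class_embed(cols).t()                      # (N, |cols|)
    # Remap ground-truth ids to their positions inside the sampled column set.
    remap = torch.full((class_embed.num_embeddings,), -1, dtype=torch.long, device=x.device)
    remap[cols] = torch.arange(cols.numel(), device=x.device)
    return F.cross_entropy(logits, remap[y])

# Usage on dummy data:
head = EmbeddingSegHead(in_channels=256)
feats = torch.randn(2, 256, 64, 64)
labels = torch.randint(0, 1284, (2, 64, 64))
loss = sampled_cross_entropy(head(feats), labels, head.class_embed)
```
Because the head emits embed_dim channels rather than num_classes channels, the per-pixel output memory stays constant as the label set grows toward 1284 classes; only the small class-embedding table scales with C.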
Related papers
- Lightweight Uncertainty Quantification with Simplex Semantic Segmentation for Terrain Traversability [12.765558639563649]
We propose a simple, light-weight module that can be connected to any pretrained image segmentation model.
Our module is based on maximum separation of the segmentation classes by respective prototype vectors.
We demonstrate the effectiveness of our module for terrain segmentation.
arXiv Detail & Related papers (2024-07-18T11:00:49Z)
- LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and a low computational cost (0.42 GFLOPs).
arXiv Detail & Related papers (2024-04-04T01:59:19Z)
- Rethinking Few-shot 3D Point Cloud Semantic Segmentation [62.80639841429669]
This paper revisits few-shot 3D point cloud semantic segmentation (FS-PCS).
We focus on two significant issues in the state-of-the-art: foreground leakage and sparse point distribution.
To address these issues, we introduce a standardized FS-PCS setting, upon which a new benchmark is built.
arXiv Detail & Related papers (2024-03-01T15:14:47Z)
- Placing Objects in Context via Inpainting for Out-of-distribution Segmentation [59.00092709848619]
Placing Objects in Context (POC) is a pipeline to realistically add objects to an image.
POC can be used to extend any dataset with an arbitrary number of objects.
We present different anomaly segmentation datasets based on POC-generated data and show that POC can improve the performance of recent state-of-the-art anomaly fine-tuning methods.
arXiv Detail & Related papers (2024-02-26T08:32:41Z)
- Interclass Prototype Relation for Few-Shot Segmentation [0.0]
In few-shot segmentation, the target-class distribution in the feature space is sparse and has low coverage because of the limited variation in the few available samples.
This study proposes the Interclass Prototype Relation Network (IPRNet) which improves the separation performance by reducing the similarity between other classes.
arXiv Detail & Related papers (2022-11-16T05:27:52Z)
- Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification.
The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
- Rethinking Semantic Segmentation: A Prototype View [126.59244185849838]
We present a nonparametric semantic segmentation model based on non-learnable prototypes.
Our framework yields compelling results over several datasets.
We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
arXiv Detail & Related papers (2022-03-28T21:15:32Z)
- Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting.
This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class.
The resulting merged semantic segmentation dataset of over 2 million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets (a minimal sketch of this label-embedding idea appears after this list).
arXiv Detail & Related papers (2022-02-04T07:19:09Z)
- EOLO: Embedded Object Segmentation only Look Once [0.0]
We introduce an anchor-free, single-shot instance segmentation method that is conceptually simple (three independent branches), fully convolutional, and easy to embed into mobile and embedded devices.
Our method, referred to as EOLO, reformulates instance segmentation as jointly predicting semantic segmentation and distinguishing overlapping objects, via instance center classification and 4D distance regression at each pixel.
Without any bells and whistles, EOLO achieves 27.7% mask mAP at IoU50 and reaches 30 FPS on a 1080Ti GPU, with single-model and single-scale training/testing.
arXiv Detail & Related papers (2020-03-31T21:22:05Z)
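The sentence-embedding entry above is the closest in spirit to the main paper: class labels become fixed vectors, so the output layer no longer grows with the label set. Below is a minimal sketch of that substitution under my own assumptions; the sentence-transformers encoder, the class descriptions, and the cosine-similarity readout are illustrative stand-ins rather than that paper's implementation.
```python
# Sketch of segmentation with class labels replaced by text embeddings.
# The encoder, class descriptions, and cosine-similarity readout are assumptions
# for illustration; the referenced paper's own encoder and training are not shown here.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

descriptions = {
    "road": "a paved surface on which cars drive",
    "sky": "the open sky above the horizon",
    "person": "a human being walking or standing",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in text encoder
class_vecs = torch.tensor(encoder.encode(list(descriptions.values())))  # (C, D)
class_vecs = F.normalize(class_vecs, dim=1)

def classify_pixels(pixel_embeddings: torch.Tensor) -> torch.Tensor:
    """pixel_embeddings: (B, D, H, W) features projected into the text-embedding space.
    Returns per-pixel class indices by cosine similarity against the class vectors."""
    feats = F.normalize(pixel_embeddings, dim=1)
    sims = torch.einsum("bdhw,cd->bchw", feats, class_vecs)  # (B, C, H, W)
    return sims.argmax(dim=1)
```
Because new classes only require new description vectors, the same readout can in principle be reused across merged datasets or in a zero-shot setting.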
This list is automatically generated from the titles and abstracts of the papers in this site.