Improving Semantic Image Segmentation via Label Fusion in Semantically
Textured Meshes
- URL: http://arxiv.org/abs/2111.11103v1
- Date: Mon, 22 Nov 2021 10:47:32 GMT
- Title: Improving Semantic Image Segmentation via Label Fusion in Semantically
Textured Meshes
- Authors: Florian Fervers, Timo Breuer, Gregor Stachowiak, Sebastian Bullinger,
Christoph Bodensteiner, Michael Arens
- Abstract summary: We present a label fusion framework that is capable of improving semantic pixel labels of video sequences in an unsupervised manner.
We use a 3D mesh representation of the environment and fuse the predictions of different frames into a consistent representation using semantic mesh textures.
We evaluate our method on the Scannet dataset where we improve annotations produced by the state-of-the-art segmentation network ESANet from $52.05 %$ to $58.25 %$ pixel accuracy.
- Score: 10.645137380835994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models for semantic segmentation require a large amount of hand-labeled
training data which is costly and time-consuming to produce. For this purpose,
we present a label fusion framework that is capable of improving semantic pixel
labels of video sequences in an unsupervised manner. We make use of a 3D mesh
representation of the environment and fuse the predictions of different frames
into a consistent representation using semantic mesh textures. Rendering the
semantic mesh using the original intrinsic and extrinsic camera parameters
yields a set of improved semantic segmentation images. Due to our optimized
CUDA implementation, we are able to exploit the entire $c$-dimensional
probability distribution of annotations over $c$ classes in an
uncertainty-aware manner. We evaluate our method on the ScanNet dataset, where
we improve annotations produced by the state-of-the-art segmentation network
ESANet from $52.05\%$ to $58.25\%$ pixel accuracy. We publish the source code
of our framework online to foster future research in this area
(\url{https://github.com/fferflo/semantic-meshes}). To the best of our
knowledge, this is the first publicly available label fusion framework for
semantic image segmentation based on meshes with semantic textures.
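Conceptually, the fusion step works as follows: every pixel of every frame projects onto some texel of the semantic mesh texture, and each texel aggregates the full $c$-dimensional class distributions of all pixels that hit it. Below is a minimal NumPy sketch of such uncertainty-aware aggregation; the pixel-to-texel mapping is assumed to come from the renderer, and the log-probability product rule is an illustrative choice, not necessarily the paper's exact fusion rule.

```python
import numpy as np

def fuse_labels(pixel_probs, pixel_to_texel, num_texels):
    """Fuse per-pixel class distributions into per-texel distributions.

    pixel_probs:    (N, c) array, each row a probability distribution
                    over c classes predicted for one pixel.
    pixel_to_texel: (N,) array mapping each pixel to the texel it
                    projects onto (assumed to be given by the renderer).
    """
    n, c = pixel_probs.shape
    # Accumulate log-probabilities so that multiple observations of the
    # same texel multiply their distributions (Bayesian-style fusion).
    log_acc = np.zeros((num_texels, c))
    np.add.at(log_acc, pixel_to_texel, np.log(pixel_probs + 1e-12))
    # Normalize back to per-texel probability distributions.
    log_acc -= log_acc.max(axis=1, keepdims=True)
    texel_probs = np.exp(log_acc)
    texel_probs /= texel_probs.sum(axis=1, keepdims=True)
    return texel_probs

# Rendering the fused texture with the original camera parameters then
# yields the improved per-frame segmentation, e.g.:
# improved_labels = texel_probs[texel_index_image].argmax(axis=-1)
```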
Related papers
- FuseNet: Self-Supervised Dual-Path Network for Medical Image
Segmentation [3.485615723221064]
FuseNet is a dual-stream framework for self-supervised semantic segmentation.
A cross-modal fusion technique extends the principles of CLIP by replacing textual data with augmented images.
Experiments on skin lesion and lung segmentation datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-11-22T00:03:16Z)
- MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner
for Open-World Semantic Segmentation [110.09800389100599]
We propose MixReorg, a novel and straightforward pre-training paradigm for semantic segmentation.
Our approach generates fine-grained patch-text pairs by mixing image patches while preserving the correspondence between patches and text.
With MixReorg as a mask learner, conventional text-supervised semantic segmentation models can achieve highly generalizable pixel-semantic alignment.
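A rough illustration of the patch-mixing idea follows; the patch size, mixing probability, and bookkeeping scheme are assumptions based only on the summary above, not MixReorg's actual pipeline.

```python
import numpy as np

def mix_patches(img_a, img_b, patch=16, p=0.5, seed=0):
    """Mix patches of two images while recording which source image
    (and hence which caption/text) each patch belongs to.
    Both images are (H, W, 3) with H and W divisible by `patch`."""
    rng = np.random.default_rng(seed)
    h, w, _ = img_a.shape
    mixed = img_a.copy()
    # source[i, j] == 0 -> patch keeps image A's text, 1 -> image B's.
    source = np.zeros((h // patch, w // patch), dtype=np.int64)
    for i in range(h // patch):
        for j in range(w // patch):
            if rng.random() < p:
                ys, xs = i * patch, j * patch
                mixed[ys:ys + patch, xs:xs + patch] = \
                    img_b[ys:ys + patch, xs:xs + patch]
                source[i, j] = 1
    return mixed, source  # patch-text correspondence preserved in `source`
```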
arXiv Detail & Related papers (2023-08-09T09:35:16Z)
- Language-driven Semantic Segmentation [88.21498323896475]
We present LSeg, a novel model for language-driven semantic image segmentation.
We use a text encoder to compute embeddings of descriptive input labels.
The encoder is trained with a contrastive objective to align pixel embeddings to the text embedding of the corresponding semantic class.
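As a rough illustration of this alignment: per-pixel class scores come from comparing each pixel embedding against the text embedding of every class label. The normalization and temperature below are CLIP-style assumptions, not necessarily LSeg's exact formulation.

```python
import numpy as np

def pixel_text_logits(pixel_emb, text_emb, temperature=0.07):
    """pixel_emb: (H, W, d) dense image features; text_emb: (K, d)
    embeddings of the K descriptive class labels.
    Returns per-pixel class logits of shape (H, W, K)."""
    # L2-normalize both embedding sets, as in CLIP-style training.
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    # Dot product of every pixel embedding with every class text embedding.
    logits = np.einsum('hwd,kd->hwk', p, t) / temperature
    return logits  # training applies cross-entropy against class ids
```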
arXiv Detail & Related papers (2022-01-10T18:59:10Z)
- Reference-guided Pseudo-Label Generation for Medical Semantic
Segmentation [25.76014072179711]
We propose a novel approach to generate supervision for semi-supervised semantic segmentation.
We use a small number of labeled images as reference material and match pixels in an unlabeled image to the semantics of the best fitting pixel in a reference set.
We achieve the same performance as a standard fully supervised model on X-ray anatomy segmentation, albeit with 95% fewer labeled images.
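The matching step can be pictured as a nearest-neighbor lookup in feature space, sketched below; the cosine-similarity criterion and flat pixel-feature layout are illustrative assumptions, not the paper's exact matching procedure.

```python
import numpy as np

def match_pseudo_labels(unlab_feats, ref_feats, ref_labels):
    """unlab_feats: (N, d) features of pixels in an unlabeled image;
    ref_feats: (M, d) features of labeled reference pixels;
    ref_labels: (M,) their ground-truth classes.
    Returns a pseudo label for every unlabeled pixel."""
    # Cosine similarity between each unlabeled and each reference pixel.
    u = unlab_feats / np.linalg.norm(unlab_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = u @ r.T                      # (N, M) similarity matrix
    best = sim.argmax(axis=1)          # best-fitting reference pixel
    return ref_labels[best]            # copy its semantics as pseudo label
```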
arXiv Detail & Related papers (2021-12-01T12:21:24Z)
- Maximize the Exploration of Congeneric Semantics for Weakly Supervised
Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmark, and our model yields state-of-the-art performance.
arXiv Detail & Related papers (2021-10-08T08:59:16Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on par with it on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object details along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the low- and high-frequency components of the image, respectively.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
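One simple way to derive the two supervision signals from a ground-truth label map is to mark pixels where the label changes between neighbors as edge (high frequency) and the remainder as body (low frequency); the sketch below is this generic decomposition, not the paper's specific flow-based method.

```python
import numpy as np

def body_edge_masks(labels):
    """Split an (H, W) integer label map into an edge mask (label changes
    between 4-neighbors) and a body mask (all remaining pixels)."""
    edge = np.zeros_like(labels, dtype=bool)
    edge[:, 1:] |= labels[:, 1:] != labels[:, :-1]   # horizontal changes
    edge[:, :-1] |= labels[:, 1:] != labels[:, :-1]
    edge[1:, :] |= labels[1:, :] != labels[:-1, :]   # vertical changes
    edge[:-1, :] |= labels[1:, :] != labels[:-1, :]
    body = ~edge
    return body, edge  # two masks for decoupled supervision
```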
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- PCAMs: Weakly Supervised Semantic Segmentation Using Point Supervision [12.284208932393073]
This paper presents a novel procedure for producing semantic segmentation from images given some point level annotations.
We propose training a normally fully supervised CNN using our pseudo labels in place of ground-truth labels.
Our method achieves state-of-the-art results for point-supervised semantic segmentation on the PASCAL VOC 2012 dataset (Everingham et al., 2010), even outperforming state-of-the-art methods for stronger bounding box and squiggle supervision.
arXiv Detail & Related papers (2020-07-10T21:25:27Z)
- RGB-based Semantic Segmentation Using Self-Supervised Depth Pre-Training [77.62171090230986]
We propose an easily scalable and self-supervised technique that can be used to pre-train any semantic RGB segmentation method.
In particular, our pre-training approach makes use of automatically generated labels that can be obtained using depth sensors.
We show how our proposed self-supervised pre-training with HN-labels can be used to replace ImageNet pre-training.
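The summary does not spell out how the HN-labels are constructed; one plausible ingredient is per-pixel surface normals estimated from depth gradients, sketched below purely as an illustration (the finite-difference estimator and any quantization into discrete pseudo-labels are assumptions, not the paper's actual procedure).

```python
import numpy as np

def normals_from_depth(depth, fx, fy):
    """Estimate per-pixel surface normals from an (H, W) float depth map
    (nonzero depths) via finite differences; fx, fy are the camera focal
    lengths in pixels."""
    # Depth gradients per metric unit, accounting for pixel footprint z/f.
    dzdx = np.gradient(depth, axis=1) * fx / depth
    dzdy = np.gradient(depth, axis=0) * fy / depth
    n = np.stack([-dzdx, -dzdy, np.ones_like(depth)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)  # unit normals
    return n  # could be quantized into discrete labels for pre-training
```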
arXiv Detail & Related papers (2020-02-06T11:16:24Z)