Language-Grounded Indoor 3D Semantic Segmentation in the Wild
- URL: http://arxiv.org/abs/2204.07761v1
- Date: Sat, 16 Apr 2022 09:17:40 GMT
- Title: Language-Grounded Indoor 3D Semantic Segmentation in the Wild
- Authors: David Rozenberszki, Or Litany, Angela Dai
- Abstract summary: We study a larger vocabulary for 3D semantic segmentation with a new extended benchmark on ScanNet data with 200 class categories.
We propose a language-driven pre-training method to encourage learned 3D features to lie close to their pre-trained text embeddings.
Our approach consistently outperforms state-of-the-art 3D pre-training for 3D semantic segmentation on our proposed benchmark.
- Score: 33.40572976383402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in 3D semantic segmentation with deep neural networks have
shown remarkable success, with rapid performance increase on available
datasets. However, current 3D semantic segmentation benchmarks contain only a
small number of categories -- less than 30 for ScanNet and SemanticKITTI, for
instance, which are not enough to reflect the diversity of real environments
(e.g., semantic image understanding covers hundreds to thousands of classes).
Thus, we propose to study a larger vocabulary for 3D semantic segmentation with
a new extended benchmark on ScanNet data with 200 class categories, an order of
magnitude more than previously studied. The large vocabulary also induces a large
natural class imbalance; both the vocabulary size and the imbalance are challenging for
existing 3D semantic segmentation methods. To learn more robust 3D features in
this context, we propose a language-driven pre-training method to encourage
learned 3D features that might have limited training examples to lie close to
their pre-trained text embeddings. Extensive experiments show that our approach
consistently outperforms state-of-the-art 3D pre-training for 3D semantic
segmentation on our proposed benchmark (+9% relative mIoU), including
limited-data scenarios with +25% relative mIoU using only 5% annotations.
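The core idea of the pre-training step, pulling learned per-point 3D features toward fixed, pre-trained text embeddings of their class names, can be sketched as a contrastive classification objective. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the temperature value, and the choice of a CLIP-style text encoder projected to the backbone's feature dimension are assumptions for the sketch.

```python
import torch
import torch.nn.functional as F

def language_grounded_loss(point_feats, point_labels, text_embeds, temperature=0.07):
    """Pull 3D point features toward the text embedding of their class name.

    point_feats:  (N, D) per-point features from a 3D backbone (e.g. a sparse conv U-Net)
    point_labels: (N,)   integer class ids in [0, C)
    text_embeds:  (C, D) frozen embeddings of the C class names from a pre-trained
                  text encoder (assumed CLIP-like), projected to dimension D
    """
    point_feats = F.normalize(point_feats, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Cosine similarity between every point feature and every class-name embedding.
    logits = point_feats @ text_embeds.t() / temperature  # (N, C)

    # Cross-entropy treats the matching class-name embedding as the positive
    # and all other class-name embeddings as negatives.
    return F.cross_entropy(logits, point_labels)
```

After such pre-training, the 3D backbone would be fine-tuned for semantic segmentation; the intuition from the abstract is that categories with few training examples stay anchored near their text embeddings and thus remain better separated. The exact loss formulation and anchoring strategy follow the paper, not this sketch.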
Related papers
- Bayesian Self-Training for Semi-Supervised 3D Segmentation [59.544558398992386]
3D segmentation is a core problem in computer vision.
However, densely labeling 3D point clouds for fully-supervised training remains labor-intensive and expensive.
Semi-supervised training provides a more practical alternative, where only a small set of labeled data is given, accompanied by a larger unlabeled set.
arXiv Detail & Related papers (2024-09-12T14:54:31Z)
- Augmented Efficiency: Reducing Memory Footprint and Accelerating Inference for 3D Semantic Segmentation through Hybrid Vision [9.96433151449016]
This paper introduces a novel approach to 3D semantic segmentation, distinguished by incorporating a hybrid blend of 2D and 3D computer vision techniques.
We conduct 2D semantic segmentation on RGB images linked to 3D point clouds and extend the results to 3D using an extrusion technique for specific class labels.
This model serves as the current state-of-the-art 3D semantic segmentation model on the KITTI-360 dataset.
arXiv Detail & Related papers (2024-07-23T00:04:10Z)
- SegPoint: Segment Any Point Cloud via Large Language Model [62.69797122055389]
We propose a model, called SegPoint, to produce point-wise segmentation masks across a diverse range of tasks.
SegPoint is the first model to address varied segmentation tasks within a single framework.
arXiv Detail & Related papers (2024-07-18T17:58:03Z)
- Segment Any 3D Object with Language [58.471327490684295]
We introduce Segment any 3D Object with LanguagE (SOLE), a semantic and geometric-aware visual-language learning framework with strong generalizability.
Specifically, we propose a multimodal fusion network to incorporate multimodal semantics in both backbone and decoder.
Our SOLE outperforms previous methods by a large margin on ScanNetv2, ScanNet200, and Replica benchmarks.
arXiv Detail & Related papers (2024-04-02T17:59:10Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding [57.47315482494805]
Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset.
This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories.
We propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for 3D scenes.
arXiv Detail & Related papers (2023-08-01T07:50:14Z)
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [110.07170245531464]
Current 3D models are limited by datasets with a small number of annotated data and a pre-defined set of categories.
Recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language.
We learn a unified representation of images, texts, and 3D point clouds by pre-training with object triplets from the three modalities.
arXiv Detail & Related papers (2022-12-10T01:34:47Z)
- Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning [5.699350798684963]
We propose a simple yet efficient algorithm for 3D instance segmentation using deep metric learning.
For high-level scene-understanding tasks in large-scale scenes, 3D instance segmentation recognizes individual object instances.
We demonstrate state-of-the-art performance of our algorithm on the ScanNet 3D instance segmentation benchmark in terms of AP score.
arXiv Detail & Related papers (2020-07-07T02:17:44Z)