OpenMask3D: Open-Vocabulary 3D Instance Segmentation
- URL: http://arxiv.org/abs/2306.13631v2
- Date: Sun, 29 Oct 2023 14:04:25 GMT
- Title: OpenMask3D: Open-Vocabulary 3D Instance Segmentation
- Authors: Ayça Takmaz, Elisabetta Fedele, Robert W. Sumner, Marc Pollefeys,
Federico Tombari, Francis Engelmann
- Abstract summary: OpenMask3D is a zero-shot approach for open-vocabulary 3D instance segmentation.
Our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings.
- Score: 84.58747201179654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the task of open-vocabulary 3D instance segmentation. Current
approaches for 3D instance segmentation can typically only recognize object
categories from a pre-defined closed set of classes that are annotated in the
training datasets. This results in important limitations for real-world
applications where one might need to perform tasks guided by novel,
open-vocabulary queries related to a wide variety of objects. Recently,
open-vocabulary 3D scene understanding methods have emerged to address this
problem by learning queryable features for each point in the scene. While such
a representation can be directly employed to perform semantic segmentation,
existing methods cannot separate multiple object instances. In this work, we
address this limitation, and propose OpenMask3D, which is a zero-shot approach
for open-vocabulary 3D instance segmentation. Guided by predicted
class-agnostic 3D instance masks, our model aggregates per-mask features via
multi-view fusion of CLIP-based image embeddings. Experiments and ablation
studies on ScanNet200 and Replica show that OpenMask3D outperforms other
open-vocabulary methods, especially on the long-tail distribution. Qualitative
experiments further showcase OpenMask3D's ability to segment object properties
based on free-form queries describing geometry, affordances, and materials.
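The per-mask aggregation and text querying described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `top_k` view selection by visible point count, and the toy 3-dimensional vectors standing in for CLIP embeddings are all assumptions made for illustration. A real pipeline would obtain the embeddings from a CLIP image and text encoder and the masks from a class-agnostic 3D mask predictor.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def aggregate_mask_feature(view_embeddings, visible_point_counts, top_k=2):
    """Fuse per-view image embeddings of one 3D mask into a single feature.

    Assumption: views are ranked by how many mask points they see, and only
    the top_k views contribute to the average.
    """
    view_embeddings = np.asarray(view_embeddings, dtype=np.float64)
    counts = np.asarray(visible_point_counts)
    order = np.argsort(-counts, kind="stable")[:top_k]
    selected = l2_normalize(view_embeddings[order])
    return l2_normalize(selected.mean(axis=0))

def score_masks(mask_features, text_embedding):
    """Cosine similarity between each mask feature and a text query embedding."""
    feats = l2_normalize(np.asarray(mask_features, dtype=np.float64))
    query = l2_normalize(np.asarray(text_embedding, dtype=np.float64))
    return feats @ query

# Toy stand-ins for CLIP image embeddings (hypothetical values).
chair_views = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
chair_counts = [120, 80, 3]   # the third view barely sees the mask
table_views = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
table_counts = [50, 60]

mask_features = np.stack([
    aggregate_mask_feature(chair_views, chair_counts),
    aggregate_mask_feature(table_views, table_counts),
])

# Toy stand-in for the CLIP text embedding of a free-form query.
query = np.array([1.0, 0.05, 0.0])
scores = score_masks(mask_features, query)
best_mask = int(np.argmax(scores))
```

Because each mask carries its own feature vector, two objects of the same category stay separate instances, and an arbitrary text query can be matched by ranking the masks on `scores`.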
Related papers
- Search3D: Hierarchical Open-Vocabulary 3D Segmentation [78.47704793095669]
Open-vocabulary 3D segmentation enables the exploration of 3D spaces using free-form text descriptions.
We introduce Search3D, an approach that builds a hierarchical open-vocabulary 3D scene representation.
Our method aims to expand the capabilities of open vocabulary instance-level 3D segmentation by shifting towards a more flexible open-vocabulary 3D search setting.
arXiv Detail & Related papers (2024-09-27T03:44:07Z)
- Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant [11.416392706435415]
We introduce the first method to address 3D instance segmentation in a vocabulary-free setting.
We leverage a large vision-language assistant and an open-vocabulary 2D instance segmenter to discover and ground semantic categories.
We evaluate our method using ScanNet200 and Replica, outperforming existing methods in both vocabulary-free and open-vocabulary settings.
arXiv Detail & Related papers (2024-08-20T08:46:54Z)
- Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance [49.14140194332482]
We introduce Open3DIS, a novel solution designed to tackle the problem of Open-Vocabulary Instance Segmentation within 3D scenes.
Objects within 3D environments exhibit diverse shapes, scales, and colors, making precise instance-level identification a challenging task.
arXiv Detail & Related papers (2023-12-17T10:07:03Z)
- SAI3D: Segment Any Instance in 3D Scenes [68.57002591841034]
We introduce SAI3D, a novel zero-shot 3D instance segmentation approach.
Our method partitions a 3D scene into geometric primitives, which are then progressively merged into 3D instance segmentations.
Empirical evaluations on ScanNet, Matterport3D and the more challenging ScanNet++ datasets demonstrate the superiority of our approach.
arXiv Detail & Related papers (2023-12-17T09:05:47Z)
- OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation [32.508069732371105]
OpenIns3D is a new 3D-input-only framework for 3D open-vocabulary scene understanding.
It achieves state-of-the-art performance across a wide range of 3D open-vocabulary tasks.
arXiv Detail & Related papers (2023-09-01T17:59:56Z)
- Weakly Supervised 3D Open-vocabulary Segmentation [104.07740741126119]
We tackle the challenges in 3D open-vocabulary segmentation by exploiting pre-trained foundation models CLIP and DINO in a weakly supervised manner.
We distill the open-vocabulary multimodal knowledge and object reasoning capability of CLIP and DINO into a neural radiance field (NeRF).
A notable aspect of our approach is that it does not require any manual segmentation annotations for either the foundation models or the distillation process.
arXiv Detail & Related papers (2023-05-23T14:16:49Z)
- OpenScene: 3D Scene Understanding with Open Vocabularies [73.1411930820683]
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision.
We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space.
This zero-shot approach enables task-agnostic training and open-vocabulary queries.
arXiv Detail & Related papers (2022-11-28T18:58:36Z)