Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation
- URL: http://arxiv.org/abs/2407.01220v1
- Date: Mon, 1 Jul 2024 12:07:26 GMT
- Title: Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation
- Authors: Zihan Gao, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Yuwei Guo, Shuyuan Yang
- Abstract summary: MaskField enables 3D open-vocabulary segmentation with neural fields under weak supervision.
It circumvents direct handling of high-dimensional CLIP features during training.
It achieves remarkably fast convergence, outperforming previous methods with just 5 minutes of training.
- Score: 47.08813064337934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enable open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. While effective, the per-pixel distillation of high-dimensional CLIP features introduces ambiguity and necessitates complex regularization strategies, adding inefficiencies during training. This paper presents MaskField, which enables fast and efficient 3D open-vocabulary segmentation with neural fields under weak supervision. Unlike previous methods, MaskField distills masks rather than dense high-dimensional CLIP features. MaskField employs neural fields as binary mask generators and supervises them with masks generated by SAM and classified by coarse CLIP features. MaskField overcomes ambiguous object boundaries by naturally introducing SAM-segmented object shapes without extra regularization during training. By circumventing the direct handling of high-dimensional CLIP features during training, MaskField is particularly compatible with explicit scene representations like 3DGS. Our extensive experiments show that MaskField not only surpasses prior state-of-the-art methods but also achieves remarkably fast convergence, outperforming previous methods with just 5 minutes of training. We hope that MaskField will inspire further exploration into how neural fields can be trained to comprehend 3D scenes from 2D models.
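The core idea described in the abstract — supervising field-rendered binary masks with SAM masks instead of distilling dense per-pixel CLIP features — can be sketched with a simple per-view mask loss. The following is a minimal, hypothetical NumPy illustration, not the authors' implementation: the function name `bce_mask_loss` and the assumption that predicted mask channels are already matched to SAM target masks are our own simplifications.

```python
import numpy as np

def bce_mask_loss(pred_logits, sam_masks, eps=1e-7):
    """Binary cross-entropy between rendered mask logits and SAM target masks.

    pred_logits: (K, H, W) raw logits rendered from the neural field for one
                 view, one channel per candidate object mask.
    sam_masks:   (K, H, W) binary {0, 1} masks produced by SAM, assumed here
                 to be already matched one-to-one with the K channels.
    """
    p = 1.0 / (1.0 + np.exp(-pred_logits))  # sigmoid over logits
    p = np.clip(p, eps, 1.0 - eps)          # guard against log(0)
    loss = -(sam_masks * np.log(p) + (1.0 - sam_masks) * np.log(1.0 - p))
    return float(loss.mean())

# Toy example: 2 candidate masks on a 4x4 rendered view.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 4))
targets = (rng.random((2, 4, 4)) > 0.5).astype(np.float64)
print(bce_mask_loss(logits, targets))
```

Because the supervision target is a low-dimensional binary mask rather than a high-dimensional CLIP feature map, each optimization step is cheap, which is consistent with the fast convergence the abstract reports; open-vocabulary labels would then come from classifying each mask once with coarse CLIP features rather than from the per-pixel loss.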
Related papers
- PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting [16.333566122541022]
We propose a new method called PLGS that enables 3DGS to generate consistent panoptic segmentation masks from noisy 2D segmentation masks.
Our method outperforms previous state-of-the-art methods in terms of both segmentation quality and speed.
arXiv Detail & Related papers (2024-10-23T02:05:05Z) - Triple Point Masking [49.39218611030084]
Existing 3D mask learning methods encounter performance bottlenecks under limited data.
We introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders.
Extensive experiments show that the four baselines equipped with the proposed TPM achieve comprehensive performance improvements on various downstream tasks.
arXiv Detail & Related papers (2024-09-26T05:33:30Z) - EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained and highly-generalized 3D perception model is desperately needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z) - DiscoNeRF: Class-Agnostic Object Field for 3D Object Discovery [46.711276257688326]
NeRFs have become a powerful tool for modeling 3D scenes from multiple images.
Previous approaches to 3D segmentation of NeRFs either require user interaction to isolate a single object, or they rely on 2D semantic masks with a limited number of classes for supervision.
We propose a method that is robust to inconsistent segmentations and successfully decomposes the scene into a set of objects of any class.
arXiv Detail & Related papers (2024-08-19T12:07:24Z) - Efficient 3D Instance Mapping and Localization with Neural Fields [39.73128916618561]
We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images.
We introduce 3DIML, a novel framework that efficiently learns a neural label field which can render 3D instance segmentation masks from novel viewpoints.
arXiv Detail & Related papers (2024-03-28T19:25:25Z) - Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding [106.0876425365599]
Masked Shape Prediction (MSP) is a new framework to conduct masked signal modeling in 3D scenes.
MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points.
arXiv Detail & Related papers (2023-05-08T20:09:19Z) - Contrastive Context-Aware Learning for 3D High-Fidelity Mask Face Presentation Attack Detection [103.7264459186552]
Face presentation attack detection (PAD) is essential to secure face recognition systems.
Most existing 3D mask PAD benchmarks suffer from several drawbacks.
We introduce a large-scale High-Fidelity Mask dataset to bridge the gap to real-world applications.
arXiv Detail & Related papers (2021-04-13T12:48:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.