Related papers: Efficient 3D Instance Mapping and Localization with Neural Fields

URL: http://arxiv.org/abs/2403.19797v2
Date: Mon, 1 Apr 2024 02:57:07 GMT
Title: Efficient 3D Instance Mapping and Localization with Neural Fields
Authors: George Tang, Krishna Murthy Jatavallabhula, Antonio Torralba
Abstract summary: 3DIML is a novel framework that efficiently learns a label field to produce view-consistent instance segmentation masks. We evaluate 3DIML on sequences from the Replica and ScanNet datasets.
Score: 39.73128916618561
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We tackle the problem of learning an implicit scene representation for 3D instance segmentation from a sequence of posed RGB images. Towards this, we introduce 3DIML, a novel framework that efficiently learns a label field that may be rendered from novel viewpoints to produce view-consistent instance segmentation masks. 3DIML significantly improves upon training and inference runtimes of existing implicit scene representation based methods. In contrast to prior art that optimizes a neural field in a self-supervised manner, requiring complicated training procedures and loss function design, 3DIML leverages a two-phase process. The first phase, InstanceMap, takes as input 2D segmentation masks of the image sequence generated by a frontend instance segmentation model, and associates corresponding masks across images to 3D labels. These almost view-consistent pseudolabel masks are then used in the second phase, InstanceLift, to supervise the training of a neural label field, which interpolates regions missed by InstanceMap and resolves ambiguities. Additionally, we introduce InstanceLoc, which enables near-real-time localization of instance masks given a trained label field and an off-the-shelf image segmentation model by fusing outputs from both. We evaluate 3DIML on sequences from the Replica and ScanNet datasets and demonstrate 3DIML's effectiveness under mild assumptions for the image sequences. We achieve a large practical speedup over existing implicit scene representation methods with comparable quality, showcasing 3DIML's potential to facilitate faster and more effective 3D scene understanding.
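
Illustrative sketch (not from the paper): the abstract says InstanceMap associates corresponding 2D masks across images to shared 3D labels, but it does not say how correspondences are found or merged. The Python below is one simple way such an association step could look, assuming pixel or keypoint matches between frames are already available. The function name `associate_masks`, the `matches` input format, and the `min_votes` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def associate_masks(matches, min_votes=2):
    """Group per-frame masks into globally consistent labels.

    matches: iterable of ((frame_a, mask_a), (frame_b, mask_b)) pairs, one
        entry per matched pixel/keypoint (hypothetical input format).
    min_votes: minimum number of matches needed to merge two masks
        (hypothetical threshold).
    Returns a dict mapping (frame, mask) -> global label id.
    """
    # Count how many matches support each pair of masks.
    votes = Counter()
    for a, b in matches:
        votes[(a, b) if a <= b else (b, a)] += 1

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), n in votes.items():
        ra, rb = find(a), find(b)  # also registers both masks
        if n >= min_votes and ra != rb:
            parent[rb] = ra  # merge the two groups

    # Assign compact label ids, one per connected group.
    labels, roots = {}, {}
    for node in sorted(parent):
        labels[node] = roots.setdefault(find(node), len(roots))
    return labels
```

Masks linked by enough matches, directly or through intermediate frames, end up with the same label; weakly supported pairs stay separate. This mirrors the abstract's description of "almost view-consistent" pseudolabels only loosely, and leaves the remaining ambiguities to a later stage such as InstanceLift.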

Related papers

Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields. LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation. It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
Enforcing View-Consistency in Class-Agnostic 3D Segmentation Fields [46.711276257688326]
Radiance Fields have become a powerful tool for modeling 3D scenes from multiple images. Some methods work well using 2D semantic masks, but they generalize poorly to class-agnostic segmentations. More recent methods circumvent this issue by using contrastive learning to optimize a high-dimensional 3D feature field instead.
arXiv Detail & Related papers (2024-08-19T12:07:24Z)
PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction [23.798691661418253]
We propose a novel zero-shot panoptic reconstruction method from RGB-D images of scenes. We tackle both challenges by propagating partial labels with the aid of dense generalized features. Our method outperforms state-of-the-art methods on the indoor dataset ScanNet V2 and the outdoor dataset KITTI-360.
arXiv Detail & Related papers (2024-07-01T15:06:04Z)
Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation [47.08813064337934]
This paper presents MaskField, which enables efficient 3D open-vocabulary segmentation with neural fields from a novel perspective. MaskField decomposes the distillation of mask and semantic features from foundation models by formulating a mask feature field and queries. Our experiments show that MaskField not only surpasses prior state-of-the-art methods but also achieves remarkably fast convergence.
arXiv Detail & Related papers (2024-07-01T12:07:26Z)
Mask-Attention-Free Transformer for 3D Instance Segmentation [68.29828726317723]
Transformer-based methods have dominated 3D instance segmentation, where mask attention is commonly involved. We develop a series of position-aware designs to overcome the low-recall issue and perform cross-attention by imposing a positional prior. Experiments show that our approach converges 4x faster than existing work, sets a new state of the art on the ScanNetv2 3D instance segmentation benchmark, and also demonstrates superior performance across various datasets.
arXiv Detail & Related papers (2023-09-04T16:09:28Z)
Panoptic Lifting for 3D Scene Understanding with Neural Fields [32.59498558663363]
We propose a novel approach for learning panoptic 3D representations from images of in-the-wild scenes. Our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network. Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets.
arXiv Detail & Related papers (2022-12-19T19:15:36Z)
Mask3D: Mask Transformer for 3D Semantic Instance Segmentation [89.41640045953378]
We show that we can leverage generic Transformer building blocks to directly predict instance masks from 3D point clouds. Using Transformer decoders, the instance queries are learned by iteratively attending to point cloud features at multiple scales. Mask3D sets a new state-of-the-art on ScanNet test (+6.2 mAP), S3DIS 6-fold (+10.1 mAP), STPLS3D (+11.2 mAP) and ScanNet200 test (+12.4 mAP).
arXiv Detail & Related papers (2022-10-06T17:55:09Z)
Semantic Implicit Neural Scene Representations With Semi-Supervised Training [47.61092265963234]
We show that implicit neural scene representations can be leveraged to perform per-point semantic segmentation. Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks. We explore two novel applications for this semantically aware implicit neural scene representation.
arXiv Detail & Related papers (2020-03-28T00:43:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.