NeRF-SOS: Any-View Self-supervised Object Segmentation from Complex
Real-World Scenes
- URL: http://arxiv.org/abs/2209.08776v3
- Date: Thu, 22 Sep 2022 05:40:50 GMT
- Title: NeRF-SOS: Any-View Self-supervised Object Segmentation from Complex
Real-World Scenes
- Authors: Zhiwen Fan, Peihao Wang, Yifan Jiang, Xinyu Gong, Dejia Xu, Zhangyang
Wang
- Abstract summary: This paper explores self-supervised learning for object segmentation using NeRF in complex real-world scenes.
Our framework, NeRF with Self-supervised Object Segmentation (NeRF-SOS), encourages NeRF models to distill compact, geometry-aware segmentation clusters.
It consistently surpasses other 2D-based self-supervised baselines and predicts finer semantic masks than existing supervised counterparts.
- Score: 80.59831861186227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural volumetric representations have shown that Multi-Layer
Perceptrons (MLPs) can be optimized with multi-view calibrated images to
represent scene geometry and appearance, without explicit 3D supervision.
Object segmentation can enrich many downstream applications based on the
learned radiance field. However, introducing hand-crafted segmentation to
define regions of interest in a complex real-world scene is non-trivial and
expensive, as it requires per-view annotation. This paper explores
self-supervised learning for object segmentation using NeRF in complex
real-world scenes. Our framework, called NeRF with Self-supervised Object
Segmentation (NeRF-SOS), couples object segmentation and neural radiance
fields to segment objects in any view within a scene. By proposing a novel
collaborative contrastive loss at both the appearance and geometry levels,
NeRF-SOS encourages NeRF models to distill compact, geometry-aware
segmentation clusters from their density fields and self-supervised
pre-trained 2D visual features. The self-supervised object segmentation
framework can be applied to various NeRF models, yielding both
photo-realistic rendering results and convincing segmentation maps for
indoor and outdoor scenarios. Extensive results on the LLFF, Tanks &
Temples, and BlendedMVS datasets validate the effectiveness of NeRF-SOS. It
consistently surpasses other 2D-based self-supervised baselines and predicts
finer semantic masks than existing supervised counterparts. Code is
available at: https://github.com/VITA-Group/NeRF-SOS.
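To make the collaborative contrastive loss concrete, below is a minimal PyTorch sketch of the idea: patch pairs that agree in appearance features (e.g., from a self-supervised 2D backbone such as DINO) or in geometry features (distilled from the density field) are pulled toward the same segmentation cluster. All names, shapes, and weightings are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def pairwise_affinity(feats):
    """Cosine similarity between all pairs of patch features. feats: (N, D) -> (N, N)."""
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.t()

def collaborative_contrastive_loss(app_feats, geo_feats, seg_logits):
    """Hypothetical collaborative contrastive objective.

    app_feats: (N, D_a) patch features from a self-supervised 2D backbone
    geo_feats: (N, D_g) features distilled from the NeRF density field
    seg_logits: (N, K) per-patch segmentation logits rendered by the field
    """
    seg = F.softmax(seg_logits, dim=-1)
    seg_corr = seg @ seg.t()                   # agreement of cluster assignments
    app_corr = pairwise_affinity(app_feats)    # appearance-level affinity
    geo_corr = pairwise_affinity(geo_feats)    # geometry-level affinity
    # Pairs with positive affinity are pulled into the same cluster; pairs with
    # negative affinity are pushed apart (the affinity acts as a signed weight).
    loss_app = -(app_corr.detach() * seg_corr).mean()
    loss_geo = -(geo_corr.detach() * seg_corr).mean()
    return loss_app + loss_geo
```

The two terms act collaboratively: either affinity alone is noisy, but agreement across both appearance and geometry yields compact object clusters.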
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- SANeRF-HQ: Segment Anything for NeRF in High Quality [61.77762568224097]
We introduce the Segment Anything for NeRF in High Quality (SANeRF-HQ) to achieve high-quality 3D segmentation of any target object in a given scene.
We employ the density field and RGB similarity to enhance the accuracy of segmentation boundaries during aggregation.
arXiv Detail & Related papers (2023-12-03T23:09:38Z)
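The SANeRF-HQ summary above mentions density and RGB similarity as boundary cues; below is a hedged sketch of how such cues might weight mask aggregation along a ray. The function name, the mean-color proxy, and the 0.5 threshold are all assumptions for illustration, not SANeRF-HQ's actual aggregation.

```python
import torch

def density_rgb_vote(sigma, rgb, mask_hits, obj_color, beta=10.0):
    """Decide whether a ray belongs to the object.

    sigma: (N,) densities of the N samples along one ray
    rgb: (N, 3) predicted sample colors
    mask_hits: (N,) 1.0 where a 2D mask voted for the sample, else 0.0
    obj_color: (3,) reference color of the target object (an assumption here)
    """
    alpha = 1.0 - torch.exp(-sigma)                      # opaque samples dominate
    color_sim = torch.exp(-beta * (rgb - obj_color).pow(2).sum(dim=-1))
    score = (alpha * color_sim * mask_hits).sum() / (alpha.sum() + 1e-8)
    return score > 0.5                                   # hard object/background call
```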
- Obj-NeRF: Extract Object NeRFs from Multi-view Images [7.669778218573394]
We propose Obj-NeRF, a comprehensive pipeline that recovers the 3D geometry of a specific object from multi-view images using a single prompt.
We also apply Obj-NeRF to various applications, including object removal, rotation, replacement, and recoloring.
arXiv Detail & Related papers (2023-11-26T13:15:37Z)
- Interactive Segment Anything NeRF with Feature Imitation [20.972098365110426]
We propose to imitate the backbone features of off-the-shelf perception models to achieve zero-shot semantic segmentation with NeRF.
Our framework reformulates the segmentation process by directly rendering semantic features and only applying the decoder from perception models.
Furthermore, we can project the learned semantics onto extracted mesh surfaces for real-time interaction.
arXiv Detail & Related papers (2023-05-25T16:44:51Z)
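As a rough illustration of the feature-imitation idea described above, the sketch below trains a NeRF feature head against a frozen 2D backbone and reuses only the perception model's decoder at render time. `render_features` and `decoder` are hypothetical stand-ins, not the paper's API.

```python
import torch
import torch.nn.functional as F

def feature_imitation_loss(rendered_feats, backbone_feats):
    """Match volume-rendered semantic features to a frozen 2D backbone.
    Both tensors: (H, W, C)."""
    return F.mse_loss(rendered_feats, backbone_feats)

@torch.no_grad()
def segment_novel_view(render_features, decoder, pose):
    """After training, the backbone is no longer needed: render features for the
    requested pose and run only the perception model's decoder."""
    feats = render_features(pose)                  # (H, W, C) hypothetical renderer
    return decoder(feats.permute(2, 0, 1)[None])   # (1, C, H, W) for the decoder
```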
- SegNeRF: 3D Part Segmentation with Neural Radiance Fields [63.12841224024818]
SegNeRF is a neural field representation that integrates a semantic field along with the usual radiance field.
SegNeRF is capable of simultaneously predicting geometry, appearance, and semantic information from posed images, even for unseen objects.
SegNeRF is able to generate an explicit 3D model from a single image of an object taken in the wild, with its corresponding part segmentation.
arXiv Detail & Related papers (2022-11-21T07:16:03Z)
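The SegNeRF entry above describes a semantic field alongside the radiance field; one minimal way to realize this is an extra per-point semantic head on the NeRF MLP, with the logits volume-rendered like color. Layer sizes and names below are illustrative assumptions.

```python
import torch.nn as nn

class SemanticRadianceField(nn.Module):
    """Radiance field with an extra per-point semantic head (illustrative sizes)."""
    def __init__(self, pos_dim=63, hidden=256, num_parts=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma = nn.Linear(hidden, 1)             # density
        self.rgb = nn.Linear(hidden, 3)               # color (view dependence omitted)
        self.semantic = nn.Linear(hidden, num_parts)  # part logits, rendered like color

    def forward(self, x):                             # x: (N, pos_dim) encoded points
        h = self.trunk(x)
        return self.sigma(h), self.rgb(h), self.semantic(h)
```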
- Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP (radiance field propagation) is the first unsupervised approach to tackle 3D scene object segmentation for neural radiance fields (NeRF).
arXiv Detail & Related papers (2022-10-02T11:14:23Z)
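The bidirectional photometric loss above is only summarized; one plausible reading, sketched below under heavy assumptions, is that the composite of all per-object fields must reconstruct the image while each object's field is also held to the pixels assigned to it. This is an interpretation, not the paper's exact formulation.

```python
import torch.nn.functional as F

def bidirectional_photometric_loss(composite_rgb, per_object_rgb, per_object_mask, gt_rgb):
    """composite_rgb, gt_rgb: (H, W, 3); per_object_*: lists of (H, W, 3) / (H, W, 1)."""
    # Forward direction: the composite of all object fields explains the image.
    loss = F.mse_loss(composite_rgb, gt_rgb)
    # Backward direction: each object's field is held to its own pixels only.
    for rgb_k, mask_k in zip(per_object_rgb, per_object_mask):
        loss = loss + F.mse_loss(rgb_k * mask_k, gt_rgb * mask_k)
    return loss
```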
- CLONeR: Camera-Lidar Fusion for Occupancy Grid-aided Neural Representations [77.90883737693325]
This paper proposes CLONeR, which significantly improves upon NeRF by allowing it to model large outdoor driving scenes observed from sparse input sensor views.
This is achieved by decoupling occupancy and color learning within the NeRF framework into separate Multi-Layer Perceptrons (MLPs) trained using LiDAR and camera data, respectively.
In addition, this paper proposes a novel method to build differentiable 3D Occupancy Grid Maps (OGM) alongside the NeRF model, and leverage this occupancy grid for improved sampling of points along a ray for rendering in metric space.
arXiv Detail & Related papers (2022-09-02T17:44:50Z)
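CLONeR's decoupling, as described above, separates occupancy from color; a minimal sketch of that split follows, with the occupancy MLP intended for LiDAR-ray supervision and the color MLP for camera rays. Sizes and names are assumptions, and the OGM-guided sampling is omitted.

```python
import torch
import torch.nn as nn

class DecoupledField(nn.Module):
    """Occupancy and color as separate MLPs (illustrative; OGM sampling omitted)."""
    def __init__(self, pos_dim=63, dir_dim=27, hidden=256):
        super().__init__()
        self.occupancy = nn.Sequential(              # supervised with LiDAR rays
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        self.color = nn.Sequential(                  # supervised with camera rays
            nn.Linear(pos_dim + dir_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, d):                         # encoded position x, direction d
        sigma = self.occupancy(x)
        rgb = self.color(torch.cat([x, d], dim=-1))
        return sigma, rgb
```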
- Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation [26.868351498722884]
We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs).
We make learning more computationally efficient by deriving a novel loss, which allows training NeRFs on RGB-D inputs without explicit ray marching.
arXiv Detail & Related papers (2021-04-02T16:59:29Z)
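The ObSuRF entry above says its loss trains on RGB-D without explicit ray marching; one way such a loss can work, sketched below purely as an assumption-level illustration, is to reward density at the observed depth and penalize density at points sampled in the free space before it. This is not the paper's derived objective.

```python
import torch

def rgbd_loss(field, origins, directions, depths, gt_rgb):
    """field(points) -> (sigma, rgb); origins/directions: (N, 3); depths: (N, 1)."""
    surface = origins + depths * directions             # observed surface points
    t_free = torch.rand_like(depths) * depths           # random point before surface
    free = origins + t_free * directions
    sigma_s, rgb_s = field(surface)
    sigma_f, _ = field(free)
    loss = -torch.log(sigma_s + 1e-8).mean()            # occupy the surface
    loss = loss + sigma_f.mean()                        # keep free space empty
    loss = loss + (rgb_s - gt_rgb).pow(2).mean()        # match the observed color
    return loss
```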
- Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes [36.07733308424772]
The deficiency of 3D segmentation labels is one of the main obstacles to effective point cloud segmentation.
We propose a novel deep graph convolutional network-based framework for large-scale semantic scene segmentation in point clouds with only 2D supervision.
arXiv Detail & Related papers (2020-04-26T23:02:23Z)