PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields
- URL: http://arxiv.org/abs/2401.00871v1
- Date: Sat, 30 Dec 2023 03:48:22 GMT
- Title: PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields
- Authors: Zheng Chen, Qingan Yan, Huangying Zhan, Changjiang Cai, Xiangyu Xu, Yuzhong Huang, Weihan Wang, Ziyue Feng, Lantao Liu, Yi Xu
- Abstract summary: PlanarNeRF is a novel framework capable of detecting dense 3D planes through online learning.
It enhances 3D plane detection with concurrent appearance and geometry knowledge.
A lightweight plane fitting module is proposed to estimate plane parameters.
- Score: 34.99249208739048
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Identifying spatially complete planar primitives from visual data is a
crucial task in computer vision. Prior methods are largely restricted to either
2D segment recovery or simplifying 3D structures, even with extensive plane
annotations. We present PlanarNeRF, a novel framework capable of detecting
dense 3D planes through online learning. Drawing upon the neural field
representation, PlanarNeRF brings three major contributions. First, it enhances
3D plane detection with concurrent appearance and geometry knowledge. Second, a
lightweight plane fitting module is proposed to estimate plane parameters.
Third, a novel global memory bank structure with an update mechanism is
introduced, ensuring consistent cross-frame correspondence. The flexible
architecture of PlanarNeRF allows it to function in both 2D-supervised and
self-supervised solutions, in each of which it can effectively learn from
sparse training signals, significantly improving training efficiency. Through
extensive experiments, we demonstrate the effectiveness of PlanarNeRF in
various scenarios and remarkable improvement over existing works.
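The paper's code is not reproduced here, but the second contribution (a lightweight plane fitting module) and the third (a global memory bank with an update mechanism) can be illustrated with a minimal NumPy sketch. All names, thresholds, and the running-average update rule below are our own assumptions, not the authors' implementation:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit via SVD: returns a unit normal n and offset d
    such that n . x + d = 0 for points x on the plane."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value of the
    # centered points is the direction of least variance: the plane normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    return normal, -normal @ centroid

class PlaneMemoryBank:
    """Toy global memory bank: per-frame planes are matched to stored global
    planes by normal/offset similarity and merged with a running average,
    one simple way to keep cross-frame correspondences consistent."""

    def __init__(self, angle_thresh_deg=10.0, offset_thresh=0.05):
        self.planes = []  # each entry: [normal, offset, observation_count]
        self.cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
        self.offset_thresh = offset_thresh

    def update(self, normal, offset):
        """Return a global plane id, creating a new entry if nothing matches."""
        for i, (n_g, d_g, c) in enumerate(self.planes):
            # Resolve the sign ambiguity of the fitted normal first.
            n, d = (normal, offset) if n_g @ normal >= 0 else (-normal, -offset)
            if n_g @ n > self.cos_thresh and abs(d_g - d) < self.offset_thresh:
                n_new = n_g * c + n
                n_new /= np.linalg.norm(n_new)
                self.planes[i] = [n_new, (d_g * c + d) / (c + 1), c + 1]
                return i
        self.planes.append([normal, offset, 1])
        return len(self.planes) - 1
```

Fitting noisy samples of the same wall in two consecutive frames and calling update() on both fits should then return the same global plane id, which is the cross-frame consistency the memory bank is meant to provide.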
Related papers
- LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving [52.83707400688378]
LargeAD is a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets.
Our framework leverages vision foundation models (VFMs) to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples.
Our approach delivers significant performance improvements over state-of-the-art methods in both linear probing and fine-tuning for LiDAR-based segmentation and object detection.
arXiv Detail & Related papers (2025-01-07T18:59:59Z)
- PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes [32.00236197233923]
PlanarSplatting is an ultra-fast and accurate surface reconstruction approach for multiview indoor images.
PlanarSplatting reconstructs an indoor scene in 3 minutes while achieving significantly better geometric accuracy than prior methods.
arXiv Detail & Related papers (2024-12-04T16:38:07Z)
- A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We introduce a diffusion model for Gaussian Splats, SplatDiffusion, to enable generation of three-dimensional structures from single images.
Existing methods rely on deterministic, feed-forward predictions, which limit their ability to handle the inherent ambiguity of 3D inference from 2D data.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
- MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane.
We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image.
These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
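MonoPlane's actual pipeline is more involved, but the core idea of a proximity-guided RANSAC that fits one plane instance around a seed point can be sketched as follows; the function name, radius, and thresholds are our assumptions:

```python
import numpy as np

def ransac_plane_instance(points, seed_idx, radius=0.3, iters=200,
                          inlier_thresh=0.02, rng=None):
    """Fit one plane instance around points[seed_idx], RANSAC-style.

    Proximity guidance is approximated by drawing candidate triplets only
    from the seed point's neighborhood; inliers are still collected over
    the whole cloud. Returns the inlier indices of the best hypothesis.
    """
    rng = rng or np.random.default_rng(0)
    # Proximity guidance: sample plane hypotheses near the seed point only.
    near = np.flatnonzero(np.linalg.norm(points - points[seed_idx], axis=1) < radius)
    if len(near) < 3:
        return np.array([], dtype=int)
    best = np.array([], dtype=int)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(near, size=3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:  # degenerate (nearly collinear) triplet
            continue
        n /= norm
        inliers = np.flatnonzero(np.abs((points - p0) @ n) < inlier_thresh)
        if len(inliers) > len(best):
            best = inliers
    return best
```

Sequential fitting as described above would then remove these inliers from the cloud and pick a new seed until no sufficiently large plane remains.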
arXiv Detail & Related papers (2024-11-02T12:15:29Z)
- UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos [12.328095228008893]
We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos.
We build a Transformer-based deep neural network that constructs a 3D feature volume of the environment.
Experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks.
arXiv Detail & Related papers (2024-07-04T03:02:27Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [59.13757801286343]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce the FILP-3D framework with two novel components: the Redundant Feature Eliminator (RFE), which addresses feature-space misalignment, and the Spatial Noise Compensator (SNC), which handles significant noise.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose OccNeRF, a method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
- PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular Videos [32.286637700503995]
PlanarRecon is a framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video.
A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction.
Experiments show that the proposed approach achieves state-of-the-art performance on the ScanNet dataset while running in real time.
arXiv Detail & Related papers (2022-06-15T17:59:16Z)
- Active 3D Shape Reconstruction from Vision and Touch [66.08432412497443]
Humans build 3D understandings of the world through active object exploration, jointly using their senses of vision and touch.
In 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings.
We introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile priors to guide the shape exploration; and 3) a set of data-driven solutions with either tactile or visuotactile priors.
arXiv Detail & Related papers (2021-07-20T15:56:52Z)
- Dynamic Plane Convolutional Occupancy Networks [4.607145155913717]
We propose Dynamic Plane Convolutional Occupancy Networks to further improve the quality of 3D surface reconstruction.
A fully-connected network learns to predict plane parameters that best describe the shapes of objects or scenes.
Our method shows superior performance in surface reconstruction from unoriented point clouds on ShapeNet as well as on an indoor scene dataset.
arXiv Detail & Related papers (2020-11-11T14:24:52Z)
- KAPLAN: A 3D Point Descriptor for Shape Completion [80.15764700137383]
KAPLAN is a 3D point descriptor that aggregates local shape information via a series of 2D convolutions.
In each of those planes, point properties like normals or point-to-plane distances are aggregated into a 2D grid and abstracted into a feature representation with an efficient 2D convolutional encoder.
Experiments on public datasets show that KAPLAN achieves state-of-the-art performance for 3D shape completion.
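The aggregation step KAPLAN describes (binning point properties such as point-to-plane distances into a 2D grid) can be sketched in a few lines; the function name, grid size, and extent below are our assumptions, not the authors' code:

```python
import numpy as np

def plane_grid_feature(points, origin, normal, u, v, grid=8, extent=0.5):
    """Bin signed point-to-plane distances of a neighborhood into a 2D grid.

    (origin, normal, u, v) define a plane and two orthonormal in-plane axes.
    The resulting grid is a tiny one-channel image that a 2D convolutional
    encoder could abstract into a feature vector.
    """
    rel = points - origin
    dist = rel @ normal                        # signed point-to-plane distance
    uv = np.stack([rel @ u, rel @ v], axis=1)  # in-plane coordinates
    # Map coordinates in [-extent, extent] onto grid cell indices.
    idx = np.clip(((uv + extent) / (2 * extent) * grid).astype(int), 0, grid - 1)
    total = np.zeros((grid, grid))
    count = np.zeros((grid, grid))
    np.add.at(total, (idx[:, 0], idx[:, 1]), dist)
    np.add.at(count, (idx[:, 0], idx[:, 1]), 1)
    return total / np.maximum(count, 1)        # mean distance per cell
```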
arXiv Detail & Related papers (2020-07-31T21:56:08Z)