PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed
Monocular Videos
- URL: http://arxiv.org/abs/2206.07710v1
- Date: Wed, 15 Jun 2022 17:59:16 GMT
- Title: PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed
Monocular Videos
- Authors: Yiming Xie, Matheus Gadelha, Fengting Yang, Xiaowei Zhou, Huaizu Jiang
- Abstract summary: PlanarRecon is a framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video.
A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction.
Experiments show that the proposed approach achieves state-of-the-art performances on the ScanNet dataset while being real-time.
- Score: 32.286637700503995
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present PlanarRecon -- a novel framework for globally coherent detection
and reconstruction of 3D planes from a posed monocular video. Unlike previous
works that detect planes in 2D from a single image, PlanarRecon incrementally
detects planes in 3D for each video fragment, which consists of a set of key
frames, from a volumetric representation of the scene using neural networks. A
learning-based tracking and fusion module is designed to merge planes from
previous fragments to form a coherent global plane reconstruction. Such design
allows PlanarRecon to integrate observations from multiple views within each
fragment and temporal information across different ones, resulting in an
accurate and coherent reconstruction of the scene abstraction with
low-polygonal geometry. Experiments show that the proposed approach achieves
state-of-the-art performances on the ScanNet dataset while being real-time.
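The abstract's fragment-wise detect-then-fuse loop can be illustrated with a minimal Python sketch. The plane representation (unit normal plus offset), the matching thresholds, and the simple parameter averaging are all illustrative assumptions standing in for PlanarRecon's learned tracking-and-fusion module; none of the names below are the paper's actual API.

```python
import numpy as np

def merge_planes(global_planes, local_planes, cos_thresh=0.9, off_thresh=0.1):
    """Fuse planes detected in the current fragment into the global set.

    Each plane is (unit_normal, offset). A local plane that is close to an
    existing global plane in both normal direction and offset is treated as
    the same physical plane and averaged in; otherwise it starts a new track.
    This hand-crafted rule is a stand-in for the learned fusion module.
    """
    merged = list(global_planes)
    for n, d in local_planes:
        for i, (gn, gd) in enumerate(merged):
            if np.dot(n, gn) > cos_thresh and abs(d - gd) < off_thresh:
                avg_n = (n + gn) / np.linalg.norm(n + gn)
                merged[i] = (avg_n, 0.5 * (d + gd))
                break
        else:
            merged.append((n, d))  # unmatched: a newly observed plane
    return merged
```

For example, a floor plane re-detected in a second fragment with a slightly different offset merges into the existing track instead of duplicating it, which is what gives the reconstruction its global coherence.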
Related papers
- MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane.
We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image.
These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
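The classical core of the pipeline this summary sketches is RANSAC plane fitting. Below is an illustrative single-plane version; MonoPlane's proximity guidance and its use of monocular depth/normal cues are omitted, and all parameter choices are assumptions.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.02, rng=None):
    """Fit one plane (unit normal n, offset d, with n @ x = d) to an Nx3 array."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(iters):
        # Hypothesize a plane from 3 random points.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:  # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        d = n @ p0
        # Count points within the distance threshold of the hypothesis.
        inliers = np.abs(points @ n - d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers
```

In a sequential setting, the inliers of each fitted plane are removed before fitting the next instance, which is roughly what "sequentially fit each plane instance" refers to.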
arXiv Detail & Related papers (2024-11-02T12:15:29Z)
- UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos [12.328095228008893]
We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos.
We build a Transformer-based deep neural network that jointly constructs a 3D feature volume for the environment.
Experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks.
arXiv Detail & Related papers (2024-07-04T03:02:27Z)
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction [84.94140661523956]
We propose a tri-perspective view (TPV) representation which accompanies BEV with two additional perpendicular planes.
We model each point in the 3D space by summing its projected features on the three planes.
Experiments show that our model trained with sparse supervision effectively predicts the semantic occupancy for all voxels.
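The point-modeling rule the summary describes can be shown with a toy sketch: a 3D point's feature is the sum of its features sampled from three orthogonal planes. The nearest-neighbor sampling and scalar features below are simplifying assumptions (the paper uses learned feature planes and interpolation).

```python
import numpy as np

def tpv_feature(point, plane_xy, plane_xz, plane_yz, grid=0.5):
    """Sum a point's features projected onto three orthogonal planes.

    Each plane is a 2D array indexed by the point's quantized coordinates;
    nearest-neighbor lookup stands in for bilinear sampling.
    """
    x, y, z = (int(c / grid) for c in point)
    return plane_xy[x, y] + plane_xz[x, z] + plane_yz[y, z]
```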
arXiv Detail & Related papers (2023-02-15T17:58:10Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction, to study the model generalization on unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo [32.81496429134453]
We present a novel framework named PlaneMVS for 3D plane reconstruction from multiple input views with known camera poses.
In contrast to single-image approaches, we reconstruct 3D planes with a multi-view-stereo (MVS) pipeline that takes advantage of multi-view geometry.
Our method even outperforms a set of SOTA learning-based MVS methods thanks to the learned plane priors.
arXiv Detail & Related papers (2022-03-22T22:35:46Z)
- PlaneTR: Structure-Guided Transformers for 3D Plane Recovery [56.23402171871664]
PlaneTR simultaneously detects and reconstructs planes from a single image.
PlaneTR achieves a state-of-the-art performance on the ScanNet and NYUv2 datasets.
arXiv Detail & Related papers (2021-07-27T23:55:40Z)
- SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data [124.2624568006391]
We present SAIL-VOS 3D: a synthetic video dataset with frame-by-frame mesh annotations.
We also develop first baselines for reconstruction of 3D meshes from video data via temporal models.
arXiv Detail & Related papers (2021-05-18T15:42:37Z)
- NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video [41.554961144321474]
We propose to reconstruct local surfaces represented as sparse TSDF volumes for each video fragment sequentially by a neural network.
A learning-based TSDF fusion module is used to guide the network to fuse features from previous fragments.
Experiments on ScanNet and 7-Scenes datasets show that our system outperforms state-of-the-art methods in terms of both accuracy and speed.
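For context, the classical baseline that NeuralRecon's learning-based fusion module replaces is weighted TSDF averaging. The sketch below shows that hand-crafted rule (the paper instead uses a learned, GRU-style update over fragment features).

```python
import numpy as np

def fuse_tsdf(tsdf, weight, new_tsdf, new_weight=1.0):
    """Fuse a new truncated-signed-distance observation into a running average.

    Classical per-voxel weighted averaging: each observation contributes in
    proportion to its weight, and the accumulated weight is carried forward.
    """
    total = weight + new_weight
    fused = (tsdf * weight + new_tsdf * new_weight) / np.maximum(total, 1e-9)
    return fused, total
```

Two observations of opposite sign cancel toward the zero crossing, which is where the reconstructed surface is extracted.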
arXiv Detail & Related papers (2021-04-01T17:59:46Z)
- Multi-Plane Program Induction with 3D Box Priors [110.6726150681556]
We present Box Program Induction (BPI), which infers a program-like scene representation from a single image.
BPI simultaneously models repeated structure on multiple 2D planes, the 3D position and orientation of the planes, and camera parameters.
It uses neural networks to infer visual cues such as vanishing points and wireframe lines, which guide a search-based algorithm to find the program that best explains the image.
arXiv Detail & Related papers (2020-11-19T18:07:46Z)
- KAPLAN: A 3D Point Descriptor for Shape Completion [80.15764700137383]
KAPLAN is a 3D point descriptor that aggregates local shape information via a series of 2D convolutions.
In each of those planes, point properties like normals or point-to-plane distances are aggregated into a 2D grid and abstracted into a feature representation with an efficient 2D convolutional encoder.
Experiments on public datasets show that KAPLAN achieves state-of-the-art performance for 3D shape completion.
arXiv Detail & Related papers (2020-07-31T21:56:08Z)
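The grid aggregation step the KAPLAN summary describes can be sketched in a toy form: scatter a per-point property into a 2D grid on one plane by averaging. The single-plane setup, the choice of point-to-plane distance as the property, and the grid resolution are all illustrative assumptions; the descriptor itself uses a series of planes and a 2D convolutional encoder on top.

```python
import numpy as np

def aggregate_to_grid(points, res=4, extent=1.0):
    """Average z (distance to the z=0 plane) into a res x res grid over x, y.

    Points outside [-extent, extent] are clamped to the border cell; empty
    cells stay zero.
    """
    grid = np.zeros((res, res))
    count = np.zeros((res, res))
    for x, y, z in points:
        i = min(int((x + extent) / (2 * extent) * res), res - 1)
        j = min(int((y + extent) / (2 * extent) * res), res - 1)
        grid[i, j] += z
        count[i, j] += 1
    return np.divide(grid, count, out=np.zeros_like(grid), where=count > 0)
```

A 2D convolutional encoder would then run over grids like this one to produce the feature representation mentioned in the summary.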
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.