UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos
- URL: http://arxiv.org/abs/2407.03594v1
- Date: Thu, 4 Jul 2024 03:02:27 GMT
- Title: UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos
- Authors: Yuzhong Huang, Chen Liu, Ji Hou, Ke Huo, Shiyu Dong, Fred Morstatter,
- Abstract summary: We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos.
We build a Transformers-based deep neural network that jointly constructs a 3D feature volume for the environment.
Experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks.
- Score: 12.328095228008893
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos. Unlike existing methods that detect planes from local observations and associate them across the video for the final reconstruction, UniPlane unifies both the detection and the reconstruction tasks in a single network, which allows us to directly optimize final reconstruction quality and fully leverage temporal information. Specifically, we build a Transformers-based deep neural network that jointly constructs a 3D feature volume for the environment and estimates a set of per-plane embeddings as queries. UniPlane directly reconstructs the 3D planes by taking dot products between voxel embeddings and the plane embeddings followed by binary thresholding. Extensive experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks, achieving +4.6 in F-score in geometry as well as consistent improvements in other geometry and segmentation metrics.
Related papers
- MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane.
We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image.
These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
arXiv Detail & Related papers (2024-11-02T12:15:29Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Learning Reconstructability for Drone Aerial Path Planning [51.736344549907265]
We introduce the first learning-based reconstructability predictor to improve view and path planning for large-scale 3D urban scene acquisition using unmanned drones.
In contrast to previous approaches, our method learns a model that explicitly predicts how well a 3D urban scene will be reconstructed from a set of viewpoints.
arXiv Detail & Related papers (2022-09-21T08:10:26Z) - PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed
Monocular Videos [32.286637700503995]
PlanarRecon is a framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video.
A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction.
Experiments show that the proposed approach achieves state-of-the-art performances on the ScanNet dataset while being real-time.
arXiv Detail & Related papers (2022-06-15T17:59:16Z) - PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo [32.81496429134453]
We present a novel framework named PlaneMVS for 3D plane reconstruction from multiple input views with known camera poses.
In contrast, we reconstruct 3D planes with a multi-view-stereo (MVS) pipeline that takes advantage of multi-view geometry.
Our method even outperforms a set of SOTA learning-based MVS methods thanks to the learned plane priors.
arXiv Detail & Related papers (2022-03-22T22:35:46Z) - PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for
Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
arXiv Detail & Related papers (2021-10-21T15:54:03Z) - PlaneTR: Structure-Guided Transformers for 3D Plane Recovery [56.23402171871664]
PlaneTR simultaneously detects and reconstructs planes from a single image.
PlaneTR achieves a state-of-the-art performance on the ScanNet and NYUv2 datasets.
arXiv Detail & Related papers (2021-07-27T23:55:40Z) - KAPLAN: A 3D Point Descriptor for Shape Completion [80.15764700137383]
KAPLAN is a 3D point descriptor that aggregates local shape information via a series of 2D convolutions.
In each of those planes, point properties like normals or point-to-plane distances are aggregated into a 2D grid and abstracted into a feature representation with an efficient 2D convolutional encoder.
Experiments on public datasets show that KAPLAN achieves state-of-the-art performance for 3D shape completion.
arXiv Detail & Related papers (2020-07-31T21:56:08Z) - OmniSLAM: Omnidirectional Localization and Dense Mapping for
Wide-baseline Multi-camera Systems [88.41004332322788]
We present an omnidirectional localization and dense mapping system for a wide-baseline multiview stereo setup with ultra-wide field-of-view (FOV) fisheye cameras.
For more practical and accurate reconstruction, we first introduce improved and light-weighted deep neural networks for the omnidirectional depth estimation.
We integrate our omnidirectional depth estimates into the visual odometry (VO) and add a loop closing module for global consistency.
arXiv Detail & Related papers (2020-03-18T05:52:10Z) - From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized
3D Point Clouds [59.98665358527686]
We propose a new method for segmentation-free joint estimation of orthogonal planes.
Such unified scene exploration allows for multitudes of applications such as semantic plane detection or local and global scan alignment.
Our experiments demonstrate the validity of our approach in numerous scenarios from wall detection to 6D tracking.
arXiv Detail & Related papers (2020-01-21T06:51:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.