Multi-task Planar Reconstruction with Feature Warping Guidance
- URL: http://arxiv.org/abs/2311.14981v2
- Date: Thu, 21 Dec 2023 16:45:12 GMT
- Title: Multi-task Planar Reconstruction with Feature Warping Guidance
- Authors: Luan Wei, Anna Hilsmann and Peter Eisert
- Abstract summary: Piece-wise planar 3D reconstruction simultaneously segments plane instances and recovers their 3D plane parameters from an image.
We introduce SOLOPlanes, a real-time planar reconstruction model based on a modified instance segmentation architecture.
Our model predicts semantics from single images at inference time while achieving real-time performance at 43 FPS.
- Score: 3.95944314850151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Piece-wise planar 3D reconstruction simultaneously segments plane instances
and recovers their 3D plane parameters from an image, which is particularly
useful for indoor or man-made environments. Efficient reconstruction of 3D
planes coupled with semantic predictions offers advantages for a wide range of
applications requiring scene understanding and concurrent spatial mapping.
However, most existing planar reconstruction models either neglect semantic
predictions or do not run efficiently enough for real-time applications. We
introduce SOLOPlanes, a real-time planar reconstruction model based on a
modified instance segmentation architecture which simultaneously predicts
semantics for each plane instance, along with plane parameters and piece-wise
plane instance masks. We improve instance mask segmentation by including multi-view
guidance for plane predictions in the training process. This cross-task improvement
(training for plane prediction yet improving mask segmentation) arises from feature
sharing in multi-task learning. At inference time, our model predicts semantics from
single images while running in real time at 43 FPS.
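The "feature warping guidance" in the title rests on the classical plane-induced homography: for a plane n·X = d, pixels in one view map to another via H = K_src (R − t nᵀ/d) K_tgt⁻¹, which lets features from a second view be warped into the reference view and compared during training. Below is a minimal NumPy sketch of that idea; the function names, pose convention, and nearest-neighbor sampling are illustrative assumptions, not SOLOPlanes' actual implementation.

```python
import numpy as np

def plane_induced_homography(K_src, K_tgt, R, t, n, d):
    """Homography mapping target-view pixels to source-view pixels for
    points on the plane n.X = d (n, d expressed in the target frame).

    [R | t] is the relative pose taking target-frame points into the
    source frame; the standard plane-induced homography is
        H = K_src (R - t n^T / d) K_tgt^{-1}.
    """
    H = K_src @ (R - np.outer(t, n) / d) @ np.linalg.inv(K_tgt)
    return H / H[2, 2]  # fix the projective scale

def warp_features(feat_src, H, h, w):
    """Nearest-neighbor warp of a (C, H, W) source feature map onto an
    (h, w) target grid; out-of-bounds samples stay zero."""
    C = feat_src.shape[0]
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous target pixels
    src = H @ pix
    src = src[:2] / src[2]                                    # perspective divide
    u, v = np.round(src).astype(int)
    valid = (u >= 0) & (u < feat_src.shape[2]) & (v >= 0) & (v < feat_src.shape[1])
    out = np.zeros((C, h, w), feat_src.dtype)
    out[:, ys.ravel()[valid], xs.ravel()[valid]] = feat_src[:, v[valid], u[valid]]
    return out
```

In a multi-view training loop, the warped source features can supervise the reference-view plane predictions (e.g., via a feature-consistency loss), which is one way the cross-task guidance described above could be realized.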
Related papers
- AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings [26.845588648999417]
We tackle the problem of estimating the planar surfaces in a 3D scene from posed images.
We propose a method that predicts multi-view consistent plane embeddings that complement geometry when clustering points into planes.
We show through extensive evaluation on the ScanNetV2 dataset that our new method outperforms existing approaches.
arXiv Detail & Related papers (2024-06-13T09:49:31Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [60.23832277827669]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence.
We present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
We also introduce an optimization strategy for decision-level layout analysis and a 1D cost volume construction method for feature-level multi-view aggregation.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision.
We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range.
For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z) - Self-supervised Pre-training with Masked Shape Prediction for 3D Scene
Understanding [106.0876425365599]
Masked Shape Prediction (MSP) is a new framework to conduct masked signal modeling in 3D scenes.
MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points.
arXiv Detail & Related papers (2023-05-08T20:09:19Z)
- Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification.
We propose the occupancy planes (OPlanes) representation, which makes it possible to formulate single-view RGB-D human reconstruction as occupancy prediction on planes slicing through the camera's view frustum.
arXiv Detail & Related papers (2022-08-04T17:59:56Z)
- PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular Videos [32.286637700503995]
PlanarRecon is a framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video.
A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction.
Experiments show that the proposed approach achieves state-of-the-art performances on the ScanNet dataset while being real-time.
arXiv Detail & Related papers (2022-06-15T17:59:16Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo [32.81496429134453]
We present a novel framework named PlaneMVS for 3D plane reconstruction from multiple input views with known camera poses.
In contrast to single-view approaches, we reconstruct 3D planes with a multi-view stereo (MVS) pipeline that takes advantage of multi-view geometry.
Our method even outperforms a set of SOTA learning-based MVS methods thanks to the learned plane priors.
arXiv Detail & Related papers (2022-03-22T22:35:46Z)
- PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraints) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
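The geometric constraint coupling plane parameters and depth can be stated in one line: a pixel p on a plane n·X = d has depth z = d / (n·K⁻¹p̃), where p̃ is the homogeneous pixel. A minimal sketch of that relation is below; it is the textbook identity, not PlaneRecNet's actual loss code, and the function name is illustrative.

```python
import numpy as np

def depth_from_plane(n, d, K, pixels):
    """Per-pixel depth induced by the plane n.X = d (camera frame).

    A pixel p = (u, v) back-projects to X = z * K^{-1} [u, v, 1]^T;
    substituting into n.X = d gives z = d / (n . K^{-1} [u, v, 1]^T).
    `pixels` is an (N, 2) array of (u, v) coordinates.
    """
    ones = np.ones((pixels.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([pixels, ones]).T).T  # back-projected rays
    return d / (rays @ n)
```

A depth-consistency loss can then penalize the difference between this plane-induced depth and a directly predicted depth map, which is one plausible form such a cross-task constraint could take.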
arXiv Detail & Related papers (2021-10-21T15:54:03Z)
- Dynamic Plane Convolutional Occupancy Networks [4.607145155913717]
We propose Dynamic Plane Convolutional Occupancy Networks to push further the quality of 3D surface reconstruction.
A fully-connected network learns to predict plane parameters that best describe the shapes of objects or scenes.
Our method shows superior performance in surface reconstruction from unoriented point clouds in ShapeNet as well as an indoor scene dataset.
arXiv Detail & Related papers (2020-11-11T14:24:52Z)
- SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion [86.77318031029404]
We propose a framework that performs scene reconstruction and semantic scene completion jointly in an incremental and real-time manner.
Our framework relies on a novel neural architecture designed to process occupancy maps and leverages voxel states to accurately and efficiently fuse semantic completion with the 3D global model.
arXiv Detail & Related papers (2020-10-26T15:31:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.