Related papers: Multi-task Planar Reconstruction with Feature Warping Guidance

Multi-task Planar Reconstruction with Feature Warping Guidance

URL: http://arxiv.org/abs/2311.14981v2
Date: Thu, 21 Dec 2023 16:45:12 GMT
Title: Multi-task Planar Reconstruction with Feature Warping Guidance
Authors: Luan Wei, Anna Hilsmann and Peter Eisert
Abstract summary: Piece-wise planar 3D reconstruction simultaneously segments plane instances and recovers their 3D plane parameters from an image. We introduce SOLOPlanes, a real-time planar reconstruction model based on a modified instance segmentation architecture. Our model simultaneously predicts semantics using single images at inference time, while achieving real-time predictions at 43 FPS.
Score: 3.95944314850151
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Piece-wise planar 3D reconstruction simultaneously segments plane instances and recovers their 3D plane parameters from an image, which is particularly useful for indoor or man-made environments. Efficient reconstruction of 3D planes coupled with semantic predictions offers advantages for a wide range of applications requiring scene understanding and concurrent spatial mapping. However, most existing planar reconstruction models either neglect semantic predictions or do not run efficiently enough for real-time applications. We introduce SOLOPlanes, a real-time planar reconstruction model based on a modified instance segmentation architecture which simultaneously predicts semantics for each plane instance, along with plane parameters and piece-wise plane instance masks. We achieve an improvement in instance mask segmentation by including multi-view guidance for plane predictions in the training process. This cross-task improvement, training for plane prediction but improving the mask segmentation, is due to the nature of feature sharing in multi-task learning. Our model simultaneously predicts semantics using single images at inference time, while achieving real-time predictions at 43 FPS.

Related papers

PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes [32.00236197233923]
PlanarSplatting is an ultra-fast and accurate surface reconstruction approach for multiview indoor images. PlanarSplatting reconstructs an indoor scene in 3 minutes while having significantly better geometric accuracy.
arXiv Detail & Related papers (2024-12-04T16:38:07Z)
MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane. We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image. These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
arXiv Detail & Related papers (2024-11-02T12:15:29Z)
Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields. LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation. It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings [26.845588648999417]
We tackle the problem of estimating the planar surfaces in a 3D scene from posed images. We propose a method that predicts multi-view consistent plane embeddings that complement geometry when clustering points into planes. We show through extensive evaluation on the ScanNetV2 dataset that our new method outperforms existing approaches.
arXiv Detail & Related papers (2024-06-13T09:49:31Z)
360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results. We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision. We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation [10.982464344805194]
PlaneRecTR++ is a Transformer-based architecture that unifies all sub-tasks related to multi-view reconstruction and pose estimation. Our proposed unified learning achieves mutual benefits across sub-tasks, obtaining a new state-of-the-art performance on public ScanNetv1, ScanNetv2, NYUv2-Plane, and MatterPort3D datasets.
arXiv Detail & Related papers (2023-07-25T18:28:19Z)
Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding [106.0876425365599]
Masked Shape Prediction (MSP) is a new framework to conduct masked signal modeling in 3D scenes. MSP uses the essential 3D semantic cue, i.e., geometric shape, as the prediction target for masked points.
arXiv Detail & Related papers (2023-05-08T20:09:19Z)
Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification. We propose the occupancy planes (OPlanes) representation, which enables to formulate single-view RGB-D human reconstruction as occupancy prediction on planes which slice through the camera's view frustum.
arXiv Detail & Related papers (2022-08-04T17:59:56Z)
PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular Videos [32.286637700503995]
PlanarRecon is a framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video. A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction. Experiments show that the proposed approach achieves state-of-the-art performances on the ScanNet dataset while being real-time.
arXiv Detail & Related papers (2022-06-15T17:59:16Z)
Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios. Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures. We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet. We introduce several novel loss functions (geometric constraint) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
arXiv Detail & Related papers (2021-10-21T15:54:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.