Related papers: PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting

PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting

URL: http://arxiv.org/abs/2510.18714v1
Date: Tue, 21 Oct 2025 15:15:33 GMT
Title: PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting
Authors: Changkun Liu, Bin Tan, Zeran Ke, Shangzhan Zhang, Jiachen Liu, Ming Qian, Nan Xue, Yujun Shen, Tristan Braud,
Abstract summary: We introduce PLANA3R, a pose-free framework for metric Planar 3D Reconstruction from unposed two-view images.<n>Unlike prior feedforward methods that require 3D plane annotations during training, PLANA3R learns planar 3D structures without explicit plane supervision.<n>We validate PLANA3R on multiple indoor-scene datasets with metric supervision and demonstrate strong generalization to out-of-domain indoor environments.
Score: 56.188624157291024
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper addresses metric 3D reconstruction of indoor scenes by exploiting their inherent geometric regularities with compact representations. Using planar 3D primitives - a well-suited representation for man-made environments - we introduce PLANA3R, a pose-free framework for metric Planar 3D Reconstruction from unposed two-view images. Our approach employs Vision Transformers to extract a set of sparse planar primitives, estimate relative camera poses, and supervise geometry learning via planar splatting, where gradients are propagated through high-resolution rendered depth and normal maps of primitives. Unlike prior feedforward methods that require 3D plane annotations during training, PLANA3R learns planar 3D structures without explicit plane supervision, enabling scalable training on large-scale stereo datasets using only depth and normal annotations. We validate PLANA3R on multiple indoor-scene datasets with metric supervision and demonstrate strong generalization to out-of-domain indoor environments across diverse tasks under metric evaluation protocols, including 3D surface reconstruction, depth estimation, and relative pose estimation. Furthermore, by formulating with planar 3D representation, our method emerges with the ability for accurate plane segmentation. The project page is available at https://lck666666.github.io/plana3r

Related papers

Towards In-the-wild 3D Plane Reconstruction from a Single Image [16.857296782216206]
3D plane reconstruction from a single image is a crucial yet challenging topic in 3D computer vision.<n>Previous state-of-the-art methods have focused on training their system on a single dataset from either indoor or outdoor domain.<n>We introduce a novel framework dubbed ZeroPlane, a Transformer-based model targeting zero-shot 3D plane detection and reconstruction from a single image.
arXiv Detail & Related papers (2025-06-03T06:14:05Z)
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes [32.00236197233923]
PlanarSplatting is an ultra-fast and accurate surface reconstruction approach for multiview indoor images.<n>PlanarSplatting reconstructs an indoor scene in 3 minutes while having significantly better geometric accuracy.
arXiv Detail & Related papers (2024-12-04T16:38:07Z)
MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane. We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image. These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
arXiv Detail & Related papers (2024-11-02T12:15:29Z)
GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.<n>Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision. We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular Videos [32.286637700503995]
PlanarRecon is a framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video. A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction. Experiments show that the proposed approach achieves state-of-the-art performances on the ScanNet dataset while being real-time.
arXiv Detail & Related papers (2022-06-15T17:59:16Z)
Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images. Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods. The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
Monocular Road Planar Parallax Estimation [25.36368935789501]
Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. We propose Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $gamma$ map for 3D reconstruction.
arXiv Detail & Related papers (2021-11-22T10:03:41Z)
KAPLAN: A 3D Point Descriptor for Shape Completion [80.15764700137383]
KAPLAN is a 3D point descriptor that aggregates local shape information via a series of 2D convolutions. In each of those planes, point properties like normals or point-to-plane distances are aggregated into a 2D grid and abstracted into a feature representation with an efficient 2D convolutional encoder. Experiments on public datasets show that KAPLAN achieves state-of-the-art performance for 3D shape completion.
arXiv Detail & Related papers (2020-07-31T21:56:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.