PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
- URL: http://arxiv.org/abs/2107.13108v1
- Date: Tue, 27 Jul 2021 23:55:40 GMT
- Title: PlaneTR: Structure-Guided Transformers for 3D Plane Recovery
- Authors: Bin Tan and Nan Xue and Song Bai and Tianfu Wu and Gui-Song Xia
- Abstract summary: PlaneTR simultaneously detects and reconstructs planes from a single image.
PlaneTR achieves state-of-the-art performance on the ScanNet and NYUv2 datasets.
- Score: 56.23402171871664
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a neural network built upon Transformers, namely PlaneTR,
to simultaneously detect and reconstruct planes from a single image. Unlike
previous methods, PlaneTR jointly leverages context information and geometric
structures in a sequence-to-sequence way to holistically detect plane instances
in one forward pass. Specifically, we represent the geometric structures as
line segments and design the network with three main components: (i) context
and line segment encoders, (ii) a structure-guided plane decoder, and (iii) a
pixel-wise plane embedding decoder. Given an image and its detected
line segments, PlaneTR generates the context and line segment sequences via two
specially designed encoders and then feeds them into a Transformers-based
decoder to directly predict a sequence of plane instances by simultaneously
considering the context and global structure cues. Finally, pixel-wise
embeddings are computed to assign each pixel to the predicted plane instance
that is nearest to it in embedding space. Comprehensive experiments
demonstrate that PlaneTR achieves state-of-the-art performance on the ScanNet
and NYUv2 datasets.
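The final assignment step in the abstract (each pixel is assigned to the predicted plane instance whose embedding is nearest) can be sketched with NumPy. The tensor shapes and random embeddings below are illustrative placeholders, not the paper's actual dimensions or trained outputs:

```python
import numpy as np

# Hypothetical shapes: an H x W x D map of per-pixel embeddings (from the
# pixel-wise plane embedding decoder) and K x D instance embeddings (one per
# plane predicted by the structure-guided plane decoder).
H, W, D, K = 4, 5, 8, 3
rng = np.random.default_rng(0)
pixel_emb = rng.normal(size=(H, W, D))  # per-pixel embeddings
plane_emb = rng.normal(size=(K, D))     # per-plane instance embeddings

# Assign each pixel to the plane instance nearest in embedding space
# (Euclidean distance), producing an H x W plane-index segmentation map.
flat = pixel_emb.reshape(-1, D)                                 # (H*W, D)
dists = np.linalg.norm(flat[:, None, :] - plane_emb[None, :, :],
                       axis=-1)                                 # (H*W, K)
assignment = dists.argmin(axis=1).reshape(H, W)                 # plane per pixel
```

In practice a distance (or confidence) threshold would also be needed to leave non-planar pixels unassigned; the sketch above assigns every pixel to some plane.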
Related papers
- UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos [12.328095228008893]
We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos.
We build a Transformers-based deep neural network that jointly constructs a 3D feature volume for the environment.
Experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks.
arXiv Detail & Related papers (2024-07-04T03:02:27Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our method performs favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - PlanarRecon: Real-time 3D Plane Detection and Reconstruction from Posed Monocular Videos [32.286637700503995]
PlanarRecon is a framework for globally coherent detection and reconstruction of 3D planes from a posed monocular video.
A learning-based tracking and fusion module is designed to merge planes from previous fragments to form a coherent global plane reconstruction.
Experiments show that the proposed approach achieves state-of-the-art performances on the ScanNet dataset while being real-time.
arXiv Detail & Related papers (2022-06-15T17:59:16Z) - PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo [32.81496429134453]
We present a novel framework named PlaneMVS for 3D plane reconstruction from multiple input views with known camera poses.
We reconstruct 3D planes with a multi-view-stereo (MVS) pipeline that takes advantage of multi-view geometry.
Our method even outperforms a set of SOTA learning-based MVS methods thanks to the learned plane priors.
arXiv Detail & Related papers (2022-03-22T22:35:46Z) - Monocular Road Planar Parallax Estimation [25.36368935789501]
Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving.
We propose Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences.
RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a $\gamma$ map for 3D reconstruction.
arXiv Detail & Related papers (2021-11-22T10:03:41Z) - Multi-Plane Program Induction with 3D Box Priors [110.6726150681556]
We present Box Program Induction (BPI), which infers a program-like scene representation from a single image.
BPI simultaneously models repeated structure on multiple 2D planes, the 3D position and orientation of the planes, and camera parameters.
It uses neural networks to infer visual cues such as vanishing points and wireframe lines to guide a search-based algorithm to find the program that best explains the image.
arXiv Detail & Related papers (2020-11-19T18:07:46Z) - Dynamic Plane Convolutional Occupancy Networks [4.607145155913717]
We propose Dynamic Plane Convolutional Occupancy Networks to further improve the quality of 3D surface reconstruction.
A fully-connected network learns to predict plane parameters that best describe the shapes of objects or scenes.
Our method shows superior performance in surface reconstruction from unoriented point clouds in ShapeNet as well as an indoor scene dataset.
arXiv Detail & Related papers (2020-11-11T14:24:52Z) - KAPLAN: A 3D Point Descriptor for Shape Completion [80.15764700137383]
KAPLAN is a 3D point descriptor that aggregates local shape information via a series of 2D convolutions.
In each of those planes, point properties like normals or point-to-plane distances are aggregated into a 2D grid and abstracted into a feature representation with an efficient 2D convolutional encoder.
Experiments on public datasets show that KAPLAN achieves state-of-the-art performance for 3D shape completion.
arXiv Detail & Related papers (2020-07-31T21:56:08Z) - LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention [100.52873557168637]
3D object detectors usually focus on the single-frame detection, while ignoring the information in consecutive point cloud frames.
In this paper, we propose an end-to-end online 3D video object detector that operates on point sequences.
arXiv Detail & Related papers (2020-04-03T06:06:52Z) - From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds [59.98665358527686]
We propose a new method for segmentation-free joint estimation of orthogonal planes.
Such unified scene exploration allows for a multitude of applications, such as semantic plane detection or local and global scan alignment.
Our experiments demonstrate the validity of our approach in numerous scenarios from wall detection to 6D tracking.
arXiv Detail & Related papers (2020-01-21T06:51:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.