PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View
- URL: http://arxiv.org/abs/2307.13756v2
- Date: Thu, 17 Aug 2023 14:56:24 GMT
- Title: PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View
- Authors: Jingjia Shi, Shuaifeng Zhi, Kai Xu
- Abstract summary: PlaneRecTR is a Transformer-based architecture that unifies all subtasks related to single-view plane recovery with a single compact model.
Our proposed unified learning achieves mutual benefits across subtasks, obtaining a new state-of-the-art performance on public ScanNet and NYUv2-Plane datasets.
- Score: 12.343189317320004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D plane recovery from a single image can usually be divided into several
subtasks of plane detection, segmentation, parameter estimation and possibly
depth estimation. Previous works tend to solve this task by either extending
the RCNN-based segmentation network or the dense pixel embedding-based
clustering framework. However, none of them integrates the above subtasks
into a unified framework; they instead treat them separately and sequentially,
which we suspect is a main source of performance limitation for
existing approaches. Motivated by this finding and the success of query-based
learning in enriching reasoning among semantic entities, in this paper, we
propose PlaneRecTR, a Transformer-based architecture, which for the first time
unifies all subtasks related to single-view plane recovery with a single
compact model. Extensive quantitative and qualitative experiments demonstrate
that our proposed unified learning achieves mutual benefits across subtasks,
obtaining a new state-of-the-art performance on public ScanNet and NYUv2-Plane
datasets. Codes are available at https://github.com/SJingjia/PlaneRecTR.
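The abstract lists plane parameter estimation and depth estimation as related subtasks. As a minimal sketch of why the two are linked, the snippet below derives per-pixel depth from a plane's parameters under a pinhole camera. It assumes a plane written as normal . X = offset in the camera frame and known intrinsics (fx, fy, cx, cy); the function names and this parameterization are illustrative assumptions, not taken from the PlaneRecTR code.

```python
def plane_depth(u, v, normal, offset, fx, fy, cx, cy):
    """Depth z at pixel (u, v) for the plane normal . X = offset (camera frame).

    Hypothetical helper: the parameterization is an assumption for illustration.
    """
    # Back-project the pixel to a ray with unit z-component: r = K^-1 [u, v, 1]^T.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # A point on the ray is X = z * r, so normal . (z * r) = offset.
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-9:
        return None  # ray is (nearly) parallel to the plane; depth is undefined
    return offset / denom


def depth_map_from_planes(masks, planes, fx, fy, cx, cy):
    """Fill a depth map from binary plane masks and their (normal, offset) pairs.

    Pixels not covered by any mask stay None.
    """
    height, width = len(masks[0]), len(masks[0][0])
    depth = [[None] * width for _ in range(height)]
    for mask, (normal, offset) in zip(masks, planes):
        for v in range(height):
            for u in range(width):
                if mask[v][u]:
                    depth[v][u] = plane_depth(u, v, normal, offset, fx, fy, cx, cy)
    return depth
```

For a fronto-parallel plane (normal along the optical axis), every masked pixel receives the plane offset as its depth, which is a quick sanity check for the formula.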
Related papers
- RoIPoly: Vectorized Building Outline Extraction Using Vertex and Logit Embeddings [5.093758132026397]
We propose a novel query-based approach for extracting building outlines from aerial or satellite imagery.
We formulate each polygon as a query and constrain the query attention on the most relevant regions of a potential building.
We evaluate our method on the vectorized building outline extraction dataset CrowdAI and the 2D floorplan reconstruction dataset Structured3D.
arXiv Detail & Related papers (2024-07-20T16:12:51Z)
- UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos [12.328095228008893]
We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos.
We build a Transformer-based deep neural network that jointly constructs a 3D feature volume for the environment.
Experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks.
arXiv Detail & Related papers (2024-07-04T03:02:27Z)
- AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings [26.845588648999417]
We tackle the problem of estimating the planar surfaces in a 3D scene from posed images.
We propose a method that predicts multi-view consistent plane embeddings that complement geometry when clustering points into planes.
We show through extensive evaluation on the ScanNetV2 dataset that our new method outperforms existing approaches.
arXiv Detail & Related papers (2024-06-13T09:49:31Z)
- Split-and-Fit: Learning B-Reps via Structure-Aware Voronoi Partitioning [50.684254969269546]
We introduce a novel method for acquiring boundary representations (B-Reps) of 3D CAD models.
We apply a spatial partitioning to derive a single primitive within each partition.
We show that our network, coined NVD-Net for neural Voronoi diagrams, can effectively learn Voronoi partitions for CAD models from training data.
arXiv Detail & Related papers (2024-06-07T21:07:49Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction [1.3812010983144802]
Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images.
This task requires significant data acquisition to predict both visible and occluded portions of the shape.
We propose a continual learning-based 3D reconstruction method where our goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes.
arXiv Detail & Related papers (2023-08-17T06:48:55Z)
- Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification.
We propose the occupancy planes (OPlanes) representation, which formulates single-view RGB-D human reconstruction as occupancy prediction on planes that slice through the camera's view frustum.
arXiv Detail & Related papers (2022-08-04T17:59:56Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction with a focus on model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method generalizes significantly better to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start from enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraints) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
arXiv Detail & Related papers (2021-10-21T15:54:03Z)
- Campus3D: A Photogrammetry Point Cloud Benchmark for Hierarchical Understanding of Outdoor Scene [76.4183572058063]
We present a richly-annotated 3D point cloud dataset for multiple outdoor scene understanding tasks.
The dataset has been annotated point-wise with both hierarchical and instance-based labels.
We formulate a hierarchical learning problem for 3D point cloud segmentation and propose a measurement evaluating consistency across various hierarchies.
arXiv Detail & Related papers (2020-08-11T19:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.