PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View
- URL: http://arxiv.org/abs/2307.13756v2
- Date: Thu, 17 Aug 2023 14:56:24 GMT
- Title: PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View
- Authors: Jingjia Shi, Shuaifeng Zhi, Kai Xu
- Abstract summary: PlaneRecTR is a Transformer-based architecture that unifies all subtasks related to single-view plane recovery with a single compact model.
Our proposed unified learning achieves mutual benefits across subtasks, obtaining new state-of-the-art performance on the public ScanNet and NYUv2-Plane datasets.
- Score: 12.343189317320004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D plane recovery from a single image can usually be divided into several subtasks: plane detection, segmentation, parameter estimation and possibly depth estimation. Previous works tend to solve this task by extending either an RCNN-based segmentation network or a dense pixel embedding-based clustering framework. However, none of them tried to integrate these related subtasks into a unified framework; instead, they treat them separately and sequentially, which we suspect is potentially a main source of performance limitation for existing approaches. Motivated by this finding and by the success of query-based learning in enriching reasoning among semantic entities, in this paper we propose PlaneRecTR, a Transformer-based architecture which, for the first time, unifies all subtasks related to single-view plane recovery with a single compact model. Extensive quantitative and qualitative experiments demonstrate that our proposed unified learning achieves mutual benefits across subtasks, obtaining new state-of-the-art performance on the public ScanNet and NYUv2-Plane datasets. Code is available at https://github.com/SJingjia/PlaneRecTR.
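The abstract describes one query-based Transformer that handles plane detection, segmentation, plane parameters and depth together. As a minimal sketch of how such per-query outputs could be decoded, the NumPy code below assumes a Mask2Former-style head: each query carries a plane/non-plane logit, a mask embedding that is dotted with per-pixel features, and a plane parameter vector stored as n/d (unit normal divided by plane offset). The tensor layouts, the function names (`decode_plane_queries`, `pixel_rays`) and the thresholds are hypothetical illustrations, not the PlaneRecTR implementation. The depth step is plain geometry: a pixel p = (u, v, 1) back-projects to X = z K^{-1} p, and substituting into the plane equation n·X = d gives z = 1 / ((n/d)·K^{-1} p).

```python
import numpy as np


def _sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def pixel_rays(K: np.ndarray, height: int, width: int) -> np.ndarray:
    """Back-project every pixel (u, v) to the ray K^{-1} [u, v, 1]^T; returns shape (3, H*W)."""
    v, u = np.mgrid[0:height, 0:width]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(height * width)]).astype(float)
    return np.linalg.inv(K) @ pix


def decode_plane_queries(class_logits, mask_embed, pixel_feats, plane_params, K,
                         score_thresh=0.5, mask_thresh=0.5):
    """Hypothetical decoder for per-query plane predictions.

    class_logits: (Q,)      plane vs. non-plane logit per query (assumed head)
    mask_embed:   (Q, C)    mask embedding per query
    pixel_feats:  (C, H, W) per-pixel features
    plane_params: (Q, 3)    plane parameters n/d per query (assumed parameterization)
    K:            (3, 3)    camera intrinsics
    Returns query ids per pixel (-1 = non-planar) and depth (NaN = non-planar).
    """
    C, H, W = pixel_feats.shape
    plane_prob = _sigmoid(class_logits)                                # (Q,)
    mask_prob = _sigmoid(mask_embed @ pixel_feats.reshape(C, H * W))   # (Q, H*W)

    seg = np.full(H * W, -1, dtype=int)
    depth = np.full(H * W, np.nan)
    kept = np.flatnonzero(plane_prob > score_thresh)  # queries predicted as planes
    if kept.size == 0:
        return seg.reshape(H, W), depth.reshape(H, W)

    # Assign each pixel to the kept query with the highest class * mask score,
    # and keep the assignment only where that query's mask is confident.
    scores = plane_prob[kept, None] * mask_prob[kept]                  # (k, H*W)
    owner = kept[scores.argmax(axis=0)]                                # query id per pixel
    pix = np.arange(H * W)
    planar = mask_prob[owner, pix] > mask_thresh
    seg[planar] = owner[planar]

    # Plane-induced depth: z = 1 / ((n/d) . K^{-1} p) using the owning query's plane.
    inv_depth = plane_params @ pixel_rays(K, H, W)                     # (Q, H*W)
    depth[planar] = 1.0 / inv_depth[owner[planar], pix[planar]]
    return seg.reshape(H, W), depth.reshape(H, W)
```

In this sketch, depth is read off the plane parameters of whichever query owns a pixel, so errors in either the mask or the plane parameters surface directly in the depth map; that coupling is one way a single set of queries can tie the subtasks together.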
Related papers
- Towards In-the-wild 3D Plane Reconstruction from a Single Image [16.857296782216206]
3D plane reconstruction from a single image is a crucial yet challenging topic in 3D computer vision.
Previous state-of-the-art methods have focused on training their system on a single dataset from either the indoor or the outdoor domain.
We introduce a novel framework dubbed ZeroPlane, a Transformer-based model targeting zero-shot 3D plane detection and reconstruction from a single image.
(2025-06-03T06:14:05Z)
- Structure-Aware Correspondence Learning for Relative Pose Estimation [65.44234975976451]
Relative pose estimation provides a promising way for achieving object-agnostic pose estimation.
Existing 3D correspondence-based methods suffer from small overlaps in visible regions and unreliable feature estimation for invisible regions.
We propose a novel Structure-Aware Correspondence Learning method for Relative Pose Estimation, which consists of two key modules.
(2025-03-24T13:43:44Z)
- Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model [15.892685514932323]
We introduce Plane-DUSt3R, a novel method for multi-view room layout estimation.
Plane-DUSt3R incorporates the DUSt3R framework and fine-tunes on a room layout dataset (Structure3D) with a modified objective to estimate structural planes.
By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results.
(2025-02-24T02:14:19Z)
- FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [93.6881532277553]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.
Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
(2025-02-17T18:54:05Z)
- UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation.
Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation.
We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region.
(2024-11-25T05:36:00Z)
- MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction [37.481945507799594]
This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane.
We first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image.
These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance.
(2024-11-02T12:15:29Z)
- UniPlane: Unified Plane Detection and Reconstruction from Posed Monocular Videos [12.328095228008893]
We present UniPlane, a novel method that unifies plane detection and reconstruction from posed monocular videos.
We build a Transformers-based deep neural network that jointly constructs a 3D feature volume for the environment.
Experiments on real-world datasets demonstrate that UniPlane outperforms state-of-the-art methods in both plane detection and reconstruction tasks.
(2024-07-04T03:02:27Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
(2023-12-13T18:28:09Z)
- Unifying Correspondence, Pose and NeRF for Pose-Free Novel View Synthesis from Stereo Pairs [57.492124844326206]
This work delves into the task of pose-free novel view synthesis from stereo pairs, a challenging and pioneering task in 3D vision.
Our innovative framework, unlike any before, seamlessly integrates 2D correspondence matching, camera pose estimation, and NeRF rendering, fostering a synergistic enhancement of these tasks.
(2023-12-12T13:22:44Z)
- Multi-task Planar Reconstruction with Feature Warping Guidance [3.95944314850151]
Piece-wise planar 3D reconstruction simultaneously segments plane instances and recovers their 3D plane parameters from an image.
We introduce SOLOPlanes, a real-time planar reconstruction model based on a modified instance segmentation architecture.
Our model simultaneously predicts semantics using single images at inference time, while achieving real-time predictions at 43 FPS.
(2023-11-25T09:53:42Z)
- RelPose++: Recovering 6D Poses from Sparse-view Observations [66.6922660401558]
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images).
We build on the recent RelPose framework which learns a network that infers distributions over relative rotations over image pairs.
Our final system results in large improvements in 6D pose prediction over prior art on both seen and unseen object categories.
(2023-05-08T17:59:58Z)
- NOPE-SAC: Neural One-Plane RANSAC for Sparse-View Planar 3D Reconstruction [41.00845324937751]
This paper studies the challenging two-view 3D reconstruction in a rigorous sparse-view configuration.
We present a novel Neural One-PlanE RANSAC framework that exhibits a strong capability to learn one-plane pose hypotheses.
(2022-11-30T07:33:14Z)
- Occupancy Planes for Single-view RGB-D Human Reconstruction [120.5818162569105]
Single-view RGB-D human reconstruction with implicit functions is often formulated as per-point classification.
We propose the occupancy planes (OPlanes) representation, which makes it possible to formulate single-view RGB-D human reconstruction as occupancy prediction on planes that slice through the camera's view frustum.
(2022-08-04T17:59:56Z)
- PlaneRecNet: Multi-Task Learning with Cross-Task Consistency for Piece-Wise Plane Detection and Reconstruction from a Single RGB Image [11.215334675788952]
Piece-wise 3D planar reconstruction provides holistic scene understanding of man-made environments, especially for indoor scenarios.
Most recent approaches focused on improving the segmentation and reconstruction results by introducing advanced network architectures.
We start by enforcing cross-task consistency for our multi-task convolutional neural network, PlaneRecNet.
We introduce several novel loss functions (geometric constraints) that jointly improve the accuracy of piece-wise planar segmentation and depth estimation.
(2021-10-21T15:54:03Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
(2021-04-06T03:49:35Z)
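The MonoPlane entry in the list fits plane instances one at a time with RANSAC on geometry lifted from monocular depth and normals. The NumPy sketch below shows only plain sequential RANSAC on a 3D point cloud: find the plane with the most inliers, refine it by least squares, remove its inliers and repeat. It is not the proximity-guided variant that paper describes; the function names, distance threshold, minimum inlier count and iteration budget are arbitrary assumptions for illustration.

```python
import numpy as np


def fit_plane(pts: np.ndarray):
    """Least-squares plane through >= 3 points: unit normal n and offset d with n . x = d."""
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]  # direction of least variance is the plane normal
    return n, float(n @ centroid)


def sequential_ransac(points, dist_thresh=0.02, min_inliers=50, iters=200,
                      max_planes=10, seed=0):
    """Plain sequential RANSAC (assumed parameters); labels are -1 for unassigned points."""
    rng = np.random.default_rng(seed)
    labels = np.full(len(points), -1, dtype=int)
    remaining = np.arange(len(points))
    planes = []
    while len(planes) < max_planes and remaining.size >= min_inliers:
        pts = points[remaining]
        best_inliers = np.zeros(len(pts), dtype=bool)
        for _ in range(iters):
            sample = pts[rng.choice(len(pts), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(n)
            if norm < 1e-9:  # degenerate (collinear) sample, skip it
                continue
            n /= norm
            inliers = np.abs(pts @ n - n @ sample[0]) < dist_thresh
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        if best_inliers.sum() < min_inliers:
            break  # no remaining plane is large enough
        planes.append(fit_plane(pts[best_inliers]))  # refine on all inliers
        labels[remaining[best_inliers]] = len(planes) - 1
        remaining = remaining[~best_inliers]  # peel off this plane and continue
    return planes, labels
```

Removing each plane's inliers before the next search is what makes the procedure sequential: later planes are fitted only to points that no earlier plane explained.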