Monocular Spherical Depth Estimation with Explicitly Connected Weak Layout Cues
- URL: http://arxiv.org/abs/2206.11358v1
- Date: Wed, 22 Jun 2022 20:10:45 GMT
- Title: Monocular Spherical Depth Estimation with Explicitly Connected Weak Layout Cues
- Authors: Nikolaos Zioulis, Federico Alvarez, Dimitrios Zarpalas, Petros Daras
- Abstract summary: We generate a 360 geometric vision (360V) dataset that includes multiple modalities, multi-view stereo data, and automatically generated weak layout cues.
We rely on depth-based layout reconstruction and layout-based depth attention, demonstrating increased performance across both tasks.
By using a single 360 camera to scan rooms, fast and easy building-scale 3D scanning becomes possible.
- Score: 27.15511982413305
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Spherical cameras capture scenes in a holistic manner and have been used for
room layout estimation. Recently, with the availability of appropriate
datasets, there has also been progress in depth estimation from a single
omnidirectional image. While these two tasks are complementary, few works have
been able to explore them in parallel to advance indoor geometric perception,
and those that have done so either relied on synthetic data, or used small
scale datasets, as few options are available that include both layout
annotations and dense depth maps in real scenes. This is partly due to the
necessity of manual annotations for room layouts. In this work, we move beyond
this limitation and generate a 360 geometric vision (360V) dataset that
includes multiple modalities, multi-view stereo data and automatically
generated weak layout cues. We also explore an explicit coupling between the
two tasks to integrate them into a single-shot trained model. We rely on
depth-based layout reconstruction and layout-based depth attention,
demonstrating increased performance across both tasks. By using a single 360 camera to scan rooms, fast and easy building-scale 3D scanning becomes possible.
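To make the coupling concrete, layout-based depth attention can be pictured as gating the depth branch's features with an attention map derived from the layout cues. The module below is a hypothetical minimal sketch; its name, the gating design, and the tensor shapes are assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class LayoutDepthAttention(nn.Module):
    """Hypothetical sketch: modulate depth features with an attention
    map derived from weak layout cues (not the paper's exact module)."""

    def __init__(self, channels: int):
        super().__init__()
        # Lift the 1-channel layout cue (e.g. wall/floor boundary
        # probabilities) into a per-channel gate in [0, 1].
        self.gate = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, depth_feats: torch.Tensor, layout_cue: torch.Tensor) -> torch.Tensor:
        # depth_feats: (B, C, H, W) features from the depth branch
        # layout_cue:  (B, 1, H, W) weak layout probability map
        attn = self.gate(layout_cue)
        # Residual gating keeps the depth path intact where cues are weak.
        return depth_feats * (1.0 + attn)

# Usage on equirectangular feature maps (2:1 aspect ratio).
feats = torch.randn(2, 64, 128, 256)
cue = torch.rand(2, 1, 128, 256)
print(LayoutDepthAttention(64)(feats, cue).shape)  # torch.Size([2, 64, 128, 256])
```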
Related papers
- 360 in the Wild: Dataset for Depth Prediction and View Synthesis [66.58513725342125]
We introduce a large-scale 360° video dataset in the wild.
This dataset has been carefully scraped from the Internet and has been captured from various locations worldwide.
Each of the 25K images constituting our dataset is provided with its respective camera's pose and depth map.
arXiv Detail & Related papers (2024-06-27T05:26:38Z)
- DUSt3R: Geometric 3D Vision Made Easy [8.471330244002564]
We introduce DUSt3R, a novel paradigm for Dense and Unconstrained Stereo 3D Reconstruction of arbitrary image collections.
We show that this formulation smoothly unifies the monocular and binocular reconstruction cases.
Our formulation directly provides a 3D model of the scene as well as depth information; interestingly, we can also seamlessly recover pixel matches and relative and absolute camera poses from it.
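Per-pixel depth falls out of such a 3D model directly, as the z-coordinate of each point in its camera frame, and relative pose reduces to rigidly aligning pointmaps of the same scene. A minimal Kabsch-style sketch that assumes known per-pixel correspondences (this is standard geometry, not DUSt3R's actual recovery code):

```python
import numpy as np

def relative_pose_from_pointmaps(pts_a: np.ndarray, pts_b: np.ndarray):
    """Rigid transform (R, t) with pts_b ~= R @ pts_a + t, via the
    Kabsch algorithm. pts_a, pts_b: (N, 3) corresponding 3D points."""
    ca, cb = pts_a.mean(axis=0), pts_b.mean(axis=0)
    H = (pts_a - ca).T @ (pts_b - cb)          # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t

# Synthetic check: recover a known rotation and translation.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
t_true = np.array([0.5, -0.2, 1.0])
R, t = relative_pose_from_pointmaps(pts, pts @ R_true.T + t_true)
assert np.allclose(R, R_true) and np.allclose(t, t_true)
```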
arXiv Detail & Related papers (2023-12-21T18:52:14Z)
- Enhancing Egocentric 3D Pose Estimation with Third Person Views [37.9683439632693]
We propose a novel approach to enhance the 3D body pose estimation of a person computed from videos captured from a single wearable camera.
We introduce First2Third-Pose, a new paired synchronized dataset of nearly 2,000 videos depicting human activities captured from both first- and third-view perspectives.
Experimental results demonstrate that the joint multi-view embedding space learned with our dataset is useful for extracting discriminative features from arbitrary single-view egocentric videos.
arXiv Detail & Related papers (2022-01-06T11:42:01Z)
- 360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation [43.56963653723287]
We present 360-DFPE, a sequential floor plan estimation method that directly takes 360-images as input without relying on active sensors or 3D information.
Our results show that our monocular solution achieves favorable performance against the current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-12T08:36:41Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
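In spirit, a plane sweep over depth hypotheses replaces explicit correspondence search: each candidate depth back-projects a reference-view joint into 3D, and the reprojection is scored against the detection in another view. A simplified pinhole sketch, where the camera parameters and the nearest-detection scoring rule are assumptions:

```python
import numpy as np

def sweep_joint_depth(joint_ref, depths, K_ref, K_src, R, t, joint_src):
    """Score candidate depths for a 2D joint seen in a reference view by
    reprojecting into a source view and comparing with the joint detected
    there. R, t map reference-camera coordinates to source-camera ones."""
    # Back-project the reference pixel at each candidate depth.
    ray = np.linalg.inv(K_ref) @ np.array([joint_ref[0], joint_ref[1], 1.0])
    pts = depths[:, None] * ray[None, :]              # (D, 3) in ref frame
    pts_src = pts @ R.T + t                           # (D, 3) in src frame
    proj = pts_src @ K_src.T                          # (D, 3) homogeneous
    px = proj[:, :2] / proj[:, 2:3]                   # (D, 2) pixels
    scores = -np.linalg.norm(px - joint_src, axis=1)  # closer = better
    return depths[np.argmax(scores)]

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
R = np.eye(3)
t = np.array([0.2, 0.0, 0.0])                         # 20 cm baseline
# A joint 2.0 m deep at the reference principal point.
pt = np.array([0.0, 0.0, 2.0])
uv = K @ (R @ pt + t)
uv_src = uv[:2] / uv[2]
best = sweep_joint_depth((320, 240), np.linspace(0.5, 5, 200), K, K, R, t, uv_src)
print(best)  # ~2.0
```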
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
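The layout-to-depth conversion rests on simple panorama geometry: if a column's wall-floor boundary sits at latitude theta below the horizon and the camera height is h, the horizontal distance to the wall along that column is h / tan(theta). A small sketch of that conversion (the camera height and row-to-latitude convention are assumptions):

```python
import numpy as np

def horizon_depth_from_floor_boundary(boundary_rows, height_px, cam_height=1.6):
    """Convert a floor-boundary row per equirectangular column into
    depth on the horizon line. boundary_rows: (W,) pixel rows of the
    wall-floor boundary; height_px: panorama height in pixels."""
    # Row -> latitude: the panorama spans [-pi/2, pi/2] vertically and
    # the horizon sits at the middle row (positive = below horizon).
    lat = (boundary_rows / height_px - 0.5) * np.pi
    return cam_height / np.tan(lat)   # horizontal distance per column

rows = np.full(1024, 384.0)           # boundary 3/4 down a 512-px panorama
depth = horizon_depth_from_floor_boundary(rows, 512.0)
print(depth[0])                       # 1.6 / tan(pi/4) = 1.6 m
```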
arXiv Detail & Related papers (2021-04-01T15:48:41Z)
- SSLayout360: Semi-Supervised Indoor Layout Estimation from 360-Degree Panorama [0.0]
We propose the first approach to learn representations of room corners and boundaries by using a combination of labeled and unlabeled data.
Our approach can advance layout estimation of complex indoor scenes using as few as 20 labeled examples.
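A common way to learn from a mix of labeled and unlabeled panoramas is Mean-Teacher-style consistency training; the sketch below shows that general recipe with a stand-in regressor, not necessarily the authors' exact scheme:

```python
import copy
import torch
import torch.nn.functional as F

def ssl_step(student, teacher, labeled, targets, unlabeled, opt, w=1.0, ema=0.999):
    """One semi-supervised step: supervised loss + teacher consistency."""
    sup = F.mse_loss(student(labeled), targets)
    with torch.no_grad():
        pseudo = teacher(unlabeled)               # stable targets
    cons = F.mse_loss(student(unlabeled), pseudo)
    loss = sup + w * cons
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Teacher tracks an exponential moving average of the student.
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema).add_(sp, alpha=1.0 - ema)
    return loss.item()

# Toy usage with a stand-in corner/boundary regressor.
student = torch.nn.Linear(16, 4)                  # hypothetical predictor
teacher = copy.deepcopy(student)
opt = torch.optim.SGD(student.parameters(), lr=0.01)
loss = ssl_step(student, teacher,
                torch.randn(8, 16), torch.randn(8, 4),   # labeled batch
                torch.randn(32, 16), opt)                # unlabeled batch
print(loss)
```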
arXiv Detail & Related papers (2021-03-25T09:19:13Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
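The shape of that idea: per-view features are fused into a single view-independent latent code, and a decoder conditioned on each camera's projection operator maps the code back to view-specific 2D detections. The layer sizes, mean fusion, and flattened 3x4 projection conditioning below are all assumptions:

```python
import torch
import torch.nn as nn

class DisentangledPose(nn.Module):
    """Toy sketch: fuse views into a camera-agnostic latent, then decode
    per-view 2D joint coordinates conditioned on each projection matrix."""

    def __init__(self, feat_dim=128, latent_dim=64, n_joints=17):
        super().__init__()
        self.encode = nn.Linear(feat_dim, latent_dim)
        # Decoder sees the shared latent plus a flattened 3x4 projection.
        self.decode = nn.Linear(latent_dim + 12, n_joints * 2)

    def forward(self, view_feats, projections):
        # view_feats: (V, B, feat_dim), projections: (V, B, 3, 4)
        latent = self.encode(view_feats).mean(dim=0)   # camera-agnostic
        out = []
        for v in range(view_feats.shape[0]):
            P = projections[v].flatten(start_dim=1)    # (B, 12)
            out.append(self.decode(torch.cat([latent, P], dim=1)))
        return torch.stack(out)                        # (V, B, n_joints*2)

model = DisentangledPose()
preds = model(torch.randn(4, 2, 128), torch.randn(4, 2, 3, 4))
print(preds.shape)  # torch.Size([4, 2, 34])
```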
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- LayoutMP3D: Layout Annotation of Matterport3D [59.11106101006007]
We consider the Matterport3D dataset with their originally provided depth map ground truths and further release our annotations for layout ground truths from a subset of Matterport3D.
Our dataset provides both the layout and depth information, which enables the opportunity to explore the environment by integrating both cues.
arXiv Detail & Related papers (2020-03-30T14:40:56Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
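The depth-based alignment step amounts to back-projecting each depth map into a point cloud and mapping it through the relative pose between views. A compact pinhole-camera sketch, where the intrinsics and pose values are assumptions:

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to an (H*W, 3) point cloud via pinhole K."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T           # one ray per pixel
    return rays * depth.reshape(-1, 1)        # scale rays by depth

def align_to_view(points, R, t):
    """Map points from one camera frame to another: p' = R @ p + t."""
    return points @ R.T + t

K = np.array([[400.0, 0, 160], [0, 400.0, 120], [0, 0, 1]])
depth = np.full((240, 320), 2.0)              # flat wall 2 m away
pts = backproject(depth, K)
aligned = align_to_view(pts, np.eye(3), np.array([0.1, 0.0, 0.0]))
print(aligned.shape, aligned[0])              # (76800, 3) [-0.7 -0.6  2.]
```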
arXiv Detail & Related papers (2020-03-27T21:28:54Z)