Self-supervised 360$^{\circ}$ Room Layout Estimation
- URL: http://arxiv.org/abs/2203.16057v1
- Date: Wed, 30 Mar 2022 04:58:07 GMT
- Title: Self-supervised 360$^{\circ}$ Room Layout Estimation
- Authors: Hao-Wen Ting, Cheng Sun, Hwann-Tzong Chen
- Abstract summary: We present the first self-supervised method to train panoramic room layout estimation models without any labeled data.
Our approach also shows promise in data-scarce scenarios and active learning, which would have immediate value in real estate virtual tour software.
- Score: 20.062713286961326
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present the first self-supervised method to train panoramic room layout
estimation models without any labeled data. Unlike per-pixel dense depth that
provides abundant correspondence constraints, layout representation is sparse
and topological, hindering the use of self-supervised reprojection consistency
on images. To address this issue, we propose Differentiable Layout View
Rendering, which can warp a source image to the target camera pose given the
estimated layout from the target image. As each rendered pixel is
differentiable with respect to the estimated layout, we can now train the
layout estimation model by minimizing reprojection loss. Besides, we introduce
regularization losses to encourage Manhattan alignment, ceiling-floor
alignment, cycle consistency, and layout stretch consistency, which further
improve our predictions. Finally, we present the first self-supervised results
on the ZillowIndoor and MatterportLayout datasets. Our approach also shows
promise in data-scarce scenarios and active learning, which would have
immediate value in real estate virtual tour software. Code is available at
https://github.com/joshua049/Stereo-360-Layout.
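The reprojection objective described above can be sketched in a toy 1D setting: warp a source scanline toward the target view using per-column layout depth via differentiable bilinear sampling, then penalize the photometric difference. This is an illustrative stand-in, not the authors' released code; the disparity-style shift and the `baseline` parameter are simplifying assumptions.

```python
import numpy as np

def bilinear_sample_1d(img_row, x):
    """Differentiable 1D bilinear sampling at fractional indices x."""
    x0 = np.clip(np.floor(x).astype(int), 0, len(img_row) - 2)
    w = x - x0
    return (1 - w) * img_row[x0] + w * img_row[x0 + 1]

def reprojection_loss(depth, src_row, tgt_row, baseline=1.0):
    """Warp src_row toward the target view using per-column layout depth,
    then compare photometrically (L1). A toy 1D stand-in for the paper's
    Differentiable Layout View Rendering; the disparity-style warp
    shift = baseline / depth is our simplifying assumption."""
    cols = np.arange(len(tgt_row), dtype=float)
    shift = baseline / depth  # nearer geometry (small depth) moves more
    x = np.clip(cols + shift, 0.0, len(src_row) - 1.001)
    warped = bilinear_sample_1d(src_row, x)
    return np.abs(warped - tgt_row).mean()

# Toy check: the source is the target shifted by 0.5 px; depth = 2 with
# baseline = 1 produces exactly that shift, so its loss should be lowest.
f = lambda x: np.sin(0.2 * x)
cols = np.arange(32, dtype=float)
tgt, src = f(cols), f(cols - 0.5)
good = reprojection_loss(np.full(32, 2.0), src, tgt)
bad = reprojection_loss(np.full(32, 0.5), src, tgt)
```

Because every rendered sample is a smooth function of the depth (and hence of the layout parameters), gradients of this loss can flow back into the layout estimator, which is the property the paper exploits.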
Related papers
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation [147.81509219686419]
We propose a diagnostic benchmark for layout-guided image generation that examines four categories of spatial control skills: number, position, size, and shape.
Next, we propose IterInpaint, a new baseline that generates foreground and background regions step-by-step via inpainting.
We show comprehensive ablation studies on IterInpaint, including training task ratio, crop&paste vs. repaint, and generation order.
arXiv Detail & Related papers (2023-04-13T16:58:33Z)
- Generalizable Person Re-Identification via Viewpoint Alignment and Fusion [74.30861504619851]
This work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images.
Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images.
We show that our method can lead to superior performance over the existing approaches in various evaluation settings.
arXiv Detail & Related papers (2022-12-05T16:24:09Z)
- 360-MLC: Multi-view Layout Consistency for Self-training and Hyper-parameter Tuning [40.93848397359068]
We present 360-MLC, a self-training method based on multi-view layout consistency for fine-tuning monocular room layout models.
We leverage the entropy information in multiple layout estimations as a quantitative metric to measure the geometry consistency of the scene.
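The entropy-based consistency measure can be sketched as follows: reproject boundary estimates from several views into a common frame (the reprojection step is omitted here) and compute the per-column Shannon entropy of the resulting distribution; low entropy means the views agree. This is one assumed reading of the metric, not the 360-MLC implementation.

```python
import numpy as np

def boundary_entropy(boundaries, n_bins=16):
    """boundaries: (n_views, n_cols) normalized boundary positions already
    reprojected to a common frame. Returns per-column Shannon entropy of
    the multi-view estimates; low entropy = the views are geometrically
    consistent. Toy sketch, not the authors' code."""
    n_views, n_cols = boundaries.shape
    ent = np.empty(n_cols)
    for c in range(n_cols):
        hist, _ = np.histogram(boundaries[:, c], bins=n_bins, range=(0.0, 1.0))
        p = hist / n_views          # empirical distribution over bins
        p = p[p > 0]                # drop empty bins (0 * log 0 := 0)
        ent[c] = -(p * np.log(p)).sum()
    return ent
```

Columns where all views place the boundary in the same bin get zero entropy, while scattered estimates approach `log(n_views)`, so the score can rank pseudo-labels by reliability.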
arXiv Detail & Related papers (2022-10-24T03:31:48Z)
- Transferable End-to-end Room Layout Estimation via Implicit Encoding [34.99591465853653]
We study the problem of estimating room layouts from a single panorama image.
We propose an end-to-end method that directly predicts parametric layouts from an input panorama image.
arXiv Detail & Related papers (2021-12-21T16:33:14Z)
- 360-DFPE: Leveraging Monocular 360-Layouts for Direct Floor Plan Estimation [43.56963653723287]
We present 360-DFPE, a sequential floor plan estimation method that directly takes 360-images as input without relying on active sensors or 3D information.
Our results show that our monocular solution achieves favorable performance against the current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-12T08:36:41Z)
- SSH: A Self-Supervised Framework for Image Harmonization [97.16345684998788]
We propose a novel Self-Supervised Harmonization framework (SSH) that can be trained using just "free" natural images without being edited.
Our results show that the proposed SSH outperforms previous state-of-the-art methods in terms of reference metrics, visual quality, and a user study.
arXiv Detail & Related papers (2021-08-15T19:51:33Z)
- LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering [59.63979143021241]
We formulate the task of 360 layout estimation as a problem of predicting depth on the horizon line of a panorama.
We propose the Differentiable Depth Rendering procedure to make the conversion from layout to depth prediction differentiable.
Our method achieves state-of-the-art performance on numerous 360 layout benchmark datasets.
arXiv Detail & Related papers (2021-04-01T15:48:41Z)
- Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D [100.93808824091258]
We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras.
Our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a bird's-eye-view grid.
We show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network.
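The "splat" step can be sketched as a sum-pool of lifted frustum features into a bird's-eye-view grid; the "lift" step, which predicts per-pixel depth distributions, is assumed to have already produced ground-plane points. A toy NumPy version, not the authors' pooling implementation:

```python
import numpy as np

def splat_to_bev(points_xy, feats, grid_size=8, cell=1.0):
    """'Splat': sum-pool frustum point features into a BEV grid.
    points_xy: (N, 2) ground-plane coordinates from the 'lift' step
    (assumed computed upstream); feats: (N, C) per-point features.
    Toy sketch of the Lift-Splat-Shoot pooling idea."""
    C = feats.shape[1]
    bev = np.zeros((grid_size, grid_size, C))
    ij = np.floor(points_xy / cell).astype(int)   # cell index per point
    valid = (ij >= 0).all(axis=1) & (ij < grid_size).all(axis=1)
    for (i, j), f in zip(ij[valid], feats[valid]):
        bev[i, j] += f   # all frustum features landing in a cell are summed
    return bev
```

Sum-pooling makes the BEV representation independent of the number of cameras and of how many frustum points land in each cell, which is what lets the architecture accept arbitrary rigs.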
arXiv Detail & Related papers (2020-08-13T06:29:01Z)
- MonoLayout: Amodal scene layout from a single image [12.466845447851377]
Given a single color image captured from a driving platform, we aim to predict the bird's-eye view layout of the road.
We dub this problem amodal scene layout estimation, which involves "hallucinating" the scene layout for parts of the scene that are occluded in the image.
To this end, we present MonoLayout, a deep neural network for real-time amodal scene layout estimation.
arXiv Detail & Related papers (2020-02-19T19:16:34Z)
- General 3D Room Layout from a Single View by Render-and-Compare [36.94817376590415]
We present a novel method to reconstruct the 3D layout of a room from a single perspective view.
Our dataset consists of 293 images from ScanNet, which we annotated with precise 3D layouts.
arXiv Detail & Related papers (2020-01-07T16:14:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.