RoomStructNet: Learning to Rank Non-Cuboidal Room Layouts From Single
View
- URL: http://arxiv.org/abs/2110.00644v1
- Date: Fri, 1 Oct 2021 20:42:49 GMT
- Title: RoomStructNet: Learning to Rank Non-Cuboidal Room Layouts From Single
View
- Authors: Xi Zhang, Chun-Kai Wang, Kenan Deng, Tomas Yago-Vicente, Himanshu
Arora
- Abstract summary: We present a new approach to estimate the layout of a room from its single image.
Our approach learns an additional ranking function to estimate the final layout instead of using optimization.
Our approach shows state-of-the-art results on standard datasets with mostly cuboidal layouts and also performs well on a dataset containing rooms with non-cuboidal layouts.
- Score: 7.427006214471801
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we present a new approach to estimate the layout of a room
from its single image. While recent approaches for this task use robust
features learnt from data, they resort to optimization for detecting the final
layout. In addition to using learnt robust features, our approach learns an
additional ranking function to estimate the final layout instead of using
optimization. To learn this ranking function, we propose a framework to train a
CNN using max-margin structure cost. Also, while most approaches aim at
detecting cuboidal layouts, our approach detects non-cuboidal layouts for which
we explicitly estimate layout complexity parameters. We use these parameters
to propose layout candidates in a novel way. Our approach shows
state-of-the-art results on standard datasets with mostly cuboidal layouts and
also performs well on a dataset containing rooms with non-cuboidal layouts.
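The abstract does not spell out the max-margin objective, so the following is only a minimal sketch of a generic structured hinge loss over a set of layout candidates, not the authors' implementation. The function names, the per-candidate `task_losses` (for example, 1 minus IoU against the ground-truth layout), and the toy numbers are all assumptions made for illustration.

```python
from typing import List, Sequence


def structured_hinge_loss(
    scores: Sequence[float],
    task_losses: Sequence[float],
    gt_index: int,
) -> float:
    """Generic max-margin structured loss over layout candidates.

    scores      -- hypothetical ranking scores, one per candidate layout
    task_losses -- assumed dissimilarity of each candidate to the ground
                   truth (0.0 for the ground-truth candidate itself)
    gt_index    -- position of the ground-truth candidate
    """
    if len(scores) != len(task_losses):
        raise ValueError("scores and task_losses must have the same length")
    # Loss-augmented score: worse candidates must be beaten by a larger margin.
    augmented = [s + d for s, d in zip(scores, task_losses)]
    # Hinge on the most violating candidate; zero once all margins are met.
    return max(0.0, max(augmented) - scores[gt_index])


def rank_candidates(scores: Sequence[float]) -> List[int]:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy usage with made-up numbers: candidate 0 is the ground truth.
loss = structured_hinge_loss([1.0, 1.5, 0.2], [0.0, 0.4, 0.9], gt_index=0)
best = rank_candidates([1.0, 1.5, 0.2])[0]
```

In this sketch the loss is zero only when the ground-truth candidate outscores every other candidate by at least that candidate's task loss. At test time the learned scores alone would be used to pick the top-ranked layout, which is the sense in which ranking replaces a separate optimization step.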
Related papers
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
(2023-12-26T12:16:03Z)
- CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
We introduce a novel method, CPPF++, designed for sim-to-real pose estimation.
To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty.
We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
(2022-11-24T03:27:00Z)
- ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents.
Experimental results show that ERNIE-Layout achieves superior performance on various downstream tasks, setting a new state of the art on key information extraction and document question answering.
(2022-10-12T12:59:24Z)
- MCTS with Refinement for Proposals Selection Games in Scene Understanding [32.92475660892122]
We propose a novel method applicable in many scene understanding problems that adapts the Monte Carlo Tree Search (MCTS) algorithm.
From a generated pool of proposals, our method jointly selects and optimizes proposals that maximize the objective term.
Our method shows high performance on the Matterport3D dataset without introducing hard constraints on room layout configurations.
(2022-07-07T10:15:54Z)
- Learning Canonical Embedding for Non-rigid Shape Matching [36.85782408336389]
This paper provides a novel framework that learns canonical embeddings for non-rigid shape matching.
Our framework is trained end-to-end and thus avoids instabilities and constraints associated with the commonly-used Laplace-Beltrami basis.
(2021-10-06T18:09:13Z)
- Graph-Embedded Subspace Support Vector Data Description [98.78559179013295]
We propose a novel subspace learning framework for one-class classification.
The proposed framework presents the problem in the form of graph embedding.
We demonstrate improved performance against the baselines and the recently proposed subspace learning methods for one-class classification.
(2021-04-29T14:30:48Z)
- RackLay: Multi-Layer Layout Estimation for Warehouse Racks [17.937062635570268]
We present RackLay, a deep neural network for real-time shelf layout estimation from a single image.
RackLay estimates the top-view and front-view layout for each shelf in the considered rack populated with objects.
We also show that fusing the top-view and front-view enables 3D reasoning applications such as metric free space estimation for the considered rack.
(2021-03-16T16:22:31Z)
- Scene Graph to Image Generation with Contextualized Object Layout Refinement [92.85331019618332]
We propose a novel method to generate images from scene graphs.
Our approach improves the layout coverage by almost 20 points and drops object overlap to negligible amounts.
(2020-09-23T06:27:54Z)
- DrNAS: Dirichlet Neural Architecture Search [88.56953713817545]
We treat the continuously relaxed architecture mixing weights as random variables, modeled by a Dirichlet distribution.
With recently developed pathwise derivatives, the Dirichlet parameters can be easily optimized with a gradient-based optimizer.
To alleviate the large memory consumption of differentiable NAS, we propose a simple yet effective progressive learning scheme.
(2020-06-18T08:23:02Z)
- Scan2Plan: Efficient Floorplan Generation from 3D Scans of Indoor Scenes [9.71137838903781]
Scan2Plan is a novel approach for accurate estimation of a floorplan from a 3D scan of the structural elements of indoor environments.
The proposed method incorporates a two-stage approach where the initial stage clusters an unordered point cloud representation of the scene.
The subsequent stage estimates a closed perimeter, parameterized by a simple polygon, for each individual room.
The final floorplan is simply an assembly of all such room perimeters in the global co-ordinate system.
(2020-03-16T17:59:41Z)
- General 3D Room Layout from a Single View by Render-and-Compare [36.94817376590415]
We present a novel method to reconstruct the 3D layout of a room from a single perspective view.
Our dataset consists of 293 images from ScanNet, which we annotated with precise 3D layouts.
(2020-01-07T16:14:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.