iBARLE: imBalance-Aware Room Layout Estimation
- URL: http://arxiv.org/abs/2308.15050v1
- Date: Tue, 29 Aug 2023 06:20:36 GMT
- Title: iBARLE: imBalance-Aware Room Layout Estimation
- Authors: Taotao Jing, Lichen Wang, Naji Khosravan, Zhiqiang Wan, Zachary Bessinger, Zhengming Ding, Sing Bing Kang
- Abstract summary: Room layout estimation predicts layouts from a single panorama.
Real-world datasets are significantly imbalanced in layout complexity, camera locations, and scene appearance variation.
We propose the imBalance-Aware Room Layout Estimation (iBARLE) framework to address these issues.
iBARLE consists of (1) an Appearance Variation Generation (AVG) module, (2) a Complex Structure Mix-up (CSMix) module, which enhances generalizability w.r.t. room structure, and (3) a gradient-based layout objective function.
- Score: 54.819085005591894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Room layout estimation predicts layouts from a single panorama. It requires
datasets with large-scale and diverse room shapes to train the models. However,
there are significant imbalances in real-world datasets including the
dimensions of layout complexity, camera locations, and variation in scene
appearance. These issues considerably influence the model training performance.
In this work, we propose the imBalance-Aware Room Layout Estimation (iBARLE)
framework to address these issues. iBARLE consists of (1) Appearance Variation
Generation (AVG) module, which promotes visual appearance domain
generalization, (2) Complex Structure Mix-up (CSMix) module, which enhances
generalizability w.r.t. room structure, and (3) a gradient-based layout
objective function, which allows more effective accounting for occlusions in
complex layouts. All modules are jointly trained and help each other to achieve
the best performance. Experiments and ablation studies based on
the ZInD dataset illustrate that iBARLE has state-of-the-art
performance compared with other layout estimation baselines.
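The abstract does not spell out the gradient-based layout objective, so the following is only a minimal sketch of the general idea of a gradient-style loss, not the paper's formulation. It assumes the layout is represented as a 1-D sequence of per-column boundary values over the panorama, and it penalizes differences between the column-to-column changes of the prediction and the ground truth. The function name `gradient_layout_loss` and the `circular` flag are hypothetical, introduced here for illustration.

```python
from typing import Sequence


def gradient_layout_loss(
    pred: Sequence[float],
    target: Sequence[float],
    circular: bool = True,
) -> float:
    """Mean absolute difference between the first differences of two sequences.

    Assumption: `pred` and `target` are per-column layout boundary values
    (for example, one depth value per panorama column). This is an
    illustrative loss, not the objective defined in the iBARLE paper.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    n = len(pred)
    if n < 2:
        raise ValueError("need at least two columns to form a difference")

    # A panorama wraps around horizontally, so in circular mode the last
    # column is treated as a neighbour of the first one.
    steps = n if circular else n - 1

    total = 0.0
    for i in range(steps):
        j = (i + 1) % n
        d_pred = pred[j] - pred[i]        # local slope of the prediction
        d_target = target[j] - target[i]  # local slope of the ground truth
        total += abs(d_pred - d_target)
    return total / steps
```

Because only differences between neighbouring columns enter the sum, a prediction that is offset from the ground truth by a constant yields zero loss here, so in practice a term like this would be combined with a direct per-column loss. Sharp jumps in the sequence, such as those where one wall occludes another, contribute large slope values, which is why slope-based terms are sensitive to them.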
Related papers
- DeBaRA: Denoising-Based 3D Room Arrangement Generation [22.96293773013579]
We introduce DeBaRA, a score-based model specifically tailored for precise, controllable and flexible arrangement generation in a bounded environment.
We demonstrate that by focusing on spatial attributes of objects, a single trained DeBaRA model can be leveraged at test time to perform several downstream applications such as scene synthesis, completion and re-arrangement.
arXiv Detail & Related papers (2024-09-26T23:18:25Z)
- GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z)
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
- Towards Robust and Expressive Whole-body Human Pose and Shape Estimation [51.457517178632756]
Whole-body pose and shape estimation aims to jointly predict different behaviors of the entire human body from a monocular image.
Existing methods often exhibit degraded performance under the complexity of in-the-wild scenarios.
We propose a novel framework to enhance the robustness of whole-body pose and shape estimation.
arXiv Detail & Related papers (2023-12-14T08:17:42Z)
- RoomDesigner: Encoding Anchor-latents for Style-consistent and Shape-compatible Indoor Scene Generation [26.906174238830474]
Indoor scene generation aims at creating shape-compatible, style-consistent furniture arrangements within a spatially reasonable layout.
We propose a two-stage model integrating shape priors into the indoor scene generation by encoding furniture as anchor latent representations.
arXiv Detail & Related papers (2023-10-16T03:05:19Z)
- Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation [30.79358827005448]
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images.
Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets.
We propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve the unseen triples and improve the generalisation capability of the SGG models.
arXiv Detail & Related papers (2023-06-23T10:17:56Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- End-to-end Generative Floor-plan and Layout with Attributes and Relation Graph [6.259404056725123]
We propose an end-to-end model for producing furniture layouts for interior scene synthesis from a random vector.
The proposed model combines a conditional floor-plan module of the room, a conditional graphical floor-plan module of the room and a conditional layout module.
We conduct our experiments on the proposed real-world interior layout dataset that contains 191208 designs from professional designers.
arXiv Detail & Related papers (2020-12-15T07:37:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.