uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images
- URL: http://arxiv.org/abs/2503.21562v1
- Date: Thu, 27 Mar 2025 14:47:05 GMT
- Title: uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images
- Authors: Jonathan Lee, Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Fu-En Wang, Yi-Hsuan Tsai, Min Sun,
- Abstract summary: We present uLayout, a unified model for estimating room layout from both perspective and panoramic images. The key idea of our solution is to unify both domains into the equirectangular projection. We show for the first time a single end-to-end model for both domains.
- Score: 29.336666024601545
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present uLayout, a unified model for estimating room layout geometries from both perspective and panoramic images, whereas traditional solutions require different model designs for each image type. The key idea of our solution is to unify both domains into the equirectangular projection, in particular by allocating perspective images to the most suitable latitude coordinates so that both domains can be exploited seamlessly. To address the Field-of-View (FoV) difference between the input domains, we design uLayout with a shared feature extractor plus an extra 1D-convolution layer that conditions each domain's input differently. This conditioning allows us to efficiently formulate a column-wise feature regression problem regardless of the input FoV. This simple yet effective approach achieves competitive performance with current state-of-the-art solutions and shows for the first time a single end-to-end model for both domains. Extensive experiments on the real-world datasets LSUN, Matterport3D, PanoContext, and Stanford 2D-3D demonstrate the contribution of our approach. Code is available at https://github.com/JonathanLee112/uLayout.
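To make the column-wise formulation concrete, here is a minimal PyTorch sketch of how a shared feature extractor with a per-domain 1D-convolution conditioning layer could feed a column-wise regression head. The backbone, channel sizes, and two-boundary output are illustrative assumptions, not the authors' exact architecture; see the linked repository for the real implementation.

```python
# Hypothetical sketch of uLayout's column-wise regression idea (PyTorch).
import torch
import torch.nn as nn

class ColumnWiseLayoutHead(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # Shared 2D feature extractor (stand-in for the paper's backbone).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
        )
        # Domain-specific 1D convolutions condition the column features
        # differently for panoramic vs. perspective inputs.
        self.cond = nn.ModuleDict({
            "pano":  nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            "persp": nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        })
        # Per-column regression of ceiling/floor boundary coordinates.
        self.regress = nn.Conv1d(channels, 2, kernel_size=1)

    def forward(self, x, domain):
        feat = self.backbone(x)         # (B, C, H, W)
        cols = feat.mean(dim=2)         # collapse height -> (B, C, W)
        cols = self.cond[domain](cols)  # domain-dependent conditioning
        return self.regress(cols)       # (B, 2, W): one pair per image column

head = ColumnWiseLayoutHead()
pano = torch.randn(1, 3, 512, 1024)    # equirectangular panorama
print(head(pano, "pano").shape)        # torch.Size([1, 2, 256])
```

Because both domains end up on the same equirectangular grid, a perspective crop simply occupies a band of rows at its assigned latitude, and the same head regresses boundaries per column regardless of FoV.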
Related papers
- CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation [78.21134311493303]
Diffusion models have been recognized for their ability to generate images that are not only visually appealing but also of high artistic quality.
Previous methods primarily focus on UNet-based models (e.g., SD1.5 and SDXL), and limited effort has explored Multimodal Diffusion Transformers (MM-DiTs).
Inheriting the advantages of MM-DiT, we use a separate set of network weights to process the image and text modalities.
We contribute a large-scale layout dataset, named LayoutSAM, which includes 2.7 million image-text pairs and 10.7 million entities.
arXiv Detail & Related papers (2024-12-05T04:09:47Z) - WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization [63.98650220772378]
We present WIDIn, Wording Images for Domain-Invariant representation, to disentangle the discriminative visual representation.
We first estimate the language embedding with fine-grained alignment, which can be used to adaptively identify and then remove the domain-specific counterpart.
We show that WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT.
arXiv Detail & Related papers (2024-05-28T17:46:27Z) - ZoDi: Zero-Shot Domain Adaptation with Diffusion-Based Image Transfer [13.956618446530559]
This paper proposes a zero-shot domain adaptation method based on diffusion models, called ZoDi.
First, we utilize an off-the-shelf diffusion model to synthesize target-like images by transferring the domain of source images to the target domain.
Second, we train the model using both the source images and the synthesized images with the original representations to learn domain-robust representations.
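Read as a recipe, the two stages described above could look roughly like the training step below; `diffusion_transfer`, the model, and the loss are placeholders, not ZoDi's actual code.

```python
# Schematic ZoDi-style training step: synthesize target-like images with an
# off-the-shelf diffusion model, then train on both domains jointly.
import torch

def zodi_step(model, diffusion_transfer, src_images, src_labels,
              optimizer, criterion):
    # Stage 1: transfer source images to the target domain (no gradients).
    with torch.no_grad():
        target_like = diffusion_transfer(src_images)
    # Stage 2: supervise the model on both versions so the learned
    # representations are robust to the domain shift.
    optimizer.zero_grad()
    loss = criterion(model(src_images), src_labels) \
         + criterion(model(target_like), src_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```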
arXiv Detail & Related papers (2024-03-20T14:58:09Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout [62.12447593298437]
Content-aware visual-textual presentation layout aims at arranging pre-defined elements in the spatial space of a given canvas.
We propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers.
A novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts.
arXiv Detail & Related papers (2023-03-28T12:48:36Z) - GPR-Net: Multi-view Layout Estimation via a Geometry-aware Panorama Registration Network [44.06968418800436]
We present a complete panoramic layout estimation framework that jointly learns panorama registration and layout estimation given a pair of panoramas.
The major improvement over PSMNet comes from a novel Geometry-aware Panorama Registration Network or GPR-Net.
Experimental results indicate that our method achieves state-of-the-art performance in both panorama registration and layout estimation on a large-scale indoor panorama dataset ZInD.
arXiv Detail & Related papers (2022-10-20T17:10:41Z) - Geometry Aligned Variational Transformer for Image-conditioned Layout Generation [38.747175229902396]
We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image.
First, a self-attention mechanism is adopted to model the contextual relationships within layout elements, while a cross-attention mechanism is used to fuse the visual information of the conditional images.
We construct a large-scale advertisement poster layout design dataset with detailed layout and saliency map annotations.
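The self-/cross-attention split described above follows a standard transformer pattern; an illustrative PyTorch block (dimensions and layout are assumptions, not the paper's code) might look like this:

```python
# Illustrative ICVT-style block: self-attention over layout element tokens,
# cross-attention into visual tokens from the conditional image.
import torch
import torch.nn as nn

class ICVTBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, layout_tokens, image_tokens):
        x = layout_tokens
        # Contextual relationships among layout elements.
        x = x + self.self_attn(x, x, x)[0]
        # Fuse visual information from the conditional image.
        x = x + self.cross_attn(x, image_tokens, image_tokens)[0]
        return x + self.ff(x)
```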
arXiv Detail & Related papers (2022-09-02T07:19:12Z) - Self-supervised 360$^{\circ}$ Room Layout Estimation [20.062713286961326]
We present the first self-supervised method to train panoramic room layout estimation models without any labeled data.
Our approach also shows promising results in data-scarce scenarios and active learning, which would have immediate value in real-estate virtual tour software.
arXiv Detail & Related papers (2022-03-30T04:58:07Z) - PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation [53.428312630479816]
We observe that the Field of View (FoV) gap induces noticeable instance appearance differences between the source and target domains.
Motivated by these observations, we propose the Position-Invariant Transform (PIT) to better align images in different domains.
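The core of a position-invariant transform is to replace linear image-plane offsets with viewing angles, so that an object's apparent shape no longer depends on where it sits in the frame or on the camera FoV. Below is a small NumPy sketch of such an arctan-based remapping grid; the exact formulation in the paper may differ.

```python
# Hedged sketch of a PIT-style coordinate remapping: image-plane offsets are
# converted to viewing angles via arctan, given the camera's horizontal FoV.
import numpy as np

def pit_grid(width, height, fov_deg):
    # Focal length in pixels, derived from the horizontal FoV.
    f = (width / 2) / np.tan(np.radians(fov_deg) / 2)
    u = np.arange(width) - width / 2    # pixel offsets from the center
    v = np.arange(height) - height / 2
    theta = np.arctan(u / f)            # horizontal viewing angle per column
    phi = np.arctan(v / f)              # vertical viewing angle per row
    return np.meshgrid(theta, phi)      # angular sampling grid
```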
arXiv Detail & Related papers (2021-08-16T15:16:47Z) - UniFuse: Unidirectional Fusion for 360$^{\circ}$ Panorama Depth Estimation [11.680475784102308]
This paper introduces a new framework to fuse features from the two projections, unidirectionally feeding the cubemap features to the equirectangular features only at the decoding stage.
Experiments verify the effectiveness of our proposed fusion strategy and module, and our model achieves state-of-the-art performance on four popular datasets.
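A minimal sketch of the unidirectional fusion idea: cubemap features, once resampled onto the equirectangular grid, are injected into the equirectangular decoder, and nothing flows the other way. The resampling step and module shape are assumptions, not the paper's implementation.

```python
# UniFuse-style unidirectional fusion module (illustrative).
import torch
import torch.nn as nn

class UnidirectionalFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Merge reprojected cubemap features into the equirectangular branch;
        # the equirectangular features never feed back into the cubemap branch.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, equi_feat, cube_feat_on_equi_grid):
        # Both inputs: (B, C, H, W) on the equirectangular grid.
        return self.fuse(torch.cat([equi_feat, cube_feat_on_equi_grid], dim=1))
```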
arXiv Detail & Related papers (2021-02-06T10:01:09Z) - LayoutTransformer: Layout Generation and Completion with Self-attention [105.21138914859804]
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects.
We propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements.
Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary number of primitives per layout.
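In spirit, such a framework is a decoder-only transformer over a discrete layout vocabulary; the toy sketch below (tokenization and sizes are assumptions) shows the causal self-attention that lets generation start from an empty set or a seed set of primitives.

```python
# Toy autoregressive layout model: causal self-attention over layout tokens.
import torch
import torch.nn as nn

class TinyLayoutTransformer(nn.Module):
    def __init__(self, vocab=256, dim=128, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        block = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)
        self.out = nn.Linear(dim, vocab)

    def forward(self, tokens):
        n = tokens.size(1)
        # Causal mask: each layout token attends only to earlier tokens.
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.out(h)  # next-token logits over the layout vocabulary
```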
arXiv Detail & Related papers (2020-06-25T17:56:34Z)