PanoMixSwap: Panorama Mixing via Structural Swapping for Indoor Scene Understanding
- URL: http://arxiv.org/abs/2309.09514v2
- Date: Wed, 27 Sep 2023 04:32:41 GMT
- Title: PanoMixSwap: Panorama Mixing via Structural Swapping for Indoor Scene Understanding
- Authors: Yu-Cheng Hsieh, Cheng Sun, Suraj Dengale, Min Sun
- Abstract summary: PanoMixSwap is a novel data augmentation technique specifically designed for indoor panoramic images.
We decompose each panoramic image into its constituent parts: background style, foreground furniture, and room layout.
We generate an augmented image by mixing these three parts from three different images, such as the foreground furniture from one image, the background style from another image, and the room structure from the third image.
- Score: 14.489840196199882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The volume and diversity of training data are critical for modern deep
learning-based methods. Compared to the massive amount of labeled perspective
images, 360° panoramic images fall short in both volume and diversity. In this
paper, we propose PanoMixSwap, a novel data augmentation technique specifically
designed for indoor panoramic images. PanoMixSwap explicitly mixes various
background styles, foreground furniture, and room layouts from the existing
indoor panorama datasets and generates a diverse set of new panoramic images to
enrich the datasets. We first decompose each panoramic image into its
constituent parts: background style, foreground furniture, and room layout.
Then, we generate an augmented image by mixing these three parts from three
different images, such as the foreground furniture from one image, the
background style from another image, and the room structure from the third
image. Our method yields high diversity since there is a cubical increase in
image combinations. We also evaluate the effectiveness of PanoMixSwap on two
indoor scene understanding tasks: semantic segmentation and layout estimation.
Our experiments demonstrate that state-of-the-art methods trained with
PanoMixSwap outperform their original setting on both tasks consistently.
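The cubic growth in combinations described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it assumes each panorama has already been decomposed into the three parts (background style, foreground furniture, room layout), represented here as hypothetical dictionary keys, and simply enumerates every way of drawing one part from each of three source images.

```python
import itertools

def mix_parts(images):
    """Enumerate PanoMixSwap-style part combinations.

    `images` is a list of pre-decomposed panoramas, each a dict with
    hypothetical keys 'style', 'furniture', and 'layout' (the three
    parts the paper swaps). Drawing one part from each of three
    images yields N**3 combinations for N source images, which is
    the cubic increase the abstract refers to.
    """
    augmented = []
    for a, b, c in itertools.product(images, repeat=3):
        augmented.append({
            "style": a["style"],          # background style from image a
            "furniture": b["furniture"],  # foreground furniture from image b
            "layout": c["layout"],        # room layout from image c
        })
    return augmented

# Toy example: 3 source images yield 3**3 = 27 augmented combinations.
images = [{"style": f"s{i}", "furniture": f"f{i}", "layout": f"l{i}"}
          for i in range(3)]
print(len(mix_parts(images)))  # 27
```

In the actual method the "mixing" step is an image-synthesis problem rather than dictionary assembly, but the combinatorics driving the dataset enrichment are exactly this product.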
Related papers
- DiffPano: Scalable and Consistent Text to Panorama Generation with Spherical Epipolar-Aware Diffusion [60.45000652592418]
We propose a novel text-driven panoramic generation framework, DiffPano, to achieve scalable, consistent, and diverse panoramic scene generation.
We show that DiffPano can generate consistent, diverse panoramic images with given unseen text descriptions and camera poses.
arXiv Detail & Related papers (2024-10-31T17:57:02Z)
- Taming Stable Diffusion for Text to 360° Panorama Image Generation [74.69314801406763]
We introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt.
We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process.
arXiv Detail & Related papers (2024-04-11T17:46:14Z)
- PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation [96.8435716885159]
Vision-and-Language Navigation (VLN) requires the agent to follow language instructions to navigate through 3D environments.
One main challenge in VLN is the limited availability of training environments, which makes it hard to generalize to new and unseen environments.
We propose PanoGen, a generation method that can potentially create an infinite number of diverse panoramic environments conditioned on text.
arXiv Detail & Related papers (2023-05-30T16:39:54Z)
- PanoContext-Former: Panoramic Total Scene Understanding with a Transformer [37.51637352106841]
Panoramic images enable deeper understanding and a more holistic perception of the $360^\circ$ surrounding environment.
In this paper, we propose a novel method using depth prior for holistic indoor scene understanding.
In addition, we introduce a real-world dataset for scene understanding, including photo-realistic panoramas, high-fidelity depth images, accurately annotated room layouts, and oriented object bounding boxes and shapes.
arXiv Detail & Related papers (2023-05-21T16:20:57Z)
- PanoViT: Vision Transformer for Room Layout Estimation from a Single Panoramic Image [11.053777620735175]
PanoViT is a panorama vision transformer to estimate the room layout from a single panoramic image.
Compared to CNN models, our PanoViT is more proficient in learning global information from the panoramic image.
Our method outperforms state-of-the-art solutions in room layout prediction accuracy.
arXiv Detail & Related papers (2022-12-23T05:37:11Z)
- DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization [66.25948693095604]
We propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image.
Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement.
arXiv Detail & Related papers (2021-08-24T13:55:29Z)
- Panoramic Panoptic Segmentation: Towards Complete Surrounding Understanding via Unsupervised Contrastive Learning [97.37544023666833]
We introduce panoramic panoptic segmentation as the most holistic form of scene understanding.
A complete surrounding understanding provides a maximum of information to the agent.
We propose a framework which allows model training on standard pinhole images and transfers the learned features to a different domain.
arXiv Detail & Related papers (2021-03-01T09:37:27Z)
- TediGAN: Text-Guided Diverse Face Image Generation and Manipulation [52.83401421019309]
TediGAN is a framework for multi-modal image generation and manipulation with textual descriptions.
A StyleGAN inversion module maps real images to the latent space of a well-trained StyleGAN.
A visual-linguistic similarity module learns text-image matching by mapping images and text into a common embedding space.
Instance-level optimization is used for identity preservation during manipulation.
arXiv Detail & Related papers (2020-12-06T16:20:19Z)
- Scene Image Representation by Foreground, Background and Hybrid Features [17.754713956659188]
We propose to use hybrid features in addition to foreground and background features to represent scene images.
Our method achieves state-of-the-art classification performance.
arXiv Detail & Related papers (2020-06-05T01:55:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.