DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization
- URL: http://arxiv.org/abs/2108.10743v1
- Date: Tue, 24 Aug 2021 13:55:29 GMT
- Title: DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization
- Authors: Cheng Zhang, Zhaopeng Cui, Cai Chen, Shuaicheng Liu, Bing Zeng, Hujun Bao, Yinda Zhang
- Abstract summary: We propose a novel method for panoramic 3D scene understanding which recovers the 3D room layout and the shape, pose, position, and semantic category for each object from a single full-view panorama image.
Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement.
- Score: 66.25948693095604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Panorama images have a much larger field of view than standard perspective images and thus naturally encode enriched scene context information, which, however, has not been well exploited by previous scene understanding methods. In this paper, we propose a novel method for panoramic 3D scene understanding that recovers the 3D room layout and the shape, pose, position, and semantic category of each object from a single full-view panorama image. To fully utilize the rich context information, we design a novel graph neural network based context model to predict the relationships among objects and the room layout, and a differentiable relationship-based optimization module that optimizes the object arrangement on the fly with well-designed objective functions. Observing that existing datasets either have incomplete ground truth or overly simplified scenes, we present a new synthetic dataset with good diversity in room layout and furniture placement, and realistic image quality, for total panoramic 3D scene understanding. Experiments demonstrate that our method outperforms existing methods on panoramic scene understanding in terms of both geometry accuracy and object arrangement. Code is available at
https://chengzhag.github.io/publication/dpc.
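To make the two main components concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the relation set, feature dimensions, network sizes, and the single "attachment" energy term are all illustrative assumptions. It shows the general pattern the abstract describes: a graph neural network over a fully connected scene graph scores pairwise relations, and a differentiable relation energy then refines object positions on the fly.

```python
import torch
import torch.nn as nn

class RelationGNN(nn.Module):
    """One round of message passing over a fully connected scene graph,
    followed by per-edge relation classification."""
    def __init__(self, node_dim=128, hidden=256, num_relations=4):
        super().__init__()
        self.message = nn.Sequential(
            nn.Linear(2 * node_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, node_dim))
        # Hypothetical relation set: none / attach / support / against-wall.
        self.edge_head = nn.Linear(2 * node_dim, num_relations)

    def _pairs(self, nodes):
        n = nodes.size(0)
        src = nodes.unsqueeze(1).expand(n, n, -1)
        dst = nodes.unsqueeze(0).expand(n, n, -1)
        return torch.cat([src, dst], dim=-1)         # (n, n, 2 * node_dim)

    def forward(self, nodes):                        # nodes: (n, node_dim)
        nodes = nodes + self.message(self._pairs(nodes)).mean(dim=1)
        return self.edge_head(self._pairs(nodes))    # logits: (n, n, R)

def relation_energy(centers, attach_prob):
    """Toy differentiable objective: pull together object pairs the GNN
    believes are attached. A real objective would also include collision,
    wall-contact, and observation-consistency terms."""
    return (attach_prob * torch.cdist(centers, centers)).sum()

gnn = RelationGNN()
feats = torch.randn(5, 128)                          # per-object features
attach_prob = gnn(feats).softmax(dim=-1)[..., 1]     # P(attach) per pair
centers = torch.randn(5, 3, requires_grad=True)      # initial 3D positions
opt = torch.optim.Adam([centers], lr=1e-2)
for _ in range(100):                                 # on-the-fly refinement
    opt.zero_grad()
    relation_energy(centers, attach_prob.detach()).backward()
    opt.step()
```

The sketch keeps only the predict-relations-then-optimize pattern; the paper's actual module optimizes the full object arrangement against several relation-aware objective terms jointly with the initial single-view estimates.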
Related papers
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs [74.98581417902201]
We propose a novel framework to generate compositional 3D scenes from scene graphs.
By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model.
We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer.
arXiv Detail & Related papers (2023-11-30T18:59:58Z)
- PanoContext-Former: Panoramic Total Scene Understanding with a Transformer [37.51637352106841]
Panoramic images enable deeper understanding and more holistic perception of the $360^\circ$ surrounding environment.
In this paper, we propose a novel method that uses a depth prior for holistic indoor scene understanding.
In addition, we introduce a real-world dataset for scene understanding, including photo-realistic panoramas, high-fidelity depth images, accurately annotated room layouts, and oriented object bounding boxes and shapes.
arXiv Detail & Related papers (2023-05-21T16:20:57Z)
- Neural Rendering in a Room: Amodal 3D Understanding and Free-Viewpoint Rendering for the Closed Scene Composed of Pre-Captured Objects [40.59508249969956]
We present a novel solution to mimic such human perception capability based on a new paradigm of amodal 3D scene understanding with neural rendering for a closed scene.
We first learn the prior knowledge of the objects in a closed scene via an offline stage, which facilitates an online stage to understand the room with unseen furniture arrangement.
During the online stage, given a panoramic image of the scene in different layouts, we utilize a holistic neural-rendering-based optimization framework to efficiently estimate the correct 3D scene layout and deliver realistic free-viewpoint rendering.
arXiv Detail & Related papers (2022-05-05T15:34:09Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- Generative View Synthesis: From Single-view Semantics to Novel-view Images [38.7873192939574]
Generative View Synthesis (GVS) can synthesize multiple photorealistic views of a scene given a single semantic map.
We first lift the input 2D semantic map onto a 3D layered representation of the scene in feature space.
We then project the layered features onto the target views to generate the final novel-view images.
arXiv Detail & Related papers (2020-08-20T17:48:16Z)
- Perspective Plane Program Induction from a Single Image [85.28956922100305]
We study the inverse graphics problem of inferring a holistic representation for natural images.
We formulate this problem as jointly finding the camera pose and scene structure that best describe the input image.
Our proposed framework, Perspective Plane Program Induction (P3I), combines search-based and gradient-based algorithms to efficiently solve the problem; a toy sketch of this two-stage scheme appears below.
arXiv Detail & Related papers (2020-06-25T21:18:58Z)
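The search-plus-gradient strategy P3I describes is a reusable pattern: enumerate a coarse grid over a hard-to-differentiate parameter to find the right basin, then refine continuously by gradient descent. Below is a toy, self-contained sketch; the 1-D objective is a stand-in assumption for P3I's actual program-fitting loss over camera pose and scene structure.

```python
import torch

def objective(theta):
    # Hypothetical smooth but multi-modal loss, standing in for how well
    # a candidate perspective plane program explains the input image.
    return torch.sin(3 * theta) + 0.1 * theta ** 2

# Stage 1: coarse grid search locates the basin of the global minimum.
grid = torch.linspace(-3.0, 3.0, steps=61)
best = grid[torch.argmin(objective(grid))]

# Stage 2: gradient descent refines the continuous parameter within it.
theta = best.clone().requires_grad_(True)
opt = torch.optim.SGD([theta], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    objective(theta).backward()
    opt.step()
print(f"coarse: {best.item():.3f}  refined: {theta.item():.3f}")
```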
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.