SceneGen: Generative Contextual Scene Augmentation using Scene Graph Priors
- URL: http://arxiv.org/abs/2009.12395v2
- Date: Wed, 30 Sep 2020 17:06:05 GMT
- Title: SceneGen: Generative Contextual Scene Augmentation using Scene Graph Priors
- Authors: Mohammad Keshavarzi, Aakash Parikh, Xiyu Zhai, Melody Mao, Luisa Caldas, Allen Y. Yang
- Abstract summary: We introduce SceneGen, a generative contextual augmentation framework that predicts virtual object positions and orientations within existing scenes.
SceneGen takes a semantically segmented scene as input, and outputs positional and orientational probability maps for placing virtual content.
We formulate a novel spatial Scene Graph representation, which encapsulates explicit topological properties between objects, object groups, and the room.
To demonstrate our system in action, we develop an Augmented Reality application, in which objects can be contextually augmented in real time.
- Score: 3.1969855247377827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial computing experiences are constrained by the real-world surroundings
of the user. In such experiences, augmenting virtual objects to existing scenes
requires a contextual approach, where geometrical conflicts are avoided, and
functional and plausible relationships to other objects are maintained in the
target environment. Yet, due to the complexity and diversity of user
environments, automatically calculating ideal positions of virtual content that
is adaptive to the context of the scene is considered a challenging task.
Motivated by this problem, in this paper we introduce SceneGen, a generative
contextual augmentation framework that predicts virtual object positions and
orientations within existing scenes. SceneGen takes a semantically segmented
scene as input, and outputs positional and orientational probability maps for
placing virtual content. We formulate a novel spatial Scene Graph
representation, which encapsulates explicit topological properties between
objects, object groups, and the room. We believe providing explicit and
intuitive features plays an important role in informative content creation and
user interaction in spatial computing settings, a quality that is not captured
in implicit models. We use kernel density estimation (KDE) to build a
multivariate conditional knowledge model trained using prior spatial Scene
Graphs extracted from real-world 3D scanned data. To further capture
orientational properties, we develop a fast pose annotation tool to extend
current real-world datasets with orientational labels. Finally, to demonstrate
our system in action, we develop an Augmented Reality application, in which
objects can be contextually augmented in real time.
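
The abstract describes using kernel density estimation over features drawn from prior Scene Graphs to produce a positional probability map. The following is a minimal sketch of that general idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the two features (distance to the nearest wall and distance to a single anchor object), the isotropic Gaussian kernel with a fixed bandwidth, the rectangular room, the function names, and the synthetic prior samples, which are toy data and not taken from the paper. Orientation and the conditional structure of the paper's model are left out.

```python
import numpy as np


def gaussian_kde_density(samples, queries, bandwidth):
    """Evaluate an isotropic Gaussian KDE built from `samples` at `queries`."""
    samples = np.asarray(samples, dtype=float)  # shape (n, d)
    queries = np.asarray(queries, dtype=float)  # shape (m, d)
    n, d = samples.shape
    diff = queries[:, None, :] - samples[None, :, :]  # shape (m, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1) / (n * norm)


def make_feature_fn(room_size, anchor_xy):
    """Hypothetical features: distance to nearest wall, distance to an anchor object."""
    width, depth = room_size
    anchor = np.asarray(anchor_xy, dtype=float)

    def feature_fn(points):
        x, y = points[:, 0], points[:, 1]
        wall_dist = np.minimum.reduce([x, width - x, y, depth - y])
        anchor_dist = np.linalg.norm(points - anchor, axis=1)
        return np.stack([wall_dist, anchor_dist], axis=1)

    return feature_fn


def placement_probability_map(prior_features, room_size, grid_res, feature_fn,
                              bandwidth=0.3):
    """Score every grid cell by the KDE density of its features, then normalize."""
    xs = np.linspace(0.0, room_size[0], grid_res)
    ys = np.linspace(0.0, room_size[1], grid_res)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    candidates = np.stack([gx.ravel(), gy.ravel()], axis=1)
    density = gaussian_kde_density(prior_features, feature_fn(candidates), bandwidth)
    prob = density / density.sum()
    return prob.reshape(grid_res, grid_res), xs, ys


# Synthetic prior (toy data): the object tends to sit about 1.5 m from a wall
# and about 0.6 m from the anchor object.
rng = np.random.default_rng(0)
prior_features = rng.normal(loc=[1.5, 0.6], scale=0.1, size=(50, 2))

room_size = (4.0, 4.0)   # metres, assumed rectangular room
table_xy = (2.0, 2.0)    # assumed anchor object position
feature_fn = make_feature_fn(room_size, table_xy)

prob, xs, ys = placement_probability_map(prior_features, room_size, 41, feature_fn)

# Most likely placement cell and its distance to the anchor object.
i, j = np.unravel_index(np.argmax(prob), prob.shape)
best_xy = np.array([xs[i], ys[j]])
best_anchor_dist = float(np.linalg.norm(best_xy - np.asarray(table_xy)))
```

Because the density is estimated in feature space rather than in room coordinates, the same prior can be evaluated in a room with a different size or anchor position by swapping the feature function, which is one reason explicit features are convenient in this kind of setup.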
Related papers
- Context-Aware Indoor Point Cloud Object Generation through User Instructions [6.398660996031915]
We present a novel end-to-end multi-modal deep neural network capable of generating point cloud objects seamlessly integrated with their surroundings.
Our model revolutionizes scene modification by enabling the creation of new environments with previously unseen object layouts.
arXiv Detail & Related papers (2023-11-26T06:40:16Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution [0.0]
We consider the task of visual grounding, where the agent segments an object from a crowded scene given a natural language description.
Modern holistic approaches to visual grounding usually ignore language structure and struggle to cover generic domains.
We introduce a fully decoupled modular framework for compositional visual grounding of entities, attributes, and spatial relations.
arXiv Detail & Related papers (2022-05-24T14:12:32Z)
- Continuous Scene Representations for Embodied AI [33.00565252990522]
Continuous Scene Representations (CSR) is a scene representation constructed by an embodied agent navigating within a space.
Our key insight is to embed pair-wise relationships between objects in a latent space.
CSR can track objects as the agent moves in a scene, update the representation accordingly, and detect changes in room configurations.
arXiv Detail & Related papers (2022-03-31T17:55:33Z)
- Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
arXiv Detail & Related papers (2021-09-16T10:37:21Z)
- Spatio-Temporal Graph for Video Captioning with Knowledge Distillation [50.034189314258356]
We propose a graph model for video captioning that exploits object interactions in space and time.
Our model builds interpretable links and is able to provide explicit visual grounding.
To avoid correlations caused by the variable number of objects, we propose an object-aware knowledge distillation mechanism.
arXiv Detail & Related papers (2020-03-31T03:58:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.