SceneGen: Generative Contextual Scene Augmentation using Scene Graph
Priors
- URL: http://arxiv.org/abs/2009.12395v2
- Date: Wed, 30 Sep 2020 17:06:05 GMT
- Title: SceneGen: Generative Contextual Scene Augmentation using Scene Graph
Priors
- Authors: Mohammad Keshavarzi, Aakash Parikh, Xiyu Zhai, Melody Mao, Luisa
Caldas, Allen Y. Yang
- Abstract summary: We introduce SceneGen, a generative contextual augmentation framework that predicts virtual object positions and orientations within existing scenes.
SceneGen takes a semantically segmented scene as input, and outputs positional and orientational probability maps for placing virtual content.
We formulate a novel spatial Scene Graph representation, which encapsulates explicit topological properties between objects, object groups, and the room.
To demonstrate our system in action, we develop an Augmented Reality application, in which objects can be contextually augmented in real-time.
- Score: 3.1969855247377827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial computing experiences are constrained by the real-world surroundings
of the user. In such experiences, augmenting existing scenes with virtual objects
requires a contextual approach: geometrical conflicts must be avoided, and
functional, plausible relationships to other objects must be maintained in the
target environment. Yet, due to the complexity and diversity of user
environments, automatically computing placements of virtual content that adapt
to the context of the scene remains a challenging task.
Motivated by this problem, in this paper we introduce SceneGen, a generative
contextual augmentation framework that predicts virtual object positions and
orientations within existing scenes. SceneGen takes a semantically segmented
scene as input, and outputs positional and orientational probability maps for
placing virtual content. We formulate a novel spatial Scene Graph
representation, which encapsulates explicit topological properties between
objects, object groups, and the room. We believe that explicit and intuitive
features play an important role in informative content creation and user
interaction in spatial computing settings, a quality that implicit models do not
capture. We use kernel density estimation (KDE) to build a
multivariate conditional knowledge model trained using prior spatial Scene
Graphs extracted from real-world 3D scanned data. To further capture
orientational properties, we develop a fast pose annotation tool to extend
current real-world datasets with orientational labels. Finally, to demonstrate
our system in action, we develop an Augmented Reality application, in which
objects can be contextually augmented in real-time.
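To make the placement pipeline above concrete, the following is a minimal sketch, not the authors' released implementation: it reduces the spatial Scene Graph to hypothetical pairwise-distance features, fits a density over feature vectors collected from prior scenes with SciPy's gaussian_kde, and evaluates that density over a grid of candidate positions to obtain a positional probability map. The feature choice, variable names, and toy prior data are all illustrative assumptions.

```python
# Minimal, illustrative sketch of KDE-based contextual placement scoring.
# Assumptions: features are plain distances to existing objects; a full
# system would use richer scene-graph relations (support, object groups,
# room topology) and a learned, category-conditioned prior.
import numpy as np
from scipy.stats import gaussian_kde

def pairwise_features(position, scene_objects):
    """Distance from a candidate position to each existing object centroid."""
    return np.array([np.linalg.norm(position - np.asarray(c)) for c in scene_objects])

# Hypothetical prior feature vectors extracted from scanned scenes in which
# the target category (e.g., a chair) was observed: [dist_to_table, dist_to_wall].
prior_features = np.array([
    [0.6, 1.2],
    [0.7, 1.0],
    [0.5, 1.4],
    [0.8, 1.1],
]).T  # gaussian_kde expects shape (n_dims, n_samples)

kde = gaussian_kde(prior_features)

# Existing objects in the semantically segmented scene (toy 2D centroids).
table, wall_point = np.array([1.0, 1.0]), np.array([0.0, 2.0])

# Score a grid of candidate positions to produce a positional probability map.
xs, ys = np.meshgrid(np.linspace(0.0, 3.0, 60), np.linspace(0.0, 3.0, 60))
grid = np.stack([xs.ravel(), ys.ravel()], axis=1)
feats = np.array([pairwise_features(p, [table, wall_point]) for p in grid]).T
prob_map = kde(feats).reshape(xs.shape)

best = grid[np.argmax(prob_map)]
print("Highest-scoring placement (x, y):", best)
```

In the same spirit, an orientational probability map could be obtained by adding relative-angle features to the prior and evaluating the KDE over candidate orientations as well.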
Related papers
- Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments [44.6372390798904]
We propose a new task denominated Personalized Instance-based Navigation (PIN), in which an embodied agent is tasked with locating and reaching a specific personal object.
In each episode, the target object is presented to the agent using two modalities: a set of visual reference images on a neutral background and manually annotated textual descriptions.
arXiv Detail & Related papers (2024-10-23T18:01:09Z)
- SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending [19.06858242647237]
We introduce SpaceBlender, a pipeline that transforms user-provided 2D images into context-rich 3D environments.
Participants appreciated the enhanced familiarity and context provided by SpaceBlender but noted complexities in the generative environments.
We propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.
arXiv Detail & Related papers (2024-09-20T22:27:31Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Context-Aware Indoor Point Cloud Object Generation through User Instructions [6.398660996031915]
We present a novel end-to-end multi-modal deep neural network capable of generating point cloud objects seamlessly integrated with their surroundings.
Our model revolutionizes scene modification by enabling the creation of new environments with previously unseen object layouts.
arXiv Detail & Related papers (2023-11-26T06:40:16Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in deep semantic segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Continuous Scene Representations for Embodied AI [33.00565252990522]
Continuous Scene Representations (CSR) is a scene representation constructed by an embodied agent navigating within a space.
Our key insight is to embed pair-wise relationships between objects in a latent space.
CSR can track objects as the agent moves in a scene, update the representation accordingly, and detect changes in room configurations.
arXiv Detail & Related papers (2022-03-31T17:55:33Z)
- Evaluating Continual Learning Algorithms by Generating 3D Virtual Environments [66.83839051693695]
Continual learning refers to the ability of humans and animals to incrementally learn over time in a given environment.
We propose to leverage recent advances in 3D virtual environments in order to approach the automatic generation of potentially life-long dynamic scenes with photo-realistic appearance.
A novel element of this paper is that scenes are described in a parametric way, thus allowing the user to fully control the visual complexity of the input stream the agent perceives.
arXiv Detail & Related papers (2021-09-16T10:37:21Z)
- Spatio-Temporal Graph for Video Captioning with Knowledge Distillation [50.034189314258356]
We propose a graph model for video captioning that exploits object interactions in space and time.
Our model builds interpretable links and is able to provide explicit visual grounding.
To avoid correlations caused by the variable number of objects, we propose an object-aware knowledge distillation mechanism.
arXiv Detail & Related papers (2020-03-31T03:58:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.