RELATE: Physically Plausible Multi-Object Scene Synthesis Using
Structured Latent Spaces
- URL: http://arxiv.org/abs/2007.01272v2
- Date: Mon, 9 Nov 2020 18:03:58 GMT
- Title: RELATE: Physically Plausible Multi-Object Scene Synthesis Using
Structured Latent Spaces
- Authors: Sebastien Ehrhardt and Oliver Groth and Aron Monszpart and Martin
Engelcke and Ingmar Posner and Niloy Mitra and Andrea Vedaldi
- Abstract summary: We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
- Score: 77.07767833443256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present RELATE, a model that learns to generate physically plausible
scenes and videos of multiple interacting objects. Similar to other generative
approaches, RELATE is trained end-to-end on raw, unlabeled data. RELATE
combines an object-centric GAN formulation with a model that explicitly
accounts for correlations between individual objects. This allows the model to
generate realistic scenes and videos from a physically-interpretable
parameterization. Furthermore, we show that modeling the object correlation is
necessary to learn to disentangle object positions and identity. We find that
RELATE is also amenable to physically realistic scene editing and that it
significantly outperforms prior art in object-centric scene generation in both
synthetic (CLEVR, ShapeStacks) and real-world data (cars). In addition, in
contrast to state-of-the-art methods in object-centric generative modeling,
RELATE also extends naturally to dynamic scenes and generates videos of high
visual fidelity. Source code, datasets and more results are available at
http://geometry.cs.ucl.ac.uk/projects/2020/relate/.
Related papers
- CAGE: Controllable Articulation GEneration [14.002289666443529]
We leverage the interplay between part shape, connectivity, and motion using a denoising diffusion-based method.
Our method takes an object category label and a part connectivity graph as input and generates an object's geometry and motion parameters.
Our experiments show that our method outperforms the state-of-the-art in articulated object generation.
arXiv Detail & Related papers (2023-12-15T07:04:27Z) - GAMMA: Generalizable Articulation Modeling and Manipulation for
Articulated Objects [53.965581080954905]
We propose a novel framework of Generalizable Articulation Modeling and Manipulating for Articulated Objects (GAMMA)
GAMMA learns both articulation modeling and grasp pose affordance from diverse articulated objects with different categories.
Results show that GAMMA significantly outperforms SOTA articulation modeling and manipulation algorithms in unseen and cross-category articulated objects.
arXiv Detail & Related papers (2023-09-28T08:57:14Z) - STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning [4.676784872259775]
We propose a large-scale video dataset for understanding spatial relationships derived from prepositions of the English language.
The dataset contains 150K visual depictions (videos and images), consisting of 30 distinct spatial prepositional senses.
In addition to spatial relations, we also propose 50K visual depictions across 10 temporal relations, consisting of videos depicting event/time-point interactions.
arXiv Detail & Related papers (2023-09-13T02:35:59Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - DisCoScene: Spatially Disentangled Generative Radiance Fields for
Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3Daware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with the global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z) - Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Methods typically alleviate on-extensive annotations on real scene scans.
We explore how synthetic models rely on real scene categories of synthetic features to a unified feature space.
Experiments show that our method achieves the average mAP of 46.08% on the ScanNet S3DIS dataset and 55.49% by learning datasets.
arXiv Detail & Related papers (2022-03-20T13:06:15Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - Understanding Object Dynamics for Interactive Image-to-Video Synthesis [8.17925295907622]
We present an approach that learns naturally-looking global articulations caused by a local manipulation at a pixel level.
Our generative model learns to infer natural object dynamics as a response to user interaction.
In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos.
arXiv Detail & Related papers (2021-06-21T17:57:39Z) - ROOTS: Object-Centric Representation and Rendering of 3D Scenes [28.24758046060324]
A crucial ability of human intelligence is to build up models of individual 3D objects from partial scene observations.
Recent works achieve object-centric generation but without the ability to infer the representation.
We propose a probabilistic generative model for learning to build modular and compositional 3D object models.
arXiv Detail & Related papers (2020-06-11T00:42:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.