Understanding Object Dynamics for Interactive Image-to-Video Synthesis
- URL: http://arxiv.org/abs/2106.11303v1
- Date: Mon, 21 Jun 2021 17:57:39 GMT
- Title: Understanding Object Dynamics for Interactive Image-to-Video Synthesis
- Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
- Abstract summary: We present an approach that learns natural-looking global articulations caused by a local manipulation at the pixel level.
Our generative model learns to infer natural object dynamics as a response to user interaction.
In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos.
- Score: 8.17925295907622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What would be the effect of locally poking a static scene? We present an
approach that learns natural-looking global articulations caused by a local
manipulation at the pixel level. Training requires only videos of moving objects
but no information about the underlying manipulation of the physical scene. Our
generative model learns to infer natural object dynamics as a response to user
interaction and learns about the interrelations between different object body
regions. Given a static image of an object and a local poking of a pixel, the
approach then predicts how the object would deform over time. In contrast to
existing work on video prediction, we do not synthesize arbitrary realistic
videos but enable local interactive control of the deformation. Our model is
not restricted to particular object categories and can transfer dynamics onto
novel unseen object instances. Extensive experiments on diverse objects
demonstrate the effectiveness of our approach compared to common video
prediction frameworks. Project page is available at https://bit.ly/3cxfA2L .
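As a rough illustration of the interaction the abstract describes, the sketch below shows how a single user poke (a pixel location plus a displacement) could be encoded and passed, together with the static source image, to a generator that returns a short sequence of frames. The class and function names, tensor shapes, and the toy generator are assumptions made for illustration only; they are not the authors' released code.

```python
# Minimal sketch of poke-conditioned image-to-video inference.
# All names (poke_to_map, PokeConditionedGenerator) are hypothetical
# placeholders, not the paper's actual API.
import torch
import torch.nn as nn


def poke_to_map(h, w, poke_xy, poke_dxdy):
    """Encode a single user poke as a sparse 2-channel displacement map."""
    poke_map = torch.zeros(1, 2, h, w)
    x, y = poke_xy
    poke_map[0, 0, y, x] = poke_dxdy[0]  # horizontal shift at the poked pixel
    poke_map[0, 1, y, x] = poke_dxdy[1]  # vertical shift at the poked pixel
    return poke_map


class PokeConditionedGenerator(nn.Module):
    """Stand-in for a learned generator: image + poke map -> T video frames."""

    def __init__(self, frames=16):
        super().__init__()
        self.frames = frames
        # A real model would use a deep encoder/decoder; this toy version
        # just mixes the image with the poke map so the sketch runs end to end.
        self.mix = nn.Conv2d(3 + 2, 3, kernel_size=3, padding=1)

    def forward(self, image, poke_map):
        frames = []
        for t in range(self.frames):
            scale = (t + 1) / self.frames          # ramp the poke strength over time
            x = torch.cat([image, poke_map * scale], dim=1)
            frames.append(self.mix(x))
        return torch.stack(frames, dim=1)          # (B, T, 3, H, W)


if __name__ == "__main__":
    image = torch.rand(1, 3, 128, 128)             # static source image
    poke = poke_to_map(128, 128, poke_xy=(40, 64), poke_dxdy=(8.0, -3.0))
    video = PokeConditionedGenerator(frames=16)(image, poke)
    print(video.shape)                             # torch.Size([1, 16, 3, 128, 128])
```

The key design point carried over from the abstract is that the conditioning signal is local (a single poked pixel), while the generated frames must express a globally consistent deformation of the object.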
Related papers
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than prior state-of-the-art video generation work in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis [8.17925295907622]
iPOKE (invertible Prediction of Object Kinematics) allows sampling object kinematics and establishes a one-to-one correspondence with the corresponding plausible videos.
In contrast to previous works, we do not generate arbitrary realistic videos, but provide efficient control of movements.
Our approach can transfer kinematics onto novel object instances and is not confined to particular object classes.
arXiv Detail & Related papers (2021-07-06T17:57:55Z)
- DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping [72.84991726271024]
We describe an unsupervised method to detect and segment portions of images of live scenes that are seen moving as a coherent whole.
Our method first partitions the motion field by minimizing the mutual information between segments (a toy sketch of this criterion follows the related-papers list).
It uses the segments to learn object models that can be used for detection in a static image.
arXiv Detail & Related papers (2020-08-16T22:05:13Z)
- RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z)
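The DyStaB entry above mentions partitioning a motion field by minimizing the mutual information between segments. As referenced in that item, the sketch below estimates such a criterion for one candidate two-way partition: if the two regions move independently across frames, their mutual information is low, so a good partition minimizes it. The histogram-based estimator, the per-region motion summary, and the synthetic data are simplifying assumptions for illustration, not the paper's implementation.

```python
# Toy estimate of mutual information between the motions of two segments.
# Illustrative only; DyStaB's actual objective and optimization differ.
import numpy as np


def region_motion_codes(flows, mask, bins=8):
    """Quantize each frame's mean flow direction inside a region into `bins` codes."""
    codes = []
    for flow in flows:                        # flow: (H, W, 2)
        mean = flow[mask].mean(axis=0)        # average (dx, dy) over the region
        angle = np.arctan2(mean[1], mean[0])  # direction of the mean motion
        codes.append(int((angle + np.pi) / (2 * np.pi) * bins) % bins)
    return np.array(codes)


def mutual_information(a, b, bins=8):
    """Plug-in MI estimate (in nats) from a joint histogram of two code sequences."""
    joint = np.zeros((bins, bins))
    for x, y in zip(a, b):
        joint[x, y] += 1
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = w = 32
    mask = np.zeros((h, w), dtype=bool)
    mask[:, : w // 2] = True                  # candidate partition: left vs. right half
    # Synthetic flow fields where the two halves move independently per frame.
    flows = []
    for _ in range(200):
        flow = np.zeros((h, w, 2))
        flow[:, : w // 2] = rng.normal(size=2)   # left half: one random motion
        flow[:, w // 2 :] = rng.normal(size=2)   # right half: another, independent motion
        flows.append(flow)
    left = region_motion_codes(flows, mask)
    right = region_motion_codes(flows, ~mask)
    print(f"estimated MI between segment motions: {mutual_information(left, right):.3f}")
```

Because the synthetic halves move independently, the estimate stays close to zero; a partition that cuts through a single coherently moving object would instead couple the two regions and raise the score.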
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.