iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
- URL: http://arxiv.org/abs/2107.02790v1
- Date: Tue, 6 Jul 2021 17:57:55 GMT
- Title: iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis
- Authors: Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer
- Abstract summary: iPOKE - invertible Prediction of Object Kinematics - allows sampling of object kinematics and establishes a one-to-one correspondence to the corresponding plausible videos.
In contrast to previous works, we do not generate arbitrary realistic videos, but provide efficient control of movements.
Our approach can transfer kinematics onto novel object instances and is not confined to particular object classes.
- Score: 8.17925295907622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: How would a static scene react to a local poke? What are the effects on other
parts of an object if you could locally push it? There will be distinctive
movement, despite evident variations caused by the stochastic nature of our
world. These outcomes are governed by the characteristic kinematics of objects
that dictate their overall motion caused by a local interaction. Conversely,
the movement of an object provides crucial information about its underlying
distinctive kinematics and the interdependencies between its parts. This
two-way relation motivates learning a bijective mapping between object
kinematics and plausible future image sequences. Therefore, we propose iPOKE -
invertible Prediction of Object Kinematics - that, conditioned on an initial
frame and a local poke, allows sampling of object kinematics and establishes a
one-to-one correspondence with the corresponding plausible videos, thereby
providing controlled stochastic video synthesis. In contrast to previous
works, we do not generate arbitrary realistic videos, but provide efficient
control of movements, while still capturing the stochastic nature of our
environment and the diversity of plausible outcomes it entails. Moreover, our
approach can transfer kinematics onto novel object instances and is not
confined to particular object classes. Project page is available at
https://bit.ly/3dJN4Lf
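The following is a minimal sketch of the bijective idea described above: a single conditional affine coupling block that invertibly maps a sampled object-kinematics latent to a video latent, conditioned on an embedding of the initial frame and the local poke. The block design, dimensions, encoder inputs, and the use of PyTorch are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch: a conditional invertible (bijective) mapping between a
# kinematics latent and a video latent, conditioned on (initial frame, poke).
# Sizes and the single coupling block are assumptions for illustration only.

import torch
import torch.nn as nn


class ConditionalAffineCoupling(nn.Module):
    """One invertible affine coupling block, conditioned on (frame, poke) features."""

    def __init__(self, dim: int, cond_dim: int, hidden: int = 128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, z, cond):
        # Split the latent; predict scale/shift for the second half from the
        # first half plus the conditioning (initial frame + poke embedding).
        z1, z2 = z[:, : self.half], z[:, self.half :]
        scale, shift = self.net(torch.cat([z1, cond], dim=1)).chunk(2, dim=1)
        return torch.cat([z1, z2 * torch.exp(scale) + shift], dim=1)

    def inverse(self, y, cond):
        # Exact inverse of forward(), given the same conditioning.
        y1, y2 = y[:, : self.half], y[:, self.half :]
        scale, shift = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        return torch.cat([y1, (y2 - shift) * torch.exp(-scale)], dim=1)


if __name__ == "__main__":
    dim, cond_dim, batch = 64, 32, 4
    block = ConditionalAffineCoupling(dim, cond_dim)
    cond = torch.randn(batch, cond_dim)        # embedding of initial frame + poke (assumed)
    kinematics = torch.randn(batch, dim)       # sampled object-kinematics latent
    video_latent = block(kinematics, cond)     # forward: kinematics -> video latent
    recovered = block.inverse(video_latent, cond)
    print(torch.allclose(kinematics, recovered, atol=1e-5))  # bijectivity check
```

Because the coupling block is exactly invertible given the same conditioning, the mapping between kinematics latents and video latents is one-to-one, which is the property the abstract exploits for controlled stochastic video synthesis.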
Related papers
- Physics-based Scene Layout Generation from Human Motion [21.939444709132395]
We present a physics-based approach that simultaneously optimizes a scene layout generator and simulates a moving human in a physics simulator.
We use reinforcement learning to perform a dual-optimization of both the character motion imitation controller and the scene layout generator.
We evaluate our method using motions from SAMP and PROX, and demonstrate physically plausible scene layout reconstruction compared with the previous kinematics-based method.
arXiv Detail & Related papers (2024-05-21T02:36:37Z)
- ROAM: Robust and Object-Aware Motion Generation Using Neural Pose Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z)
- Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns [92.80981308407098]
We propose a new approach to learn to segment multiple image objects without manual supervision.
The method can extract objects from still images, but uses videos for supervision.
We show state-of-the-art unsupervised object segmentation performance on simulated and real-world benchmarks.
arXiv Detail & Related papers (2022-10-21T17:57:05Z)
- Learning Object Manipulation Skills from Video via Approximate Differentiable Physics [27.923004421974156]
We teach robots to perform simple object manipulation tasks by watching a single video demonstration.
A differentiable scene ensures perceptual fidelity between the 3D scene and the 2D video.
We evaluate our approach on a 3D reconstruction task that consists of 54 video demonstrations.
arXiv Detail & Related papers (2022-08-03T10:21:47Z)
- Stochastic Video Prediction with Structure and Motion [14.424465835834042]
We propose to factorize video observations into static and dynamic components (a minimal sketch of this idea appears after this list).
By learning separate distributions of changes in foreground and background, we can decompose the scene into static and dynamic parts.
Our experiments demonstrate that disentangling structure and motion helps video prediction, leading to better future predictions in complex driving scenarios.
arXiv Detail & Related papers (2022-03-20T11:29:46Z)
- Understanding Object Dynamics for Interactive Image-to-Video Synthesis [8.17925295907622]
We present an approach that learns naturally-looking global articulations caused by a local manipulation at a pixel level.
Our generative model learns to infer natural object dynamics as a response to user interaction.
In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos.
arXiv Detail & Related papers (2021-06-21T17:57:39Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
- Object Properties Inferring from and Transfer for Human Interaction Motions [51.896592493436984]
In this paper, we present a fine-grained action recognition method that learns to infer object properties from human interaction motion alone.
We collect a large number of videos and 3D skeletal motions of the performing actors using an inertial motion capture device.
In particular, we learn to identify the interacting object by estimating its weight, fragility, or delicacy.
arXiv Detail & Related papers (2020-08-20T14:36:34Z)
- RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z)
- Occlusion resistant learning of intuitive physics from videos [52.25308231683798]
A key ability for artificial systems is to understand physical interactions between objects and to predict future outcomes of a situation.
This ability, often referred to as intuitive physics, has recently received attention, and several methods have been proposed to learn these physical rules from video sequences.
arXiv Detail & Related papers (2020-04-30T19:35:54Z)
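As a minimal sketch of the static/dynamic factorization summarised in the "Stochastic Video Prediction with Structure and Motion" entry above, the snippet below infers two independent Gaussian latents: one from single-frame appearance features (the static background) and one from frame-to-frame motion features (the dynamic foreground). The encoder heads, latent sizes, and reparameterised sampling are assumptions for illustration and do not reproduce that paper's model.

```python
# Minimal sketch: separate Gaussian latent distributions for the static and
# dynamic parts of a video.  Feature extractors, dimensions, and the sampling
# scheme are illustrative assumptions only.

import torch
import torch.nn as nn


class FactorizedLatents(nn.Module):
    def __init__(self, feat_dim: int = 256, z_static: int = 16, z_dynamic: int = 16):
        super().__init__()
        # Separate heads predict the parameters of two independent Gaussians.
        self.static_head = nn.Linear(feat_dim, 2 * z_static)
        self.dynamic_head = nn.Linear(feat_dim, 2 * z_dynamic)

    @staticmethod
    def sample(params):
        # Reparameterised sample from a diagonal Gaussian (mean, log-variance).
        mean, logvar = params.chunk(2, dim=-1)
        return mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)

    def forward(self, frame_features, motion_features):
        # Static latent comes from appearance features of a single frame;
        # dynamic latent comes from features of the frame-to-frame changes.
        z_s = self.sample(self.static_head(frame_features))
        z_d = self.sample(self.dynamic_head(motion_features))
        return z_s, z_d


if __name__ == "__main__":
    model = FactorizedLatents()
    frame_feat = torch.randn(4, 256)   # e.g. pooled CNN features of frame t (assumed)
    motion_feat = torch.randn(4, 256)  # e.g. features of the difference frame (assumed)
    z_static, z_dynamic = model(frame_feat, motion_feat)
    print(z_static.shape, z_dynamic.shape)  # torch.Size([4, 16]) torch.Size([4, 16])
```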