Motion Transformer for Unsupervised Image Animation
- URL: http://arxiv.org/abs/2209.14024v1
- Date: Wed, 28 Sep 2022 12:04:58 GMT
- Title: Motion Transformer for Unsupervised Image Animation
- Authors: Jiale Tao, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, and Lixin Duan
- Abstract summary: Image animation aims to animate a source image by using motion learned from a driving video.
Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information.
We propose a new method, the motion transformer, which is the first attempt to build a motion estimator based on a vision transformer.
- Score: 37.35527776043379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image animation aims to animate a source image by using motion learned from a
driving video. Current state-of-the-art methods typically use convolutional
neural networks (CNNs) to predict motion information, such as motion keypoints
and corresponding local transformations. However, these CNN based methods do
not explicitly model the interactions between motions; as a result, the
important underlying motion relationship may be neglected, which can
potentially lead to noticeable artifacts being produced in the generated
animation video. To this end, we propose a new method, the motion transformer,
which is the first attempt to build a motion estimator based on a vision
transformer. More specifically, we introduce two types of tokens in our
proposed method: i) image tokens formed from patch features and corresponding
position encoding; and ii) motion tokens encoded with motion information. Both
types of tokens are sent into vision transformers to promote underlying
interactions between them through multi-head self-attention blocks. By adopting
this process, the motion information can be better learned to boost the model
performance. The final embedded motion tokens are then used to predict the
corresponding motion keypoints and local transformations. Extensive experiments
on benchmark datasets show that our proposed method achieves promising results
compared with state-of-the-art baselines. Our source code will be made publicly
available.
Related papers
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion [9.134743677331517]
We propose to use a pre-trained image-to-video model to disentangle appearance from motion.
Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input.
By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve a high temporal motion granularity.
Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks.
arXiv Detail & Related papers (2024-08-01T10:55:20Z)
- Continuous Piecewise-Affine Based Motion Model for Image Animation [45.55812811136834]
Image animation aims to bring static images to life according to driving videos.
Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image.
We propose to model motion from the source image to the driving frame in highly expressive diffeomorphism spaces.
arXiv Detail & Related papers (2024-01-17T11:40:05Z)
- Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z)
- Sparse to Dense Motion Transfer for Face Image Animation [34.16015389505612]
Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks.
We develop an efficient and effective method for motion transfer from sparse landmarks to the face image.
arXiv Detail & Related papers (2021-09-01T16:23:57Z)
- Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers [77.52828273633646]
We present a new drop-in block for video transformers that aggregates information along implicitly determined motion paths.
We also propose a new method to address the quadratic dependence of computation and memory on the input size.
We obtain state-of-the-art results on the Kinetics, Something-Something V2, and Epic-Kitchens datasets.
arXiv Detail & Related papers (2021-06-09T21:16:05Z)
- Animating Pictures with Eulerian Motion Fields [90.30598913855216]
We show a fully automatic method for converting a still image into a realistic animated looping video.
We target scenes with continuous fluid motion, such as flowing water and billowing smoke.
We propose a novel video looping technique that flows features both forward and backward in time and then blends the results.
arXiv Detail & Related papers (2020-11-30T18:59:06Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)