Motion Transformer for Unsupervised Image Animation
- URL: http://arxiv.org/abs/2209.14024v1
- Date: Wed, 28 Sep 2022 12:04:58 GMT
- Title: Motion Transformer for Unsupervised Image Animation
- Authors: Jiale Tao, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, and Lixin
Duan
- Abstract summary: Image animation aims to animate a source image by using motion learned from a driving video.
Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information.
We propose a new method, the motion transformer, which is the first attempt to build a motion estimator based on a vision transformer.
- Score: 37.35527776043379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image animation aims to animate a source image by using motion learned from a
driving video. Current state-of-the-art methods typically use convolutional
neural networks (CNNs) to predict motion information, such as motion keypoints
and corresponding local transformations. However, these CNN-based methods do
not explicitly model the interactions between motions; as a result, important
underlying motion relationships may be neglected, potentially producing
noticeable artifacts in the generated
animation video. To this end, we propose a new method, the motion transformer,
which is the first attempt to build a motion estimator based on a vision
transformer. More specifically, we introduce two types of tokens in our
proposed method: i) image tokens formed from patch features and corresponding
position encoding; and ii) motion tokens encoded with motion information. Both
types of tokens are sent into vision transformers to promote underlying
interactions between them through multi-head self-attention blocks. By adopting
this process, the motion information can be better learned to boost the model
performance. The final embedded motion tokens are then used to predict the
corresponding motion keypoints and local transformations. Extensive experiments
on benchmark datasets show that our proposed method achieves promising results
compared with state-of-the-art baselines. Our source code will be made publicly available.
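The two-token design described in the abstract can be illustrated with a toy, single-head sketch. This is not the authors' implementation: the weight matrices, dimensions, and the single attention block are hypothetical stand-ins, chosen only to show how learnable motion tokens and positionally encoded image tokens interact through self-attention before the motion tokens regress keypoints.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over all tokens: (N, d) -> (N, d).
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def motion_transformer_sketch(patch_features, num_keypoints=10, seed=0):
    """Toy sketch (hypothetical random weights, one attention block):
    i) image tokens = patch features + positional encoding;
    ii) learnable motion tokens, one per keypoint.
    Both token types attend jointly; the embedded motion tokens
    then regress normalized keypoint coordinates."""
    rng = np.random.default_rng(seed)
    n_patches, d = patch_features.shape
    pos = rng.normal(size=(n_patches, d)) * 0.02          # positional encoding
    motion_tokens = rng.normal(size=(num_keypoints, d)) * 0.02
    tokens = np.concatenate([motion_tokens, patch_features + pos], axis=0)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    tokens = tokens + self_attention(tokens, Wq, Wk, Wv)  # residual attention block
    W_kp = rng.normal(size=(d, 2)) / np.sqrt(d)
    keypoints = np.tanh(tokens[:num_keypoints] @ W_kp)    # (K, 2) in [-1, 1]
    return keypoints
```

In the paper the embedded motion tokens also predict the local transformation parameters per keypoint; here only the keypoint head is shown to keep the sketch minimal.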
Related papers
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
- Continuous Piecewise-Affine Based Motion Model for Image Animation [45.55812811136834]
Image animation aims to bring static images to life according to driving videos.
Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image.
We propose to model motion from the source image to the driving frame in highly expressive diffeomorphism spaces.
arXiv Detail & Related papers (2024-01-17T11:40:05Z)
- Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z)
- Image Animation with Keypoint Mask [0.0]
Motion transfer is the task of synthesizing future video frames of a single source image according to the motion from a given driving video.
In this work, we extract the structure from a keypoint heatmap, without an explicit motion representation.
The structures extracted from the image and the video are then used by a deep generator to warp the image according to the video.
arXiv Detail & Related papers (2021-12-20T11:35:06Z)
- Sparse to Dense Motion Transfer for Face Image Animation [34.16015389505612]
Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks.
We develop an efficient and effective method for motion transfer from sparse landmarks to the face image.
arXiv Detail & Related papers (2021-09-01T16:23:57Z)
- Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers [77.52828273633646]
We present a new drop-in block for video transformers that aggregates information along implicitly determined motion paths.
We also propose a new method to address the quadratic dependence of computation and memory on the input size.
We obtain state-of-the-art results on the Kinetics, Something-Something V2, and Epic-Kitchens datasets.
arXiv Detail & Related papers (2021-06-09T21:16:05Z)
- Animating Pictures with Eulerian Motion Fields [90.30598913855216]
We show a fully automatic method for converting a still image into a realistic animated looping video.
We target scenes with continuous fluid motion, such as flowing water and billowing smoke.
We propose a novel video looping technique that flows features both forward and backward in time and then blends the results.
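The forward/backward looping idea above can be sketched in one dimension. This is my own simplification, not the paper's method: a constant "Eulerian" displacement stands in for the learned motion field, `np.roll` stands in for feature warping, and a linear crossfade blends the warp advanced forward from the loop start with the warp pulled backward from the loop end, so the first and last frames coincide.

```python
import numpy as np

def looping_blend(feature0, displacement, length):
    """Toy sketch of forward/backward blending for a seamless loop
    (hypothetical 1-D features; `np.roll` stands in for warping).
    At time t, blend the feature warped forward t steps with the
    feature warped backward (length - t) steps, weight alpha = t/length."""
    frames = []
    for t in range(length + 1):
        fwd = np.roll(feature0, displacement * t)             # warped forward from frame 0
        bwd = np.roll(feature0, displacement * (t - length))  # warped back from loop end
        alpha = t / length
        frames.append((1 - alpha) * fwd + alpha * bwd)        # crossfade the two warps
    return frames
```

Because the forward warp dominates at t = 0 and the backward warp dominates at t = length, the loop's first and last frames are identical, which is what makes the result play seamlessly.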
arXiv Detail & Related papers (2020-11-30T18:59:06Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.