Motion Transformer for Unsupervised Image Animation
- URL: http://arxiv.org/abs/2209.14024v1
- Date: Wed, 28 Sep 2022 12:04:58 GMT
- Title: Motion Transformer for Unsupervised Image Animation
- Authors: Jiale Tao, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, and Lixin
Duan
- Abstract summary: Image animation aims to animate a source image by using motion learned from a driving video.
Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information.
We propose a new method, the motion transformer, which is the first attempt to build a motion estimator based on a vision transformer.
- Score: 37.35527776043379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image animation aims to animate a source image by using motion learned from a
driving video. Current state-of-the-art methods typically use convolutional
neural networks (CNNs) to predict motion information, such as motion keypoints
and corresponding local transformations. However, these CNN-based methods do
not explicitly model the interactions between motions; as a result, important
underlying motion relationships may be neglected, potentially producing
noticeable artifacts in the generated
animation video. To this end, we propose a new method, the motion transformer,
which is the first attempt to build a motion estimator based on a vision
transformer. More specifically, we introduce two types of tokens in our
proposed method: i) image tokens formed from patch features and corresponding
position encoding; and ii) motion tokens encoded with motion information. Both
types of tokens are sent into vision transformers to promote underlying
interactions between them through multi-head self-attention blocks. By adopting
this process, the motion information can be better learned to boost the model
performance. The final embedded motion tokens are then used to predict the
corresponding motion keypoints and local transformations. Extensive experiments
on benchmark datasets show that our proposed method achieves promising results
compared with state-of-the-art baselines. Our source code will be made publicly available.
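The two-token design described in the abstract can be illustrated with a toy, single-head sketch. This is not the authors' implementation: the weight matrices, dimensions, and the single attention block are hypothetical stand-ins, chosen only to show how learnable motion tokens and positionally encoded image tokens interact through self-attention before the motion tokens regress keypoints.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over all tokens: (N, d) -> (N, d).
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

def motion_transformer_sketch(patch_features, num_keypoints=10, seed=0):
    """Toy sketch (hypothetical random weights, one attention block):
    i) image tokens = patch features + positional encoding;
    ii) learnable motion tokens, one per keypoint.
    Both token types attend jointly; the embedded motion tokens
    then regress normalized keypoint coordinates."""
    rng = np.random.default_rng(seed)
    n_patches, d = patch_features.shape
    pos = rng.normal(size=(n_patches, d)) * 0.02          # positional encoding
    motion_tokens = rng.normal(size=(num_keypoints, d)) * 0.02
    tokens = np.concatenate([motion_tokens, patch_features + pos], axis=0)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    tokens = tokens + self_attention(tokens, Wq, Wk, Wv)  # residual attention block
    W_kp = rng.normal(size=(d, 2)) / np.sqrt(d)
    keypoints = np.tanh(tokens[:num_keypoints] @ W_kp)    # (K, 2) in [-1, 1]
    return keypoints
```

In the paper the embedded motion tokens also predict the local transformation parameters per keypoint; here only the keypoint head is shown to keep the sketch minimal.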
Related papers
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
- Continuous Piecewise-Affine Based Motion Model for Image Animation [45.55812811136834]
Image animation aims to bring static images to life according to driving videos.
Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image.
We propose to model motion from the source image to the driving frame in highly expressive diffeomorphism spaces.
arXiv Detail & Related papers (2024-01-17T11:40:05Z)
- Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer motions from a target dynamic person to a source static one for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2023-02-22T11:42:44Z)
- Image Animation with Keypoint Mask [0.0]
Motion transfer is the task of synthesizing future video frames of a single source image according to the motion from a given driving video.
In this work, we extract the structure from a keypoint heatmap, without an explicit motion representation.
The structures extracted from the image and the video are then used by a deep generator to warp the image according to the video.
arXiv Detail & Related papers (2021-12-20T11:35:06Z)
- Sparse to Dense Motion Transfer for Face Image Animation [34.16015389505612]
Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks.
We develop an efficient and effective method for motion transfer from sparse landmarks to the face image.
arXiv Detail & Related papers (2021-09-01T16:23:57Z)
- Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers [77.52828273633646]
We present a new drop-in block for video transformers that aggregates information along implicitly determined motion paths.
We also propose a new method to address the quadratic dependence of computation and memory on the input size.
We obtain state-of-the-art results on the Kinetics, Something-Something V2, and Epic-Kitchens datasets.
arXiv Detail & Related papers (2021-06-09T21:16:05Z)
- Animating Pictures with Eulerian Motion Fields [90.30598913855216]
We show a fully automatic method for converting a still image into a realistic animated looping video.
We target scenes with continuous fluid motion, such as flowing water and billowing smoke.
We propose a novel video looping technique that flows features both forward and backward in time and then blends the results.
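The forward/backward looping idea above can be sketched in one dimension. This is my own simplification, not the paper's method: a constant "Eulerian" displacement stands in for the learned motion field, `np.roll` stands in for feature warping, and a linear crossfade blends the warp advanced forward from the loop start with the warp pulled backward from the loop end, so the first and last frames coincide.

```python
import numpy as np

def looping_blend(feature0, displacement, length):
    """Toy sketch of forward/backward blending for a seamless loop
    (hypothetical 1-D features; `np.roll` stands in for warping).
    At time t, blend the feature warped forward t steps with the
    feature warped backward (length - t) steps, weight alpha = t/length."""
    frames = []
    for t in range(length + 1):
        fwd = np.roll(feature0, displacement * t)             # warped forward from frame 0
        bwd = np.roll(feature0, displacement * (t - length))  # warped back from loop end
        alpha = t / length
        frames.append((1 - alpha) * fwd + alpha * bwd)        # crossfade the two warps
    return frames
```

Because the forward warp dominates at t = 0 and the backward warp dominates at t = length, the loop's first and last frames are identical, which is what makes the result play seamlessly.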
arXiv Detail & Related papers (2020-11-30T18:59:06Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.