Learning Motion Refinement for Unsupervised Face Animation
- URL: http://arxiv.org/abs/2310.13912v1
- Date: Sat, 21 Oct 2023 05:52:25 GMT
- Title: Learning Motion Refinement for Unsupervised Face Animation
- Authors: Jiale Tao, Shuhang Gu, Wen Li, Lixin Duan
- Abstract summary: Unsupervised face animation aims to generate a human face video based on the appearance of a source image, mimicking the motion from a driving video.
Existing methods typically adopt a prior-based motion model (e.g., the local affine motion model or the local thin-plate-spline motion model).
In this work, we design a new unsupervised face animation approach that learns the coarse and finer motions simultaneously.
- Score: 45.807582064277305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised face animation aims to generate a human face video based on the
appearance of a source image, mimicking the motion from a driving video.
Existing methods typically adopt a prior-based motion model (e.g., the local
affine motion model or the local thin-plate-spline motion model). While such
models can capture the coarse facial motion, artifacts are often observed
around subtle motions in local areas (e.g., the lips and eyes), owing to the
limited ability of these models to represent finer facial motions. In this
work, we design a new unsupervised face animation approach that learns the
coarse and finer motions simultaneously. In particular, while exploiting the
local affine motion model to learn the global coarse facial motion, we design
a novel motion refinement module that compensates for the local affine motion
model by modeling finer facial motions in local areas. The motion refinement
is learned from the dense correlation between the source and driving images.
Specifically, we first construct a structure correlation volume from the
keypoint features of the source and driving images. We then train a model to
generate the tiny facial motions iteratively, from low to high resolution.
The learned motion refinements are combined with the coarse motion to
generate the new image. Extensive experiments on widely used benchmarks
demonstrate that our method achieves the best results among state-of-the-art
baselines.
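To make the pipeline concrete, here is a minimal PyTorch sketch of the idea described above: an all-pairs correlation volume is built from source and driving feature maps, and a small per-level head predicts a residual flow coarse-to-fine on top of the coarse (e.g., local affine) motion field. The module names, channel sizes, three-level pyramid, and convolutional heads are illustrative assumptions, not the authors' released architecture.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def correlation_volume(feat_src, feat_drv):
    """All-pairs cosine correlation between source and driving feature maps.

    feat_*: (B, C, H, W). Returns (B, H*W, H, W): at each driving location,
    a vector of correlations against every source location.
    """
    B, C, H, W = feat_src.shape
    src = F.normalize(feat_src.flatten(2), dim=1)   # (B, C, H*W)
    drv = F.normalize(feat_drv.flatten(2), dim=1)   # (B, C, H*W)
    corr = torch.einsum('bcs,bcd->bsd', src, drv)   # (B, HW_src, HW_drv)
    return corr.view(B, H * W, H, W)


class RefinementBlock(nn.Module):
    """Predicts a residual flow at one pyramid level from the correlation
    volume and the flow up-sampled from the previous (coarser) level."""

    def __init__(self, corr_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(corr_channels + 2, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1),  # 2-channel residual flow
        )

    def forward(self, corr, flow):
        return self.net(torch.cat([corr, flow], dim=1))


class MotionRefiner(nn.Module):
    """Iterates from the lowest to the highest feature resolution, adding a
    learned residual ("tiny motion") to the coarse flow at every level."""

    def __init__(self, level_sizes=(16, 32, 64)):
        super().__init__()
        self.blocks = nn.ModuleList([RefinementBlock(s * s) for s in level_sizes])

    def forward(self, feats_src, feats_drv, coarse_flow):
        # Flow values are assumed to live in normalized [-1, 1] grid
        # coordinates, so they need no rescaling when resolution changes.
        flow = coarse_flow
        for block, f_s, f_d in zip(self.blocks, feats_src, feats_drv):
            flow = F.interpolate(flow, size=f_s.shape[-2:], mode='bilinear',
                                 align_corners=False)
            corr = correlation_volume(f_s, f_d)
            flow = flow + block(corr, flow)  # residual refinement at this level
        return flow


# Toy usage with random tensors (all shapes are assumptions):
sizes = (16, 32, 64)
refiner = MotionRefiner(sizes)
feats_s = [torch.randn(1, 32, s, s) for s in sizes]
feats_d = [torch.randn(1, 32, s, s) for s in sizes]
coarse = torch.zeros(1, 2, 64, 64)  # e.g., from the local affine motion model
refined = refiner(feats_s, feats_d, coarse)  # (1, 2, 64, 64)
```
In the actual method, the refined dense flow would be used to warp source features before decoding the output frame; the generator and training losses are omitted here.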
Related papers
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- Monkey See, Monkey Do: Harnessing Self-attention in Motion Diffusion for Zero-shot Motion Transfer [55.109778609058154]
Existing diffusion-based motion editing methods overlook the profound potential of the prior embedded within the weights of pre-trained models.
We uncover the roles and interactions of attention elements in capturing and representing motion patterns.
We integrate these elements to transfer a leader motion to a follower one while maintaining the nuanced characteristics of the follower, resulting in zero-shot motion transfer.
arXiv Detail & Related papers (2024-06-10T17:47:14Z)
- Controllable Longer Image Animation with Diffusion Models [12.565739255499594]
We introduce an open-domain controllable image animation method using motion priors with video diffusion models.
Our method achieves precise control over the direction and speed of motion in the movable region by extracting the motion field information from videos.
We propose an efficient long-duration video generation method based on noise reschedule specifically tailored for image animation tasks.
arXiv Detail & Related papers (2024-05-27T16:08:00Z)
- SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering [45.51684124904457]
We propose SurMo, a new 4D motion paradigm that models temporal dynamics and human appearance in a unified framework, with three components:
surface-based motion encoding, which models 4D human motions with an efficient, compact surface-based triplane;
physical motion decoding, which is designed to encourage physical motion learning;
and 4D appearance modeling, which renders the motion triplanes into images via efficient surface-conditioned decoding.
arXiv Detail & Related papers (2024-04-01T16:34:27Z)
- X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [18.211762995744337]
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation.
Given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions.
Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences.
arXiv Detail & Related papers (2024-03-23T20:30:28Z)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
- Sparse to Dense Motion Transfer for Face Image Animation [34.16015389505612]
Given a source face image and a sequence of sparse face landmarks, our goal is to generate a video of the face imitating the motion of landmarks.
We develop an efficient and effective method for motion transfer from sparse landmarks to the face image.
arXiv Detail & Related papers (2021-09-01T16:23:57Z)
- Real-time Deep Dynamic Characters [95.5592405831368]
We propose a deep videorealistic 3D human character model displaying highly realistic shape, motion, and dynamic appearance.
We use a novel graph convolutional network architecture to enable motion-dependent deformation learning of body and clothing.
We show that our model creates motion-dependent surface deformations, physically plausible dynamic clothing deformations, and video-realistic surface textures at a much higher level of detail than previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-05-04T23:28:55Z)
- Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
arXiv Detail & Related papers (2021-04-01T03:55:50Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
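The First Order Motion Model above is the prototypical prior-based local affine motion model that the main paper builds on and refines. As a hedged sketch of that prior, the function below computes, for each of K keypoints with learned 2x2 Jacobians, a first-order (affine) approximation of the backward flow from driving to source coordinates; the names, shapes, and grid construction are illustrative assumptions, not FOMM's exact code.
```python
import torch


def local_affine_motions(kp_src, kp_drv, jac_src, jac_drv, h, w):
    """kp_*: (B, K, 2) keypoints in normalized [-1, 1] (x, y) coordinates.
    jac_*: (B, K, 2, 2) local Jacobians at the keypoints.
    Returns (B, K, h, w, 2): K candidate backward flows (driving -> source).
    """
    B, K, _ = kp_src.shape
    ys = torch.linspace(-1, 1, h)
    xs = torch.linspace(-1, 1, w)
    grid = torch.stack(torch.meshgrid(ys, xs, indexing='ij'), dim=-1)  # (h, w, 2) as (y, x)
    grid = grid.flip(-1)                                  # -> (x, y) to match keypoints
    z = grid.view(1, 1, h, w, 2) - kp_drv.view(B, K, 1, 1, 2)
    # First-order term: J_src @ inverse(J_drv) applied to the offset z.
    J = jac_src @ torch.linalg.inv(jac_drv)               # (B, K, 2, 2)
    z = torch.einsum('bkij,bkhwj->bkhwi', J, z)
    return z + kp_src.view(B, K, 1, 1, 2)                 # candidate source coords


# Toy call: 10 keypoints on a 64x64 grid (shapes and values are assumptions).
flows = local_affine_motions(torch.randn(1, 10, 2), torch.randn(1, 10, 2),
                             torch.eye(2).expand(1, 10, 2, 2),
                             torch.eye(2).expand(1, 10, 2, 2), 64, 64)
```
In FOMM, a dense motion network then combines these K candidate flows (plus a background flow) with predicted soft masks into a single dense motion field; the refinement module of the main paper adds residual corrections on top of such a coarse field.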
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.