LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
- URL: http://arxiv.org/abs/2407.03168v2
- Date: Fri, 28 Feb 2025 14:39:17 GMT
- Title: LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
- Authors: Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang
- Abstract summary: Portrait Animation aims to synthesize a video from a single source image, using it as an appearance reference, with motion derived from a driving video, audio, text, or generation. We develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage.
- Score: 13.552097853323207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes and meticulously propose a stitching and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed remarkably reaches 12.8ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait
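The stitching and retargeting modules mentioned in the abstract operate purely on the compact implicit keypoints through a small MLP. The sketch below is a minimal, hypothetical illustration of that kind of module in PyTorch, based only on the abstract; the keypoint count, hidden width, and control signal are assumptions, not the released LivePortrait architecture.

```python
import torch
import torch.nn as nn

class RetargetingMLP(nn.Module):
    """Minimal sketch: a small MLP that maps flattened implicit keypoints
    plus a scalar control (e.g. desired eye or lip openness) to per-keypoint
    offsets. Dimensions are illustrative, not those of the released models."""

    def __init__(self, num_kp: int = 21, kp_dim: int = 3, hidden: int = 128):
        super().__init__()
        in_dim = num_kp * kp_dim + 1  # flattened keypoints + one control scalar
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_kp * kp_dim),  # per-keypoint offsets
        )

    def forward(self, kp: torch.Tensor, control: torch.Tensor) -> torch.Tensor:
        # kp: (B, num_kp, kp_dim), control: (B, 1)
        b, n, d = kp.shape
        x = torch.cat([kp.reshape(b, -1), control], dim=-1)
        offsets = self.net(x).reshape(b, n, d)
        return kp + offsets  # adjusted keypoints fed to the warping/decoder stage
```

Because such a module only touches the keypoint vector and never the image features, its overhead is negligible next to the warping and decoding stages, which is consistent with the reported 12.8 ms generation speed.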
Related papers
- UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer [45.51168344933782]
UniAnimate-DiT is an advanced project that leverages the cutting-edge and powerful capabilities of the open-source Wan2.1 model for consistent human image animation.
We apply the Low-Rank Adaptation (LoRA) technique to fine-tune a minimal set of parameters, significantly reducing training memory overhead (a generic sketch of the idea follows this entry).
Experimental results show that our approach achieves visually appealing and temporally consistent high-fidelity animations.
arXiv Detail & Related papers (2025-04-15T15:29:11Z)
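The LoRA fine-tuning referenced in the UniAnimate-DiT summary above freezes the pretrained weights and trains only a pair of low-rank matrices per adapted layer. The wrapper below is a generic illustration of that idea on a single linear layer, not code from the UniAnimate-DiT or Wan2.1 codebases; the rank `r` and scaling `alpha` are illustrative defaults.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA wrapper: y = W x + (alpha / r) * B(A(x)), with the
    pretrained W frozen and only the low-rank factors A and B trained."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, r, bias=False)
        self.lora_b = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # low-rank update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

Only `lora_a` and `lora_b` receive gradients, which is what keeps the trainable parameter count and training memory overhead small.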
- HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation [30.030540407121325]
HunyuanPortrait is a diffusion-based condition control method for portrait animation.
It animates the character in the reference image with the facial expressions and head pose of the driving videos.
Our framework outperforms existing methods, demonstrating superior temporal consistency and controllability.
arXiv Detail & Related papers (2025-03-24T16:35:41Z)
- Joint Learning of Depth and Appearance for Portrait Image Animation [20.83495988491606]
We propose to jointly learn visual appearance and depth in a diffusion-based portrait image generator.
Our framework can be efficiently adapted to various downstream applications, such as facial depth-to-image and image-to-depth generation.
arXiv Detail & Related papers (2025-01-15T08:24:35Z)
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- Rethinking Image-to-Video Adaptation: An Object-centric Perspective [61.833533295978484]
We propose a novel and efficient image-to-video adaptation strategy from the object-centric perspective.
Inspired by human perception, we integrate a proxy task of object discovery into image-to-video transfer learning.
arXiv Detail & Related papers (2024-07-09T13:58:10Z)
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation [53.16986875759286]
We present the UniAnimate framework to enable efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports both randomly noised input and first-frame-conditioned input (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-03T10:51:10Z)
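One plausible reading of UniAnimate's "unified noise input" is a single entry point that either samples pure Gaussian noise (random generation) or overwrites the first temporal slot with a known first-frame latent (first-frame-conditioned generation). The function below is a speculative sketch of that idea only, not the actual UniAnimate implementation; its name and tensor layout are assumptions.

```python
import torch

def unified_noise_input(shape, first_frame_latent=None, generator=None):
    """Hypothetical unified noise input.

    shape: (B, C, T, H, W) latent-video shape.
    first_frame_latent: optional (B, C, H, W) latent of a known first frame;
    if given, it replaces the noise at t=0 so generation is conditioned on
    that frame, otherwise the input is plain Gaussian noise.
    """
    noise = torch.randn(shape, generator=generator)
    if first_frame_latent is not None:
        noise[:, :, 0] = first_frame_latent  # condition the first frame
    return noise
```

In a real diffusion pipeline the conditioned slot would typically also be preserved (e.g. masked) during denoising; that detail is omitted here.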
- X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention [18.211762995744337]
We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation.
Given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions.
Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences.
arXiv Detail & Related papers (2024-03-23T20:30:28Z)
- Continuous Piecewise-Affine Based Motion Model for Image Animation [45.55812811136834]
Image animation aims to bring static images to life according to driving videos.
Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image (the underlying keypoint transfer step is sketched after this entry).
We propose to model motion from the source image to the driving frame in highly expressive diffeomorphism spaces.
arXiv Detail & Related papers (2024-01-17T11:40:05Z)
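The keypoint-driven transfer that this family of unsupervised animation methods (and LivePortrait's implicit-keypoint framework) builds on amounts to applying the driving frames' relative keypoint motion to the source keypoints before fitting the affine, thin-plate spline, or, in this paper, diffeomorphic warp. The sketch below shows only that baseline relative-transfer step, not the paper's continuous piecewise-affine formulation.

```python
import torch

def transfer_relative_motion(kp_source, kp_driving, kp_driving_initial):
    """Baseline relative motion transfer used by keypoint-based animation.

    kp_source:          (B, K, 2) keypoints detected on the source image
    kp_driving:         (B, K, 2) keypoints of the current driving frame
    kp_driving_initial: (B, K, 2) keypoints of the first driving frame

    Each driving keypoint's displacement relative to the first frame is added
    to the corresponding source keypoint; the resulting keypoints then
    parameterize the image warp.
    """
    return kp_source + (kp_driving - kp_driving_initial)
```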
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation [44.220329202024494]
We present a few-shot-based tuning framework, LAMP, which enables a text-to-image diffusion model to Learn A specific Motion Pattern with 8~16 videos on a single GPU.
Specifically, we design a first-frame-conditioned pipeline that uses an off-the-shelf text-to-image model for content generation.
To capture features along the temporal dimension, we expand the pretrained 2D convolution layers of the T2I model into our novel temporal-spatial motion learning layers (a generic inflation sketch follows this entry).
arXiv Detail & Related papers (2023-10-16T19:03:19Z)
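Expanding pretrained 2D convolutions to cover the temporal dimension is usually done by "inflating" each 2D kernel into a 3D one while preserving its response to a temporally constant input. The helper below shows that generic inflation step; it is an assumption about the kind of operation involved, since LAMP's actual temporal-spatial motion learning layers are not detailed in the summary above.

```python
import torch
import torch.nn as nn

def inflate_conv2d(conv2d: nn.Conv2d, time_kernel: int = 3) -> nn.Conv3d:
    """Generic 2D-to-3D inflation of a pretrained convolution."""
    conv3d = nn.Conv3d(
        conv2d.in_channels, conv2d.out_channels,
        kernel_size=(time_kernel, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_kernel // 2, *conv2d.padding),
        groups=conv2d.groups,
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # Repeat the 2D kernel along time and divide by the temporal size so a
        # temporally constant input yields the same output as the 2D layer.
        w = conv2d.weight.unsqueeze(2).repeat(1, 1, time_kernel, 1, 1) / time_kernel
        conv3d.weight.copy_(w)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d
```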
- Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation [50.10989443332995]
Pose-guided person image generation and animation aim to transform a source person image to target poses.
Convolutional Neural Networks are limited by their inability to spatially transform the inputs.
We propose a differentiable global-flow local-attention framework to reassemble the inputs at the feature level (the flow-warping primitive is sketched below).
arXiv Detail & Related papers (2020-08-27T08:59:44Z)
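The "reassemble the inputs at the feature level" step in the last entry addresses exactly the spatial-transformation limitation of plain convolutions: source feature maps are warped by a predicted flow field in a differentiable way. The function below sketches that flow-warping primitive with `torch.nn.functional.grid_sample`; the paper's local-attention component is omitted, and the pixel-offset flow convention is an assumption.

```python
import torch
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Differentiably warp source features with a dense flow field.

    feat: (B, C, H, W) source feature map
    flow: (B, 2, H, W) per-pixel offsets in pixels (dx, dy)
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1] as expected by grid_sample.
    grid_x = 2.0 * grid_x / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid_y / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat, grid, mode="bilinear", align_corners=True)
```

A generator can predict `flow` from the target pose, apply `warp_features` to the source features, and decode the warped features into the target image.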
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.