ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
- URL: http://arxiv.org/abs/2404.15275v3
- Date: Tue, 25 Jun 2024 16:57:27 GMT
- Title: ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
- Authors: Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang
- Abstract summary: ID-Animator is a zero-shot human-video generation approach that can perform personalized video generation given a single reference facial image without further training.
Our method is highly compatible with popular pre-trained T2V models like AnimateDiff and various community backbone models.
- Score: 16.438935466843304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation, either requiring tedious case-by-case fine-tuning or usually missing identity details in the video generation process. In this study, we present ID-Animator, a zero-shot human-video generation approach that can perform personalized video generation given a single reference facial image without further training. ID-Animator inherits existing diffusion-based video generation backbones with a face adapter to encode the ID-relevant embeddings from learnable facial latent queries. To facilitate the extraction of identity information in video generation, we introduce an ID-oriented dataset construction pipeline that incorporates unified human attributes and action captioning techniques from a constructed facial image pool. Based on this pipeline, a random reference training strategy is further devised to precisely capture the ID-relevant embeddings with an ID-preserving loss, thus improving the fidelity and generalization capacity of our model for ID-specific video generation. Extensive experiments demonstrate the superiority of ID-Animator to generate personalized human videos over previous models. Moreover, our method is highly compatible with popular pre-trained T2V models like AnimateDiff and various community backbone models, showing high extendability in real-world applications for video generation where identity preservation is highly desired. Our codes and checkpoints are released at https://github.com/ID-Animator/ID-Animator.
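The abstract describes a face adapter built on learnable facial latent queries and an ID-preserving loss. The sketch below illustrates one plausible reading of that design: latent queries cross-attend to features from a frozen face/image encoder to produce ID-relevant tokens, and a cosine-similarity loss pulls generated-frame face embeddings toward the reference identity. All module names, dimensions, and the loss form are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch of a face adapter with learnable latent queries (assumption-based,
# not the official ID-Animator code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceAdapter(nn.Module):
    def __init__(self, face_feat_dim=1024, token_dim=768, num_queries=16, num_heads=8):
        super().__init__()
        # Learnable facial latent queries, refined by cross-attention over face features.
        self.latent_queries = nn.Parameter(torch.randn(num_queries, token_dim) * 0.02)
        self.proj_face = nn.Linear(face_feat_dim, token_dim)
        self.cross_attn = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)
        self.out = nn.Sequential(nn.LayerNorm(token_dim), nn.Linear(token_dim, token_dim))

    def forward(self, face_feats):
        # face_feats: (B, N, face_feat_dim) patch features from a frozen face/image encoder.
        kv = self.proj_face(face_feats)
        q = self.latent_queries.unsqueeze(0).expand(face_feats.size(0), -1, -1)
        id_tokens, _ = self.cross_attn(q, kv, kv)   # (B, num_queries, token_dim)
        return self.out(id_tokens)                  # ID-relevant tokens fed to the frozen T2V backbone

def id_preserving_loss(gen_face_emb, ref_face_emb):
    # One plausible form of an "ID-preserving loss": maximize cosine similarity between
    # face-recognition embeddings of generated frames and the reference identity.
    return 1.0 - F.cosine_similarity(gen_face_emb, ref_face_emb, dim=-1).mean()
```

The ID-relevant tokens would typically be injected into the backbone's cross-attention layers alongside text tokens, so only the adapter is trained while the video diffusion model stays frozen; this is an assumption consistent with the zero-shot, adapter-based framing in the abstract.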
Related papers
- StableAnimator: High-Quality Identity-Preserving Human Image Animation [64.63765800569935]
This paper presents StableAnimator, the first end-to-end ID-preserving video diffusion framework.
It synthesizes high-quality videos without any post-processing, conditioned on a reference image and a sequence of poses.
During inference, we propose a novel Hamilton-Jacobi-Bellman (HJB) equation-based optimization to further enhance the face quality.
arXiv Detail & Related papers (2024-11-26T18:59:22Z) - PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation [36.21554597804604]
Identity-specific human video generation with customized ID images is still under-explored.
We propose a novel framework, dubbed PersonalVideo, that applies direct supervision on videos synthesized by the T2V model.
Our method delivers high identity faithfulness while preserving the inherent video generation quality of the original T2V model, outshining prior approaches.
arXiv Detail & Related papers (2024-11-26T02:25:38Z) - VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z) - Magic-Me: Identity-Specific Video Customized Diffusion [72.05925155000165]
We propose a subject-identity-controllable video generation framework, termed Video Custom Diffusion (VCD).
With a specified identity defined by a few images, VCD reinforces the identity characteristics and injects frame-wise correlation for stable video outputs.
We conducted extensive experiments to verify that VCD is able to generate stable videos with better identity preservation than the baselines.
arXiv Detail & Related papers (2024-02-14T18:13:51Z) - StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z) - A Video Is Worth Three Views: Trigeminal Transformers for Video-based Person Re-identification [77.08204941207985]
Video-based person re-identification (Re-ID) aims to retrieve video sequences of the same person under non-overlapping cameras.
We propose a novel framework named Trigeminal Transformers (TMT) for video-based person Re-ID.
arXiv Detail & Related papers (2021-04-05T02:50:16Z) - PoseTrackReID: Dataset Description [97.7241689753353]
Pose information is helpful to disentangle useful feature information from background or occlusion noise.
With PoseTrackReID, we want to bridge the gap between person re-ID and multi-person pose tracking.
This dataset provides a good benchmark for current state-of-the-art methods on multi-frame person re-ID.
arXiv Detail & Related papers (2020-11-12T07:44:25Z)