MetaPortrait: Identity-Preserving Talking Head Generation with Fast
Personalized Adaptation
- URL: http://arxiv.org/abs/2212.08062v3
- Date: Mon, 27 Mar 2023 02:16:13 GMT
- Title: MetaPortrait: Identity-Preserving Talking Head Generation with Fast
Personalized Adaptation
- Authors: Bowen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong
Chen, Qifeng Chen, Yong Wang, Fang Wen
- Abstract summary: We propose an ID-preserving talking head generation framework.
We claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields.
We adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait.
- Score: 57.060828009199646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we propose an ID-preserving talking head generation framework,
which advances previous methods in two aspects. First, as opposed to
interpolating from sparse flow, we claim that dense landmarks are crucial to
achieving accurate geometry-aware flow fields. Second, inspired by
face-swapping methods, we adaptively fuse the source identity during synthesis,
so that the network better preserves the key characteristics of the image
portrait. Although the proposed model surpasses the generation fidelity of
prior work on established benchmarks, personalized fine-tuning is usually
still needed to make talking head generation qualified for real-world use.
However, this process is computationally demanding and unaffordable for
typical users. To solve this, we propose a fast adaptation model using a
meta-learning approach. The learned model can be adapted into a high-quality
personalized model in as little as 30 seconds. Finally, a spatio-temporal
enhancement module is proposed to improve fine details while ensuring temporal
coherency. Extensive experiments demonstrate the significant superiority of
our approach over the state of the art in both one-shot and personalized
settings.
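As a rough illustration of the two ideas above (dense-landmark-driven flow and meta-learned fast adaptation), here is a minimal PyTorch sketch, not the authors' released code. Every name in it (Generator, fast_adapt, the 5-channel input layout, the L1 loss, the Reptile-style update) is a hypothetical simplification; MetaPortrait's actual architecture, losses, and meta-learning algorithm may differ.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Toy stand-in for the talking-head generator: predicts a dense flow
    field from a source frame plus dense-landmark heatmaps, then warps the
    source with that flow (hypothetical layout, not the paper's network)."""
    def __init__(self):
        super().__init__()
        # 3 RGB channels + 2 landmark-heatmap channels -> 2-channel flow.
        self.flow_net = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, src, landmarks):
        flow = self.flow_net(torch.cat([src, landmarks], dim=1))
        n, _, h, w = src.shape
        # Identity sampling grid in [-1, 1], offset by the predicted flow.
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
        return F.grid_sample(src, base + flow.permute(0, 2, 3, 1),
                             align_corners=True)

def fast_adapt(meta_model, frames, steps=50, lr=1e-4):
    """Inner loop: clone the meta-initialized generator and briefly
    fine-tune it on a few frames of a single person."""
    model = copy.deepcopy(meta_model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        for src, lmk, target in frames:
            loss = F.l1_loss(model(src, lmk), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def meta_train(meta_model, tasks, meta_lr=0.1, rounds=1000):
    """Outer loop (Reptile-style): nudge the meta-weights toward each
    person-adapted copy so later adaptation converges quickly."""
    for _ in range(rounds):
        for frames in tasks:  # each task: a few frames of one identity
            adapted = fast_adapt(meta_model, frames, steps=5)
            with torch.no_grad():
                for p, q in zip(meta_model.parameters(),
                                adapted.parameters()):
                    p.add_(meta_lr * (q - p))
    return meta_model
```

In this scheme, meta_train stands in for pre-training and fast_adapt is the short personalization step; the 30-second figure reported in the abstract comes from the paper's own architecture and training recipe, not from this toy.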
Related papers
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z)
- RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network [48.95833484103569]
RealTalk comprises an audio-to-expression transformer and a high-fidelity expression-to-face framework.
In the first component, we consider both identity and intra-personal variation features related to speaking lip movements.
In the second component, we design a lightweight facial identity alignment (FIA) module.
This novel design allows us to generate fine details in real-time, without depending on sophisticated and inefficient feature alignment modules.
arXiv Detail & Related papers (2024-06-26T12:09:59Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- Latent Diffusion Models for Attribute-Preserving Image Anonymization [4.080920304681247]
This paper presents the first approach to image anonymization based on Latent Diffusion Models (LDMs).
We propose two LDMs for this purpose: CAFLaGE-Base exploits a combination of pre-trained ControlNets and a new controlling mechanism designed to increase the distance between the real and anonymized images.
arXiv Detail & Related papers (2024-03-21T19:09:21Z)
- InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
arXiv Detail & Related papers (2024-01-15T07:50:18Z)
- FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z)
- Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models [59.094601993993535]
Text-to-image (T2I) personalization allows users to incorporate their own visual concepts into natural language prompts.
Most existing encoders are limited to a single-class domain, which hinders their ability to handle diverse concepts.
We propose a domain-agnostic method that does not require any specialized dataset or prior information about the personalized concepts.
arXiv Detail & Related papers (2023-07-13T17:46:42Z)
- Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: First, an encoder that takes as input a single image of a target concept from a given domain.
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
arXiv Detail & Related papers (2023-02-23T18:46:41Z)
- Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties; notably, it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.