MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
- URL: http://arxiv.org/abs/2312.02703v1
- Date: Tue, 5 Dec 2023 12:05:01 GMT
- Title: MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
- Authors: Bo Ding, Zhenfeng Fan, Shuang Yang, Shihong Xia
- Abstract summary: MyPortrait is a simple, general, and flexible framework for neural portrait generation.
Our proposed framework supports both video-driven and audio-driven face animation.
Our method provides a real-time online version and a high-quality offline version.
- Score: 19.911068375240905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating realistic talking faces is an interesting and long-standing topic
in the field of computer vision. Although significant progress has been made,
it is still challenging to generate high-quality dynamic faces with
personalized details. This is mainly due to the inability of a general model
to represent personalized details and its poor generalization to unseen
controllable parameters. In this work, we propose MyPortrait, a simple,
general, and flexible framework for neural portrait generation. We incorporate
personalized prior in a monocular video and morphable prior in 3D face
morphable space for generating personalized details under novel controllable
parameters. Our proposed framework supports both video-driven and audio-driven
face animation given a monocular video of a single person. Depending on
whether the test data is included in training, our method provides a
real-time online version and a high-quality offline version. Comprehensive
experiments in various metrics demonstrate the superior performance of our
method over the state-of-the-art methods. The code will be publicly available.
Related papers
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method to control face expression deformation based on driven audio.
Our experiments show that our method is superior to state-of-the-art performance on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875]
VLOGGER is a method for audio-driven human video generation from a single input image.
We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls.
We show applications in video editing and personalization.
arXiv Detail & Related papers (2024-03-13T17:59:02Z)
- GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance [83.43852715997596]
GSmoothFace is a novel two-stage generalized talking face generation model guided by a fine-grained 3D face model.
It can synthesize smooth lip dynamics while preserving the speaker's identity.
Both quantitative and qualitative experiments confirm the superiority of our method in terms of realism, lip synchronization, and visual quality.
arXiv Detail & Related papers (2023-12-12T16:00:55Z)
- ReliTalk: Relightable Talking Portrait Generation from a Single Video [62.47116237654984]
ReliTalk is a novel framework for relightable audio-driven talking portrait generation from monocular videos.
Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images.
arXiv Detail & Related papers (2023-09-05T17:59:42Z)
- FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content [36.48960592782015]
FaceChain is a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models.
We inject several SOTA face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions.
Based on FaceChain, we further develop several applications, including virtual try-on and 2D talking heads, to build a broader playground that better demonstrates its value.
arXiv Detail & Related papers (2023-08-28T02:20:44Z) - PVP: Personalized Video Prior for Editable Dynamic Portraits using
StyleGAN [33.49053731211931]
StyleGAN has shown promising results in photorealistic and accurate reconstruction of human faces.
In this work, our goal is to take as input a monocular video of a face, and create an editable dynamic portrait.
The user can create novel viewpoints, edit the appearance, and animate the face.
arXiv Detail & Related papers (2023-06-29T17:26:51Z) - HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z) - Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large-scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z) - StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via
Pretrained StyleGAN [49.917296433657484]
One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image.
In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties.
We propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities.
arXiv Detail & Related papers (2022-03-08T12:06:12Z) - Pose-Controllable Talking Face Generation by Implicitly Modularized
Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z) - PVA: Pixel-aligned Volumetric Avatars [34.929560973779466]
We devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs.
Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss without requiring explicit 3D supervision.
arXiv Detail & Related papers (2021-01-07T18:58:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.