MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
- URL: http://arxiv.org/abs/2312.02703v1
- Date: Tue, 5 Dec 2023 12:05:01 GMT
- Title: MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
- Authors: Bo Ding, Zhenfeng Fan, Shuang Yang, Shihong Xia
- Abstract summary: MyPortrait is a simple, general, and flexible framework for neural portrait generation.
Our proposed framework supports both video-driven and audio-driven face animation.
Our method provides a real-time online version and a high-quality offline version.
- Score: 19.911068375240905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating realistic talking faces is an interesting and long-standing topic
in the field of computer vision. Although significant progress has been made,
it is still challenging to generate high-quality dynamic faces with
personalized details. This is mainly due to the inability of a general model
to represent personalized details and its poor generalization to unseen
controllable parameters. In this work, we propose MyPortrait, a simple,
general, and flexible framework for neural portrait generation. We incorporate
personalized prior in a monocular video and morphable prior in 3D face
morphable space for generating personalized details under novel controllable
parameters. Our proposed framework supports both video-driven and audio-driven
face animation given a monocular video of a single person. Depending on
whether the test data is included in training, our method provides a
real-time online version and a high-quality offline version. Comprehensive
experiments in various metrics demonstrate the superior performance of our
method over the state-of-the-art methods. The code will be publicly available.
Related papers
- Controllable Talking Face Generation by Implicit Facial Keypoints Editing [6.036277153327655]
We present ControlTalk, a talking face generation method to control face expression deformation based on driven audio.
Our experiments show that our method is superior to state-of-the-art performance on widely used benchmarks, including HDTF and MEAD.
arXiv Detail & Related papers (2024-06-05T02:54:46Z)
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875]
VLOGGER is a method for audio-driven human video generation from a single input image.
We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls.
We show applications in video editing and personalization.
arXiv Detail & Related papers (2024-03-13T17:59:02Z)
- GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance [83.43852715997596]
GSmoothFace is a novel two-stage generalized talking face generation model guided by a fine-grained 3D face model.
It can synthesize smooth lip dynamics while preserving the speaker's identity.
Both quantitative and qualitative experiments confirm the superiority of our method in terms of realism, lip synchronization, and visual quality.
arXiv Detail & Related papers (2023-12-12T16:00:55Z)
- ReliTalk: Relightable Talking Portrait Generation from a Single Video [62.47116237654984]
ReliTalk is a novel framework for relightable audio-driven talking portrait generation from monocular videos.
Our key insight is to decompose the portrait's reflectance from implicitly learned audio-driven facial normals and images.
arXiv Detail & Related papers (2023-09-05T17:59:42Z)
- FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content [36.48960592782015]
FaceChain is a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models.
We inject several SOTA face models into the generation procedure, achieving more efficient label tagging, data processing, and model post-processing than previous solutions.
Based on FaceChain, we further develop several applications, including virtual try-on and 2D talking heads, to build a broader playground that better demonstrates its value.
arXiv Detail & Related papers (2023-08-28T02:20:44Z) - PVP: Personalized Video Prior for Editable Dynamic Portraits using
StyleGAN [33.49053731211931]
StyleGAN has shown promising results in photorealistic and accurate reconstruction of human faces.
In this work, our goal is to take as input a monocular video of a face, and create an editable dynamic portrait.
The user can create novel viewpoints, edit the appearance, and animate the face.
arXiv Detail & Related papers (2023-06-29T17:26:51Z) - HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z) - Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large-scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z) - StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via
Pretrained StyleGAN [49.917296433657484]
One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image.
In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties.
We propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities.
arXiv Detail & Related papers (2022-03-08T12:06:12Z) - Pose-Controllable Talking Face Generation by Implicitly Modularized
Audio-Visual Representation [96.66010515343106]
We propose a clean yet effective framework to generate pose-controllable talking faces.
We operate on raw face images, using only a single photo as an identity reference.
Our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.
arXiv Detail & Related papers (2021-04-22T15:10:26Z) - PVA: Pixel-aligned Volumetric Avatars [34.929560973779466]
We devise a novel approach for predicting volumetric avatars of the human head given just a small number of inputs.
Our approach is trained in an end-to-end manner solely based on a photometric re-rendering loss without requiring explicit 3D supervision.
arXiv Detail & Related papers (2021-01-07T18:58:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.