CapHuman: Capture Your Moments in Parallel Universes
- URL: http://arxiv.org/abs/2402.00627v3
- Date: Fri, 17 May 2024 14:40:55 GMT
- Title: CapHuman: Capture Your Moments in Parallel Universes
- Authors: Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang
- Abstract summary: We present a new framework named CapHuman.
CapHuman encodes identity features and then learns to align them into the latent space.
We introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner.
- Score: 60.06408546134581
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We concentrate on a novel human-centric image synthesis task, that is, given only one reference facial photograph, it is expected to generate specific individual images with diverse head positions, poses, facial expressions, and illuminations in different contexts. To accomplish this goal, we argue that our generative model should be capable of the following favorable characteristics: (1) a strong visual and semantic understanding of our world and human society for basic object and human image generation. (2) generalizable identity preservation ability. (3) flexible and fine-grained head control. Recently, large pre-trained text-to-image diffusion models have shown remarkable results, serving as a powerful generative foundation. As a basis, we aim to unleash the above two capabilities of the pre-trained model. In this work, we present a new framework named CapHuman. We embrace the "encode then learn to align" paradigm, which enables generalizable identity preservation for new individuals without cumbersome tuning at inference. CapHuman encodes identity features and then learns to align them into the latent space. Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner. Extensive qualitative and quantitative analyses demonstrate our CapHuman can produce well-identity-preserved, photo-realistic, and high-fidelity portraits with content-rich representations and various head renditions, superior to established baselines. Code and checkpoint will be released at https://github.com/VamosC/CapHuman.
Related papers
- HINT: Learning Complete Human Neural Representations from Limited Viewpoints [69.76947323932107]
We propose a NeRF-based algorithm able to learn a detailed and complete human model from limited viewing angles.
As a result, our method can reconstruct complete humans even from a few viewing angles, increasing performance by more than 15% PSNR.
arXiv Detail & Related papers (2024-05-30T05:43:09Z)
- VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis [40.869862603815875]
VLOGGER is a method for audio-driven human video generation from a single input image.
We use a novel diffusion-based architecture that augments text-to-image models with both spatial and temporal controls.
We show applications in video editing and personalization.
arXiv Detail & Related papers (2024-03-13T17:59:02Z)
- StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image.
We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
- XAGen: 3D Expressive Human Avatars Generation [76.69560679209171]
XAGen is the first 3D generative model for human avatars capable of expressive control over body, face, and hands.
We propose a multi-part rendering technique that disentangles the synthesis of body, face, and hands.
Experiments show that XAGen surpasses state-of-the-art methods in terms of realism, diversity, and expressive control abilities.
arXiv Detail & Related papers (2023-11-22T18:30:42Z)
- HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts.
Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network.
Our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z)
- Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z)
- VariTex: Variational Neural Face Textures [0.0]
VariTex is a method that learns a variational latent feature space of neural face textures.
To generate images of complete human heads, we propose an additive decoder that generates plausible additional details such as hair.
The resulting method can generate geometrically consistent images of novel identities allowing fine-grained control over head pose, face shape, and facial expressions.
arXiv Detail & Related papers (2021-04-13T07:47:53Z)