HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
- URL: http://arxiv.org/abs/2403.09326v4
- Date: Wed, 30 Apr 2025 03:06:06 GMT
- Title: HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation
- Authors: Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang
- Abstract summary: This paper introduces a novel framework for generating stylized head avatars from text guidance. Our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. Our framework can generate realistic shapes and textures that can be further edited via text, while supporting seamless editing using the preserved attributes from the template mesh.
- Score: 17.590555698266346
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current text-to-avatar methods often rely on implicit representations (e.g., NeRF, SDF, and DMTet), leading to 3D content that artists cannot easily edit and animate in graphics software. This paper introduces a novel framework for generating stylized head avatars from text guidance, which leverages locally learnable mesh deformation and 2D diffusion priors to achieve high-quality digital assets for attribute-preserving manipulation. Given a template mesh, our method represents mesh deformation with per-face Jacobians and adaptively modulates local deformation using a learnable vector field. This vector field enables anisotropic scaling while preserving the rotation of vertices, which can better express identity and geometric details. We employ landmark- and contour-based regularization terms to balance the expressiveness and plausibility of generated avatars from multiple views without relying on any specific shape prior. Our framework can generate realistic shapes and textures that can be further edited via text, while supporting seamless editing using the preserved attributes from the template mesh, such as 3DMM parameters, blendshapes, and UV coordinates. Extensive experiments demonstrate that our framework can generate diverse and expressive head avatars with high-quality meshes that artists can easily manipulate in graphics software, facilitating downstream applications such as efficient asset creation and animation with preserved attributes.
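The deformation representation described in the abstract can be pictured with a short sketch. Below is a minimal, hedged PyTorch sketch assuming a triangle mesh (rest_verts, faces): each face carries a Jacobian factored as a rotation times a learnable anisotropic scale (the "vector field"), and deformed vertices are recovered with a dense least-squares solve standing in for the usual sparse Poisson-style solve. The factorization, the helper names (rodrigues, edge_operator, JacobianField), and the solve itself are illustrative assumptions, not the paper's released implementation.

```python
# Minimal, hedged sketch of the deformation representation only (PyTorch).
# The J_f = R_f @ diag(s_f) factorization, the dense least-squares vertex
# solve, and all helper names are illustrative assumptions, not released code.
import torch


def rodrigues(axis_angle, eps=1e-8):
    """Axis-angle vectors (F, 3) -> rotation matrices (F, 3, 3)."""
    theta = axis_angle.norm(dim=-1, keepdim=True).clamp(min=eps)
    k = axis_angle / theta
    K = torch.zeros(axis_angle.shape[0], 3, 3)
    K[:, 0, 1], K[:, 0, 2] = -k[:, 2], k[:, 1]
    K[:, 1, 0], K[:, 1, 2] = k[:, 2], -k[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -k[:, 1], k[:, 0]
    theta = theta.unsqueeze(-1)
    return torch.eye(3) + theta.sin() * K + (1 - theta.cos()) * (K @ K)


def edge_operator(rest_verts, faces):
    """Edge-difference operator A (2F x V) and per-face rest edges (F, 2, 3)."""
    num_faces, num_verts = faces.shape[0], rest_verts.shape[0]
    A = torch.zeros(2 * num_faces, num_verts)
    rows = torch.arange(num_faces)
    for k, dst in enumerate((1, 2)):              # edges v0->v1 and v0->v2
        A[2 * rows + k, faces[:, dst]] = 1.0
        A[2 * rows + k, faces[:, 0]] = -1.0
    rest_edges = (A @ rest_verts).view(num_faces, 2, 3)
    return A, rest_edges


class JacobianField(torch.nn.Module):
    """Per-face Jacobians J_f = R_f @ diag(s_f): the learnable scale field s_f
    stretches each face anisotropically without disturbing its rotation R_f."""

    def __init__(self, rest_verts, faces):
        super().__init__()
        A, rest_edges = edge_operator(rest_verts, faces)
        pin = torch.zeros(1, rest_verts.shape[0])
        pin[0, 0] = 1.0                           # pin vertex 0 to fix the translation ambiguity
        self.register_buffer("A", torch.cat([A, pin], dim=0))
        self.register_buffer("rest_edges", rest_edges)
        self.register_buffer("pin_target", rest_verts[:1])
        num_faces = faces.shape[0]
        self.rot = torch.nn.Parameter(torch.zeros(num_faces, 3))        # per-face rotation (axis-angle)
        self.log_scale = torch.nn.Parameter(torch.zeros(num_faces, 3))  # learnable anisotropic scale field

    def forward(self):
        J = rodrigues(self.rot) @ torch.diag_embed(self.log_scale.exp())   # (F, 3, 3)
        # Each rest edge e should map to J @ e; recover vertices by least squares,
        # a dense stand-in for the sparse Poisson-style solve used in practice.
        targets = torch.einsum("fij,fej->fei", J, self.rest_edges).reshape(-1, 3)
        rhs = torch.cat([targets, self.pin_target], dim=0)
        return torch.linalg.lstsq(self.A, rhs).solution                     # (V, 3) deformed vertices
```

In the full pipeline, the recovered vertices would feed a differentiable renderer scored by 2D diffusion guidance together with the landmark- and contour-based regularizers; because only vertex positions change, the template's connectivity, UV coordinates, blendshapes, and 3DMM attributes survive the optimization.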
Related papers
- Generating Editable Head Avatars with 3D Gaussian GANs [57.51487984425395]
Traditional 3D-aware generative adversarial networks (GANs) achieve photorealistic and view-consistent 3D head synthesis.
We propose a novel approach that enhances the editability and animation control of 3D head avatars by incorporating 3D Gaussian Splatting (3DGS) as an explicit 3D representation.
Our approach delivers high-quality 3D-aware synthesis with state-of-the-art controllability.
arXiv Detail & Related papers (2024-12-26T10:10:03Z)
- GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations [54.94362657501809]
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time.
At the core of our method is a hierarchical representation of head models that captures the complex dynamics of facial expressions and head movements.
We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework.
arXiv Detail & Related papers (2024-09-18T13:05:43Z)
- GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image [89.70322127648349]
We propose a generic avatar editing approach that can be universally applied to various 3DMM driving volumetric head avatars.
To achieve this goal, we design a novel expression-aware modification generative model, which lifts 2D editing from a single image to a consistent 3D modification field.
arXiv Detail & Related papers (2024-04-02T17:58:35Z)
- GPAvatar: Generalizable and Precise Head Avatar from Image(s) [71.555405205039]
GPAvatar is a framework that reconstructs 3D head avatars from one or several images in a single forward pass.
The proposed method achieves faithful identity reconstruction, precise expression control, and multi-view consistency.
arXiv Detail & Related papers (2024-01-18T18:56:34Z)
- FLARE: Fast Learning of Animatable and Relightable Mesh Avatars [64.48254296523977]
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems.
We introduce FLARE, a technique that enables the creation of animatable and relightable avatars from a single monocular video.
arXiv Detail & Related papers (2023-10-26T16:13:00Z)
- TADA! Text to Animatable Digital Avatars [57.52707683788961]
TADA takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures.
We derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map.
We render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process.
arXiv Detail & Related papers (2023-08-21T17:59:10Z)
- 3Deformer: A Common Framework for Image-Guided Mesh Deformation [27.732389685912214]
Given a source 3D mesh with semantic materials and a user-specified semantic image, 3Deformer can accurately edit the source mesh.
Our 3Deformer produces impressive results and reaches the state-of-the-art level.
arXiv Detail & Related papers (2023-07-19T10:44:44Z)
- Generalizable One-shot Neural Head Avatar [90.50492165284724]
We present a method that reconstructs and animates a 3D head avatar from a single-view portrait image.
We propose a framework that not only generalizes to unseen identities based on a single-view image, but also captures characteristic details within and beyond the face area.
arXiv Detail & Related papers (2023-06-14T22:33:09Z)
- Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization [91.52882218901627]
We propose a novel method for constructing implicit 3D morphable face models that are both generalizable and intuitive for editing.
Our method improves upon photo-realism, geometry, and expression accuracy compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-04T17:58:40Z)
- TextDeformer: Geometry Manipulation using Text Guidance [37.02412892926677]
We present a technique for producing a deformation of an input triangle mesh guided solely by a text prompt.
Our framework relies on differentiable rendering to connect geometry to powerful pre-trained image encoders, such as CLIP and DINO (a minimal sketch of this rendering-to-CLIP guidance appears after this list).
Rather than optimizing vertex positions directly, we represent our mesh deformation through Jacobians, which update deformations in a global, smooth manner.
arXiv Detail & Related papers (2023-04-26T07:38:41Z)
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision [114.56048848216254]
We present a novel framework, TAPS3D, to train a text-guided 3D shape generator with pseudo captions.
Based on rendered 2D images, we retrieve relevant words from the CLIP vocabulary and construct pseudo captions using templates.
Our constructed captions provide high-level semantic supervision for generated 3D shapes.
arXiv Detail & Related papers (2023-03-23T13:53:16Z)
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate a 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z)
- Pixel2Mesh++: 3D Mesh Generation and Refinement from Multi-View Images [82.32776379815712]
We study the problem of shape generation in 3D mesh representation from a small number of color images with or without camera poses.
We further improve the shape quality by leveraging cross-view information with a graph convolution network.
Our model is robust to the quality of the initial mesh and the error of camera pose, and can be combined with a differentiable function for test-time optimization.
arXiv Detail & Related papers (2022-04-21T03:42:31Z)
- I M Avatar: Implicit Morphable Head Avatars from Videos [68.13409777995392]
We propose IMavatar, a novel method for learning implicit head avatars from monocular videos.
Inspired by the fine-grained control mechanisms afforded by conventional 3DMMs, we represent the expression- and pose-related deformations via learned blendshapes and skinning fields.
We show quantitatively and qualitatively that our method improves geometry and covers a more complete expression space compared to state-of-the-art methods.
arXiv Detail & Related papers (2021-12-14T15:30:32Z)
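As referenced in the TextDeformer entry above, text-guided deformation methods of this kind connect geometry to a pre-trained image encoder through differentiable rendering. The sketch below shows one such rendering-to-CLIP objective; the rendered tensor is assumed to come from some differentiable renderer (for example nvdiffrast or PyTorch3D, not shown), and the plain cosine-similarity loss is a simplification of what TextDeformer optimizes and of the 2D diffusion guidance used by HeadEvolver.

```python
# Hedged sketch of a rendering-to-CLIP guidance loss. Only the CLIP calls
# are real API (pip install git+https://github.com/openai/CLIP.git); the
# rendered views are assumed to come from a differentiable renderer.
import clip
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval().requires_grad_(False)                  # frozen guidance model
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)


def clip_guidance_loss(rendered, prompt):
    """Negative cosine similarity between rendered views and a text prompt.

    rendered: (B, 3, H, W) images in [0, 1] produced by a differentiable
    renderer, so gradients flow back to the deformation parameters.
    """
    images = F.interpolate(rendered, size=224, mode="bilinear", align_corners=False)
    images = (images - CLIP_MEAN) / CLIP_STD
    image_feat = clip_model.encode_image(images)
    with torch.no_grad():
        text_feat = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    return 1.0 - (image_feat @ text_feat.T).mean()
```

Minimizing this loss with respect to the deformation parameters (for example the JacobianField sketch above) nudges every rendered view toward the prompt, while the least-squares vertex solve keeps the resulting deformation globally smooth.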