Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark
- URL: http://arxiv.org/abs/2412.02508v1
- Date: Tue, 03 Dec 2024 15:39:05 GMT
- Title: Towards Rich Emotions in 3D Avatars: A Text-to-3D Avatar Generation Benchmark
- Authors: Haidong Xu, Meishan Zhang, Hao Ju, Zhedong Zheng, Hongyuan Zhu, Erik Cambria, Min Zhang, Hao Fei
- Abstract summary: Producing emotionally dynamic 3D facial avatars with text derived from spoken words (Emo3D) has been a pivotal research topic in 3D avatar generation.
This paper reexamines Emo3D generation and draws inspiration from human processes, breaking down Emo3D into two cascading steps: Text-to-3D Expression Mapping (T3DEM) and 3D Avatar Rendering (3DAR).
To address these challenges, we introduce a novel benchmark to advance research in Emo3D generation.
- Score: 78.72350264142987
- License:
- Abstract: Producing emotionally dynamic 3D facial avatars with text derived from spoken words (Emo3D) has been a pivotal research topic in 3D avatar generation. While progress has been made in general-purpose 3D avatar generation, the exploration of generating emotional 3D avatars remains scarce, primarily due to the complexities of identifying and rendering rich emotions from spoken words. This paper reexamines Emo3D generation and draws inspiration from human processes, breaking down Emo3D into two cascading steps: Text-to-3D Expression Mapping (T3DEM) and 3D Avatar Rendering (3DAR). T3DEM is the most crucial step in determining the quality of Emo3D generation and encompasses three key challenges: Expression Diversity, Emotion-Content Consistency, and Expression Fluidity. To address these challenges, we introduce a novel benchmark to advance research in Emo3D generation. First, we present EmoAva, a large-scale, high-quality dataset for T3DEM, comprising 15,000 text-to-3D expression mappings that characterize the aforementioned three challenges in Emo3D generation. Furthermore, we develop various metrics to effectively evaluate models against these identified challenges. Next, to effectively model the consistency, diversity, and fluidity of human expressions in the T3DEM step, we propose the Continuous Text-to-Expression Generator, which employs an autoregressive Conditional Variational Autoencoder for expression code generation, enhanced with Latent Temporal Attention and Expression-wise Attention mechanisms. Finally, to further enhance the 3DAR step on rendering higher-quality subtle expressions, we present the Globally-informed Gaussian Avatar (GiGA) model. GiGA incorporates a global information mechanism into 3D Gaussian representations, enabling the capture of subtle micro-expressions and seamless transitions between emotional states.
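To make the T3DEM step concrete, the sketch below shows a minimal autoregressive conditional VAE that maps a pooled text embedding plus previously generated expression codes to the next expression codes, loosely in the spirit of the Continuous Text-to-Expression Generator. The dimensions, module names, and the single multi-head attention standing in for the paper's Latent Temporal Attention and Expression-wise Attention are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of an autoregressive conditional VAE for the T3DEM step.
# Dimensions, module names, and the simplified attention are illustrative
# assumptions, not the paper's released implementation.
import torch
import torch.nn as nn

TEXT_DIM, EXPR_DIM, LATENT_DIM, HIDDEN = 512, 50, 64, 256

class AutoregressiveCVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: infers per-frame latents from the text condition and
        # previously generated expression codes.
        self.enc = nn.GRU(EXPR_DIM + TEXT_DIM, HIDDEN, batch_first=True)
        self.to_mu = nn.Linear(HIDDEN, LATENT_DIM)
        self.to_logvar = nn.Linear(HIDDEN, LATENT_DIM)
        # Stand-in for Latent Temporal Attention: attention over the
        # sequence of latents before decoding.
        self.temporal_attn = nn.MultiheadAttention(LATENT_DIM, num_heads=4,
                                                   batch_first=True)
        # Decoder maps (latent, text condition) to the next expression code.
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM + TEXT_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, EXPR_DIM))

    def forward(self, text_emb, prev_codes):
        # text_emb: (B, TEXT_DIM); prev_codes: (B, T, EXPR_DIM)
        B, T, _ = prev_codes.shape
        cond = text_emb.unsqueeze(1).expand(B, T, TEXT_DIM)
        h, _ = self.enc(torch.cat([prev_codes, cond], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        z, _ = self.temporal_attn(z, z, z)      # mix information across time
        next_codes = self.dec(torch.cat([z, cond], dim=-1))
        return next_codes, mu, logvar

# One training-style pass on random tensors (teacher-forced reconstruction).
model = AutoregressiveCVAE()
text = torch.randn(2, TEXT_DIM)                 # pooled text embedding
codes = torch.randn(2, 16, EXPR_DIM)            # 16 past expression frames
pred, mu, logvar = model(text, codes)
recon = nn.functional.mse_loss(pred, codes)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = recon + 1e-3 * kl
loss.backward()
print(pred.shape)  # torch.Size([2, 16, 50])
```

At inference, one would sample z from the prior and feed generated codes back in frame by frame; the resulting expression-code sequence would then drive the 3DAR step (e.g., the GiGA renderer) rather than being compared against ground truth.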
Related papers
- Emo3D: Metric and Benchmarking Dataset for 3D Facial Expression Generation from Emotion Description [3.52270271101496]
"Emo3D" is an extensive "Text-Image-Expression dataset" spanning a wide spectrum of human emotions.
We generate a diverse array of textual descriptions, facilitating the capture of a broad spectrum of emotional expressions.
"Emo3D" has great applications in animation design, virtual reality, and emotional human-computer interaction.
arXiv Detail & Related papers (2024-10-02T21:31:24Z)
- DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion [69.67970568012599]
We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text.
The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation.
Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
arXiv Detail & Related papers (2024-09-25T17:59:45Z)
- EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head [30.138347111341748]
We present a novel approach for synthesizing 3D talking heads with controllable emotion.
Our model enables controllable emotion in the generated talking heads and can be rendered in wide-range views.
Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads.
arXiv Detail & Related papers (2024-08-01T05:46:57Z)
- EmoVOCA: Speech-Driven Emotional 3D Talking Heads [12.161006152509653]
We propose an innovative data-driven technique for creating a synthetic dataset, called EmoVOCA.
We then designed and trained an emotional 3D talking head generator that accepts a 3D face, an audio file, an emotion label, and an intensity value as inputs, and learns to animate the audio-synchronized lip movements with expressive traits of the face.
arXiv Detail & Related papers (2024-03-19T16:33:26Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content with higher quality by exploiting the natural image prior of the 2D diffusion model and the global 3D information of the current scene.
Our approach supports wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images [105.92311979305065]
TG-3DFace creates more realistic and aesthetically pleasing 3D faces, boosting multi-view consistency (MVIC) by 9% over Latent3D.
The rendered face images generated by TG-3DFace achieve higher FID and CLIP score than text-to-2D face/image generation models.
arXiv Detail & Related papers (2023-08-31T14:26:33Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars [36.4402388864691]
3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery.
Recent efforts incorporate the 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields, either explicitly or implicitly.
We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images.
arXiv Detail & Related papers (2022-11-21T06:40:46Z)
- Controllable 3D Generative Adversarial Face Model via Disentangling Shape and Appearance [63.13801759915835]
3D face modeling has been an active area of research in computer vision and computer graphics.
This paper proposes a new 3D face generative model that can decouple identity and expression.
arXiv Detail & Related papers (2022-08-30T13:40:48Z)
- 3D to 4D Facial Expressions Generation Guided by Landmarks [35.61963927340274]
Given one input 3D neutral face, can we generate dynamic 3D (4D) facial expressions from it?
We first propose a mesh encoder-decoder architecture (Expr-ED) that exploits a set of 3D landmarks to generate an expressive 3D face from its neutral counterpart.
We extend it to 4D by modeling the temporal dynamics of facial expressions using a manifold-valued GAN.
arXiv Detail & Related papers (2021-05-16T15:52:29Z)