READ Avatars: Realistic Emotion-controllable Audio Driven Avatars
- URL: http://arxiv.org/abs/2303.00744v1
- Date: Wed, 1 Mar 2023 18:56:43 GMT
- Title: READ Avatars: Realistic Emotion-controllable Audio Driven Avatars
- Authors: Jack Saunders, Vinay Namboodiri
- Abstract summary: We present READ Avatars, a 3D-based approach for generating 2D avatars driven by audio input with direct and granular control over the emotion.
Previous methods are unable to achieve realistic animation due to the many-to-many nature of audio to expression mappings.
Introducing an adversarial loss in the audio-to-expression generation process removes the smoothing effect of regression-based models and helps to improve the realism and expressiveness of the generated avatars.
- Score: 11.98034899127065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present READ Avatars, a 3D-based approach for generating 2D avatars that
are driven by audio input with direct and granular control over the emotion.
Previous methods are unable to achieve realistic animation due to the
many-to-many nature of audio to expression mappings. We alleviate this issue by
introducing an adversarial loss in the audio-to-expression generation process.
This removes the smoothing effect of regression-based models and helps to
improve the realism and expressiveness of the generated avatars. We note,
furthermore, that audio should be directly utilized when generating mouth
interiors and that other 3D-based methods do not attempt this. We address this
with audio-conditioned neural textures, which are resolution-independent. To
evaluate the performance of our method, we perform quantitative and qualitative
experiments, including a user study. We also propose a new metric for comparing
how well an actor's emotion is reconstructed in the generated avatar. Our
results show that our approach outperforms state-of-the-art audio-driven avatar
generation methods across several metrics. A demo video can be found at
https://youtu.be/QSyMl3vV0pA
 
      
Related papers
        - VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis [70.76837748695841]
 We propose VisualSpeaker, a novel method that bridges the gap using photorealistic differentiable rendering, supervised by visual speech recognition, for improved 3D facial animation. Our contribution is a perceptual lip-reading loss, derived by passing 3D Gaussian Splatting avatar renders through a pre-trained Visual Automatic Speech Recognition model during training. Evaluation on the MEAD dataset demonstrates that VisualSpeaker improves both the standard Lip Vertex Error metric by 56.1% and the perceptual quality of the generated animations, while retaining the controllability of mesh-driven animation.
 arXiv  Detail & Related papers  (2025-07-08T15:04:17Z)
- Supervising 3D Talking Head Avatars with Analysis-by-Audio-Synthesis [44.503709089687014]
 Speech-driven 3D head avatars must articulate their lips in accordance with speech. The key problem is that deterministic models produce high-quality lip-sync but without rich expressions. We propose THUNDER, a 3D talking head avatar framework that introduces a novel supervision mechanism via differentiable sound production.
 arXiv  Detail & Related papers  (2025-04-18T00:24:52Z)
- AV-Flow: Transforming Text to Audio-Visual Human-like Interactions [101.31009576033776]
 AV-Flow is an audio-visual generative model that animates photo-realistic 4D talking avatars given only text input.
We demonstrate human-like speech synthesis, synchronized lip motion, lively facial expressions and head pose.
 arXiv  Detail & Related papers  (2025-02-18T18:56:18Z)
- Generalizable and Animatable Gaussian Head Avatar [50.34788590904843]
 We propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction.
We generate the parameters of 3D Gaussians from a single image in a single forward pass.
Our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy.
 arXiv  Detail & Related papers  (2024-10-10T14:29:00Z)
- DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion [69.67970568012599]
 We present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text.
The core of this framework lies in Score Distillation and Hybrid 3D Gaussian Avatar representation.
Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.
 arXiv  Detail & Related papers  (2024-09-25T17:59:45Z)
- EmoFace: Audio-driven Emotional 3D Face Animation [3.573880705052592]
 EmoFace is a novel audio-driven methodology for creating facial animations with vivid emotional dynamics.
Our approach can generate facial expressions with multiple emotions, and has the ability to generate random yet natural blinks and eye movements.
Our proposed methodology can be applied to producing dialogue animations for non-playable characters in video games and to driving avatars in virtual reality environments.
 arXiv  Detail & Related papers  (2024-07-17T11:32:16Z)
- InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation [39.235962838952624]
 In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars.
Our framework, named InstructAvatar, leverages a natural language interface to control the emotion as well as the facial motion of avatars.
 Experimental results demonstrate that InstructAvatar produces results that align well with both conditions.
 arXiv  Detail & Related papers  (2024-05-24T17:53:54Z)
- OPHAvatars: One-shot Photo-realistic Head Avatars [0.0]
 Given a portrait, our method synthesizes a coarse talking head video using driving keypoint features.
With rendered images of the coarse avatar, our method updates the low-quality images with a blind face restoration model.
After several iterations, our method can synthesize a photo-realistic animatable 3D neural head avatar.
 arXiv  Detail & Related papers  (2023-07-18T11:24:42Z)
- AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars [84.85009267371218]
 We propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar.
Our approach builds on existing work to capture dynamic performances of human heads using a neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model.
Our method edits the full head in a canonical space, and then propagates these edits to remaining time steps via a pretrained deformation network.
 arXiv  Detail & Related papers  (2023-06-01T11:06:01Z)
- AvatarMAV: Fast 3D Head Avatar Reconstruction Using Motion-Aware Neural Voxels [33.085274792188756]
 We propose AvatarMAV, a fast 3D head avatar reconstruction method using Motion-Aware Neural Voxels.
 AvatarMAV is the first to model both the canonical appearance and the decoupled expression motion with neural voxels for head avatars.
The proposed AvatarMAV can recover photo-realistic head avatars in just 5 minutes, which is significantly faster than the state-of-the-art facial reenactment methods.
 arXiv  Detail & Related papers  (2022-11-23T18:49:31Z)
- EMOCA: Emotion Driven Monocular Face Capture and Animation [59.15004328155593]
 We introduce a novel deep perceptual emotion consistency loss during training, which helps ensure that the reconstructed 3D expression matches the expression depicted in the input image.
On the task of in-the-wild emotion recognition, our purely geometric approach is on par with the best image-based methods, highlighting the value of 3D geometry in analyzing human behavior.
 arXiv  Detail & Related papers  (2022-04-24T15:58:35Z)
- PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering [56.762094966235566]
 A Portrait Image Neural Renderer is proposed to control the face motions with the parameters of three-dimensional morphable face models.
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
Our model can generate coherent videos with convincing movements from only a single reference image and a driving audio stream.
 arXiv  Detail & Related papers  (2021-09-17T07:24:16Z)
- Audio- and Gaze-driven Facial Animation of Codec Avatars [149.0094713268313]
 We describe the first approach to animate Codec Avatars in real-time using audio and/or eye tracking.
Our goal is to display expressive conversations between individuals that exhibit important social signals.
 arXiv  Detail & Related papers  (2020-08-11T22:28:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     