InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
- URL: http://arxiv.org/abs/2405.15758v1
- Date: Fri, 24 May 2024 17:53:54 GMT
- Title: InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
- Authors: Yuchi Wang, Junliang Guo, Jianhong Bai, Runyi Yu, Tianyu He, Xu Tan, Xu Sun, Jiang Bian
- Abstract summary: In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars.
Our framework, named InstructAvatar, leverages a natural language interface to control the emotion as well as the facial motion of avatars.
Experimental results demonstrate that InstructAvatar produces results that align well with both conditions.
- Score: 39.235962838952624
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent talking avatar generation models have made strides in achieving realistic and accurate lip synchronization with the audio, but often fall short in controlling and conveying detailed expressions and emotions of the avatar, making the generated video less vivid and controllable. In this paper, we propose a novel text-guided approach for generating emotionally expressive 2D avatars, offering fine-grained control, improved interactivity, and generalizability to the resulting video. Our framework, named InstructAvatar, leverages a natural language interface to control the emotion as well as the facial motion of avatars. Technically, we design an automatic annotation pipeline to construct an instruction-video paired training dataset, equipped with a novel two-branch diffusion-based generator to predict avatars with audio and text instructions at the same time. Experimental results demonstrate that InstructAvatar produces results that align well with both conditions, and outperforms existing methods in fine-grained emotion control, lip-sync quality, and naturalness. Our project page is https://wangyuchi369.github.io/InstructAvatar/.
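The abstract describes a two-branch generator that conditions denoising on audio and text instructions simultaneously. As a purely illustrative toy (the linear "branches", dimensions, and update rule below are assumptions for exposition, not the paper's actual architecture), the dual-conditioning idea can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(latent, cond, w):
    # Toy conditioning branch: project the condition into latent space and add it.
    # (The real model would use learned cross-attention, not a fixed linear map.)
    return latent + cond @ w

def two_branch_denoise_step(noisy_latent, audio_emb, text_emb, w_audio, w_text):
    """One illustrative refinement step conditioned on both signals."""
    h = branch(noisy_latent, audio_emb, w_audio)  # audio branch (lip sync)
    h = branch(h, text_emb, w_text)               # text branch (emotion/motion)
    return 0.9 * h  # stand-in for the learned denoising update

# Hypothetical dimensions for the sketch
d_lat, d_aud, d_txt = 8, 4, 6
w_a = rng.normal(size=(d_aud, d_lat)) * 0.1
w_t = rng.normal(size=(d_txt, d_lat)) * 0.1

x = rng.normal(size=d_lat)       # noisy avatar-frame latent
audio = rng.normal(size=d_aud)   # per-frame audio feature
text = rng.normal(size=d_txt)    # embedded instruction, e.g. "smile warmly"

for _ in range(10):              # iterative refinement, as in diffusion sampling
    x = two_branch_denoise_step(x, audio, text, w_a, w_t)
```

The point of the sketch is only that both conditions enter every refinement step, so lip motion (audio) and expression (text) are controlled jointly rather than in separate passes.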
Related papers
- EmoFace: Audio-driven Emotional 3D Face Animation [3.573880705052592]
EmoFace is a novel audio-driven methodology for creating facial animations with vivid emotional dynamics.
Our approach can generate facial expressions with multiple emotions, and has the ability to generate random yet natural blinks and eye movements.
Our proposed methodology can be applied in producing dialogues animations of non-playable characters in video games, and driving avatars in virtual reality environments.
arXiv Detail & Related papers (2024-07-17T11:32:16Z)
- Disentangled Clothed Avatar Generation from Text Descriptions [39.5476255730693]
We introduce a novel text-to-avatar generation method that separately generates the human body and the clothes.
Our approach achieves higher texture and geometry quality and better semantic alignment with text prompts.
arXiv Detail & Related papers (2023-12-08T18:43:12Z)
- AvatarStudio: High-fidelity and Animatable 3D Avatar Creation from Text [71.09533176800707]
AvatarStudio is a coarse-to-fine generative model that generates explicit textured 3D meshes for animatable human avatars.
By effectively leveraging the synergy between the articulated mesh representation and the DensePose-conditional diffusion model, AvatarStudio can create high-quality avatars.
arXiv Detail & Related papers (2023-11-29T18:59:32Z)
- GAIA: Zero-shot Talking Avatar Generation [64.78978434650416]
We introduce GAIA (Generative AI for Avatar), which eliminates the domain priors in talking avatar generation.
GAIA beats previous baseline models in terms of naturalness, diversity, lip-sync quality, and visual quality.
It is general and enables different applications like controllable talking avatar generation and text-instructed avatar generation.
arXiv Detail & Related papers (2023-11-26T08:04:43Z)
- MagicAvatar: Multimodal Avatar Generation and Animation [70.55750617502696]
MagicAvatar is a framework for multimodal video generation and animation of human avatars.
It disentangles avatar video generation into two stages: multimodal-to-motion and motion-to-video generation.
We demonstrate the flexibility of MagicAvatar through various applications, including text-guided and video-guided avatar generation.
arXiv Detail & Related papers (2023-08-28T17:56:18Z)
- AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation [14.062402203105712]
AvatarBooth is a novel method for generating high-quality 3D avatars using text prompts or specific images.
Our key contribution is the precise avatar generation control by using dual fine-tuned diffusion models.
We present a multi-resolution rendering strategy that facilitates coarse-to-fine supervision of 3D avatar generation.
arXiv Detail & Related papers (2023-06-16T14:18:51Z)
- Emotional Speech-Driven Animation with Content-Emotion Disentanglement [51.34635009347183]
We propose EMOTE, which generates 3D talking-head avatars that maintain lip-sync from speech while enabling explicit control over the expression of emotion.
EMOTE produces speech-driven facial animations with better lip-sync than state-of-the-art methods trained on the same data.
arXiv Detail & Related papers (2023-06-15T09:31:31Z)
- Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis [66.43223397997559]
We aim to synthesize high-quality talking portrait videos corresponding to the input text.
This task has broad application prospects in the digital human industry but has yet to be fully realized technically.
We introduce Adaptive Text-to-Talking Avatar (Ada-TTA), which designs a generic zero-shot multi-speaker Text-to-Speech model.
arXiv Detail & Related papers (2023-06-06T08:50:13Z)
- READ Avatars: Realistic Emotion-controllable Audio Driven Avatars [11.98034899127065]
We present READ Avatars, a 3D-based approach for generating 2D avatars driven by audio input with direct and granular control over the emotion.
Previous methods are unable to achieve realistic animation due to the many-to-many nature of audio to expression mappings.
This removes the smoothing effect of regression-based models and helps to improve the realism and expressiveness of the generated avatars.
arXiv Detail & Related papers (2023-03-01T18:56:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.