ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations
- URL: http://arxiv.org/abs/2411.13089v2
- Date: Mon, 25 Nov 2024 21:12:25 GMT
- Title: ESARM: 3D Emotional Speech-to-Animation via Reward Model from Automatically-Ranked Demonstrations
- Authors: Xulong Zhang, Xiaoyang Qu, Haoxiang Shi, Chunguang Xiao, Jianzong Wang
- Abstract summary: This paper proposes a novel 3D speech-to-animation (STA) generation framework to address the shortcomings of existing models.
We introduce a novel STA model coupled with a reward model. This combination enables the decoupling of emotion and content under audio conditions.
We conduct extensive empirical experiments on a benchmark dataset, and the results validate the effectiveness of our proposed framework.
- Score: 22.85503397110192
- License:
- Abstract: This paper proposes a novel 3D speech-to-animation (STA) generation framework designed to address the shortcomings of existing models in producing diverse and emotionally resonant animations. Current STA models often generate animations that lack emotional depth and variety, failing to align with human expectations. To overcome these limitations, we introduce a novel STA model coupled with a reward model. This combination enables the decoupling of emotion and content under audio conditions through a cross-coupling training approach. Additionally, we develop a training methodology that leverages automatic quality evaluation of generated facial animations to guide the reinforcement learning process. This methodology encourages the STA model to explore a broader range of possibilities, resulting in the generation of diverse and emotionally expressive facial animations of superior quality. We conduct extensive empirical experiments on a benchmark dataset, and the results validate the effectiveness of our proposed framework in generating high-quality, emotionally rich 3D animations that are better aligned with human preferences.
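The abstract gives no implementation details; as a rough illustration of the two components it names, the sketch below pairs a reward model trained with a pairwise ranking loss on automatically-ranked animation pairs with a REINFORCE-style update that uses the learned reward to fine-tune the STA generator. All module names, tensor shapes, and hyperparameters are assumptions, not the authors' code.

```python
# Hedged sketch: (1) a reward model fit to automatically-ranked animation pairs,
# (2) a REINFORCE-style update that uses that reward to fine-tune the STA generator.
# Names, shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AnimationRewardModel(nn.Module):
    """Maps an animation clip (frames x blendshape coefficients) to a scalar reward."""

    def __init__(self, coeff_dim: int = 52, hidden: int = 256):
        super().__init__()
        self.encoder = nn.GRU(coeff_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, anim: torch.Tensor) -> torch.Tensor:
        # anim: (batch, frames, coeff_dim)
        _, h = self.encoder(anim)
        return self.head(h[-1]).squeeze(-1)  # (batch,)


def ranking_loss(rm: AnimationRewardModel, better: torch.Tensor, worse: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss: the automatically preferred clip should score higher."""
    return -F.logsigmoid(rm(better) - rm(worse)).mean()


def rl_finetune_step(generator, rm, audio, emotion, opt):
    """One policy-gradient step: sample an animation, score it, reinforce high-reward samples."""
    anim, log_prob = generator.sample(audio, emotion)  # hypothetical stochastic STA decoder
    with torch.no_grad():
        reward = rm(anim)
    loss = -(reward * log_prob).mean()                 # REINFORCE surrogate objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In practice a baseline (advantage) term and a penalty keeping the fine-tuned generator close to its pretrained behavior would usually be added; the abstract does not specify these details.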
Related papers
- MagicArticulate: Make Your 3D Models Articulation-Ready [109.35703811628045]
We present MagicArticulate, an effective framework that automatically transforms static 3D models into articulation-ready assets.
Our key contributions are threefold. First, we introduce Articulation-XL, a large-scale benchmark containing over 33k 3D models with high-quality articulation annotations, carefully curated from Objaverse-XL.
Extensive experiments demonstrate that MagicArticulate significantly outperforms existing methods across diverse object categories.
arXiv Detail & Related papers (2025-02-17T18:53:27Z)
- Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters [86.13319549186959]
We present Make-It-Animatable, a novel data-driven method to make any 3D humanoid model ready for character animation in less than one second.
Our framework generates high-quality blend weights, bones, and pose transformations.
Compared to existing methods, our approach demonstrates significant improvements in both quality and speed.
arXiv Detail & Related papers (2024-11-27T10:18:06Z)
- ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE [0.0]
We argue that emotions and non-determinism are crucial to generate diverse and emotionally-rich facial animations.
We propose ProbTalk3D, a non-deterministic neural network approach for emotion controllable speech-driven 3D facial animation synthesis.
arXiv Detail & Related papers (2024-09-12T11:53:05Z)
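For context on the VQ-VAE backbone named in the ProbTalk3D entry above, the following is a generic vector-quantization step (nearest-codebook lookup with a straight-through estimator); it is an illustration under assumptions, not ProbTalk3D's implementation.

```python
# Generic VQ-VAE quantization step (nearest-code lookup + straight-through
# estimator), the kind of discrete motion codebook such methods build on.
# Illustrative only; sizes and names are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    def __init__(self, num_codes: int = 256, code_dim: int = 128, beta: float = 0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight

    def forward(self, z_e: torch.Tensor):
        # z_e: (batch, frames, code_dim) continuous encoder outputs for a motion clip
        flat = z_e.reshape(-1, z_e.size(-1))
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=-1)
        z_q = self.codebook(idx).view_as(z_e)
        # Codebook and commitment terms of the standard VQ-VAE objective
        vq_loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()   # straight-through gradient to the encoder
        return z_q, idx.view(z_e.shape[:-1]), vq_loss
```

Sampling different code indices at inference is one common way such models obtain non-deterministic, diverse animations for the same speech input.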
- EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars [36.96390906514729]
The MegaPortraits model has demonstrated state-of-the-art results in this domain.
We introduce our EMOPortraits model, which enhances the model's capability to faithfully support intense, asymmetric facial expressions.
We propose a novel multi-view video dataset featuring a wide range of intense and asymmetric facial expressions.
arXiv Detail & Related papers (2024-04-29T21:23:29Z)
- CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation [13.27632316528572]
Speech-driven 3D facial animation technology has been developed for years, but its practical applications still fall short of expectations.
The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions.
This paper proposes a method called CSTalk that models the correlations among different regions of facial movements and supervises the training of the generative model to generate realistic expressions.
arXiv Detail & Related papers (2024-04-29T11:19:15Z)
- Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance [48.986552871497]
We introduce a novel two-stage framework that employs scene affordance as an intermediate representation.
By leveraging scene affordance maps, our method overcomes the difficulty in generating human motion under multimodal condition signals.
Our approach consistently outperforms all baselines on established benchmarks, including HumanML3D and HUMANISE.
arXiv Detail & Related papers (2024-03-26T18:41:07Z)
- AnimateMe: 4D Facial Expressions via Diffusion Models [72.63383191654357]
Recent advances in diffusion models have enhanced the capabilities of generative models in 2D animation.
We employ Graph Neural Networks (GNNs) as denoising diffusion models in a novel approach, formulating the diffusion process directly on the mesh space.
This facilitates the generation of facial deformations through a mesh-diffusion-based model.
arXiv Detail & Related papers (2024-03-25T21:40:44Z)
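To make the mesh-space diffusion idea in the AnimateMe entry above concrete, here is a generic DDPM-style reverse step on per-vertex displacements, using a simple mean-aggregation graph convolution as the denoiser; it is a sketch under assumptions, not AnimateMe's architecture.

```python
# Generic DDPM-style reverse step on per-vertex displacements, with a simple
# mean-aggregation graph convolution over the mesh adjacency as the denoiser.
# Illustrative sketch only; this is not AnimateMe's architecture.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_nbr = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (V, in_dim) vertex features; adj: (V, V) row-normalized mesh adjacency
        return self.lin_self(x) + self.lin_nbr(adj @ x)


class MeshDenoiser(nn.Module):
    """Predicts the noise added to vertex displacements at diffusion step t."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.conv1 = GraphConv(3 + 1, hidden)   # xyz displacement + timestep feature
        self.conv2 = GraphConv(hidden, 3)

    def forward(self, x_t: torch.Tensor, t: float, adj: torch.Tensor) -> torch.Tensor:
        t_feat = torch.full((x_t.size(0), 1), t)
        h = torch.relu(self.conv1(torch.cat([x_t, t_feat], dim=-1), adj))
        return self.conv2(h, adj)               # predicted noise, (V, 3)


@torch.no_grad()
def reverse_step(model, x_t, t, adj, alphas, alpha_bars):
    """Standard DDPM update x_t -> x_{t-1}; alphas/alpha_bars are 1-D schedule tensors."""
    eps = model(x_t, float(t) / len(alphas), adj)
    mean = (x_t - (1 - alphas[t]) / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + (1 - alphas[t]).sqrt() * noise   # sigma_t^2 = beta_t choice
```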
- EmoVOCA: Speech-Driven Emotional 3D Talking Heads [12.161006152509653]
We propose an innovative data-driven technique for creating a synthetic dataset, called EmoVOCA.
We then designed and trained an emotional 3D talking head generator that accepts a 3D face, an audio file, an emotion label, and an intensity value as inputs, and learns to animate the face with audio-synchronized lip movements and expressive traits.
arXiv Detail & Related papers (2024-03-19T16:33:26Z)
- Real-time Animation Generation and Control on Rigged Models via Large Language Models [50.034712575541434]
We introduce a novel method for real-time animation control and generation on rigged models using natural language input.
We embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations.
arXiv Detail & Related papers (2023-10-27T01:36:35Z)
- Pose-Controllable 3D Facial Animation Synthesis using Hierarchical Audio-Vertex Attention [52.63080543011595]
A novel pose-controllable 3D facial animation synthesis method is proposed by utilizing hierarchical audio-vertex attention.
The proposed method can produce more realistic facial expressions and head posture movements.
arXiv Detail & Related papers (2023-02-24T09:36:31Z)
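The entry above only names the mechanism; as a rough reference, the block below shows the basic cross-attention operation such a design builds on, with mesh-vertex queries attending to per-frame audio features. The hierarchy, dimensions, and output head are assumptions, not the paper's model.

```python
# Basic cross-attention from mesh-vertex queries to per-frame audio features --
# the core operation an audio-vertex attention design builds on. The hierarchy,
# dimensions, and output head here are assumptions, not the paper's model.
import torch
import torch.nn as nn


class AudioVertexAttention(nn.Module):
    def __init__(self, vert_dim: int = 64, audio_dim: int = 128, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(
            embed_dim=vert_dim, num_heads=n_heads,
            kdim=audio_dim, vdim=audio_dim, batch_first=True,
        )
        self.to_offset = nn.Linear(vert_dim, 3)  # per-vertex displacement head (assumed)

    def forward(self, vert_feats: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
        # vert_feats: (B, V, vert_dim) vertex embeddings used as queries
        # audio_feats: (B, T, audio_dim) audio frame features used as keys/values
        attended, _ = self.attn(vert_feats, audio_feats, audio_feats)
        return self.to_offset(attended)          # (B, V, 3) predicted vertex offsets


# Smoke test with random tensors: 2 clips, 468 vertices, 50 audio frames.
offsets = AudioVertexAttention()(torch.randn(2, 468, 64), torch.randn(2, 50, 128))
```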