Autoregressive GAN for Semantic Unconditional Head Motion Generation
- URL: http://arxiv.org/abs/2211.00987v2
- Date: Mon, 17 Apr 2023 09:45:22 GMT
- Title: Autoregressive GAN for Semantic Unconditional Head Motion Generation
- Authors: Louis Airale (M-PSI, ROBOTLEARN), Xavier Alameda-Pineda (ROBOTLEARN),
Stéphane Lathuilière (IP Paris, IDS, MM), Dominique Vaufreydaz (M-PSI)
- Abstract summary: We devise a GAN-based architecture that learns to synthesize rich head motion sequences over long durations while maintaining low error accumulation levels.
We experimentally demonstrate the relevance of the proposed method and show its superiority compared to models that attained state-of-the-art performance on similar tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we address the task of unconditional head motion generation to
animate still human faces in a low-dimensional semantic space from a single
reference pose. Different from traditional audio-conditioned talking head
generation, which seldom puts emphasis on realistic head motions, we devise a
GAN-based architecture that learns to synthesize rich head motion sequences
over long durations while maintaining low error accumulation levels. In
particular, the autoregressive generation of incremental outputs ensures smooth
trajectories, while a multi-scale discriminator on input pairs drives
generation toward better handling of high- and low-frequency signals and less
mode collapse. We experimentally demonstrate the relevance of the proposed
method and show its superiority compared to models that attained
state-of-the-art performance on similar tasks.
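The two mechanisms named in the abstract lend themselves to a compact illustration. The following is a minimal PyTorch sketch of the general idea, not the authors' implementation: all module names, dimensions, and temporal offsets are hypothetical assumptions. The generator emits per-step pose increments rather than absolute poses, and the discriminator scores pose pairs at several temporal offsets (scales).

```python
# Illustrative sketch only: names, dimensions, and offsets are assumptions,
# not the paper's actual architecture.
import torch
import torch.nn as nn

class AutoregressiveHeadMotionGenerator(nn.Module):
    """Rolls out a head-pose sequence from a single reference pose by
    predicting per-step increments (deltas), which favors smooth
    trajectories and limits error accumulation."""

    def __init__(self, pose_dim=6, noise_dim=32, hidden_dim=128):
        super().__init__()
        self.noise_dim = noise_dim
        self.rnn = nn.GRUCell(pose_dim + noise_dim, hidden_dim)
        self.to_delta = nn.Linear(hidden_dim, pose_dim)

    def forward(self, ref_pose, num_steps):
        # ref_pose: (batch, pose_dim), e.g. head rotation/translation parameters.
        batch = ref_pose.size(0)
        h = ref_pose.new_zeros(batch, self.rnn.hidden_size)
        pose, poses = ref_pose, [ref_pose]
        for _ in range(num_steps):
            z = torch.randn(batch, self.noise_dim, device=ref_pose.device)
            h = self.rnn(torch.cat([pose, z], dim=-1), h)
            pose = pose + self.to_delta(h)  # incremental output, not absolute pose
            poses.append(pose)
        return torch.stack(poses, dim=1)  # (batch, num_steps + 1, pose_dim)

class PairDiscriminator(nn.Module):
    """Scores pairs of poses sampled at several temporal offsets: small
    offsets probe high-frequency motion, large offsets low-frequency drift."""

    def __init__(self, pose_dim=6, offsets=(1, 4, 16)):
        super().__init__()
        self.offsets = offsets
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * pose_dim, 128),
                          nn.LeakyReLU(0.2),
                          nn.Linear(128, 1))
            for _ in offsets
        )

    def forward(self, seq):
        # seq: (batch, time, pose_dim) -> one realism score per temporal scale.
        scores = []
        for off, head in zip(self.offsets, self.heads):
            pairs = torch.cat([seq[:, :-off], seq[:, off:]], dim=-1)
            scores.append(head(pairs).mean(dim=(1, 2)))
        return torch.stack(scores, dim=1)  # (batch, num_scales)
```

Training with these pieces would follow a standard GAN loop: the generator unrolls a sequence from the reference pose, and the per-scale discriminator scores are averaged into the adversarial loss; e.g. `PairDiscriminator()(gen(ref, 100))` yields a `(batch, 3)` score tensor.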
Related papers
- Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation [22.159117464397806]
We introduce a two-stage diffusion-based model for talking head generation.
The first stage involves generating synchronized facial landmarks based on the given speech.
In the second stage, these generated landmarks serve as a condition in the denoising process, aiming to optimize mouth jitter issues and generate high-fidelity, well-synchronized, and temporally coherent talking head videos.
arXiv Detail & Related papers (2024-08-03T10:19:38Z)
- MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation [29.620451579580763]
We propose a novel motion-disentangled diffusion model for talking head generation, dubbed MoDiTalker.
We introduce two modules: audio-to-motion (AToM), designed to generate synchronized lip motion from audio, and motion-to-video (MToV), designed to produce high-quality head video following the generated motion.
Our experiments conducted on standard benchmarks demonstrate that our model achieves superior performance compared to existing models.
arXiv Detail & Related papers (2024-03-28T04:35:42Z)
- DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation [72.85685916829321]
DiffSHEG is a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation of arbitrary length.
By enabling the real-time generation of expressive and synchronized motions, DiffSHEG showcases its potential for various applications in the development of digital humans and embodied agents.
arXiv Detail & Related papers (2024-01-09T11:38:18Z)
- FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models [85.16273912625022]
We introduce FaceTalk, a novel generative approach designed for synthesizing high-fidelity 3D motion sequences of talking human heads from an audio signal.
To the best of our knowledge, this is the first work to propose a generative approach for realistic and high-quality motion synthesis of human heads.
arXiv Detail & Related papers (2023-12-13T19:01:07Z)
- StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation [73.54398908446906]
We introduce a novel motion generator design that uses a learning-based inversion network for GAN.
Our method supports style transfer with simple fine-tuning when the encoder is paired with a pretrained StyleGAN generator.
arXiv Detail & Related papers (2023-08-31T17:59:33Z)
- A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation [0.0]
We propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short- and long-term correlations between speech and head motion.
Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation.
arXiv Detail & Related papers (2023-07-04T08:29:59Z)
- BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis [14.331548412833513]
Mixed reality applications require tracking the user's full-body motion to enable an immersive experience.
We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained reconstruction problem.
We present a time and space conditioning scheme that allows BoDiffusion to leverage sparse tracking inputs while generating smooth and realistic full-body motion sequences.
arXiv Detail & Related papers (2023-04-21T16:39:05Z)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [62.297513028116576]
GeneFace is a general and high-fidelity NeRF-based talking face generation method.
A head-aware torso-NeRF is proposed to eliminate the head-torso separation problem.
arXiv Detail & Related papers (2023-01-31T05:56:06Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over state-of-the-art methods across extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering [69.9557427451339]
We propose a framework based on neural radiance fields to pursue high-fidelity talking head generation.
Specifically, the neural radiance field takes lip movement features and personalized attributes as two disentangled conditions.
We show that our method achieves significantly better results than state-of-the-art methods.
arXiv Detail & Related papers (2022-01-03T18:23:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.