Audio2Rig: Artist-oriented deep learning tool for facial animation
- URL: http://arxiv.org/abs/2405.20412v1
- Date: Thu, 30 May 2024 18:37:21 GMT
- Title: Audio2Rig: Artist-oriented deep learning tool for facial animation
- Authors: Bastien Arcelin, Nicolas Chaverou,
- Abstract summary: Audio2Rig is a new deep learning tool leveraging previously animated sequences of a show, to generate facial and lip sync rig animation from an audio file.
Based in Maya, it learns from any production rig without any adjustment and generates high quality and stylized animations.
Our method shows excellent results, generating fine animation details while respecting the show style.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating realistic or stylized facial and lip sync animation is a tedious task. It requires lot of time and skills to sync the lips with audio and convey the right emotion to the character's face. To allow animators to spend more time on the artistic and creative part of the animation, we present Audio2Rig: a new deep learning based tool leveraging previously animated sequences of a show, to generate facial and lip sync rig animation from an audio file. Based in Maya, it learns from any production rig without any adjustment and generates high quality and stylized animations which mimic the style of the show. Audio2Rig fits in the animator workflow: since it generates keys on the rig controllers, the animation can be easily retaken. The method is based on 3 neural network modules which can learn an arbitrary number of controllers. Hence, different configurations can be created for specific parts of the face (such as the tongue, lips or eyes). With Audio2Rig, animators can also pick different emotions and adjust their intensities to experiment or customize the output, and have high level controls on the keyframes setting. Our method shows excellent results, generating fine animation details while respecting the show style. Finally, as the training relies on the studio data and is done internally, it ensures data privacy and prevents from copyright infringement.
Related papers
- Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z) - AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z) - 3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing [22.30870274645442]
We present 3DiFACE, a novel method for personalized speech-driven 3D facial animation and editing.
Our method outperforms existing state-of-the-art techniques and yields speech-driven animations with greater fidelity and diversity.
arXiv Detail & Related papers (2023-12-01T19:01:05Z) - Breathing Life Into Sketches Using Text-to-Video Priors [101.8236605955899]
A sketch is one of the most intuitive and versatile tools humans use to convey their ideas visually.
In this work, we present a method that automatically adds motion to a single-subject sketch.
The output is a short animation provided in vector representation, which can be easily edited.
arXiv Detail & Related papers (2023-11-21T18:09:30Z) - Audio-Driven Talking Face Generation with Diverse yet Realistic Facial
Animations [61.65012981435094]
DIRFA is a novel method that can generate talking faces with diverse yet realistic facial animations from the same driving audio.
To accommodate fair variation of plausible facial animations for the same audio, we design a transformer-based probabilistic mapping network.
We show that DIRFA can generate talking faces with realistic facial animations effectively.
arXiv Detail & Related papers (2023-04-18T12:36:15Z) - FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation
Synthesis Using Self-Supervised Speech Representation Learning [0.0]
FaceXHuBERT is a text-less speech-driven 3D facial animation generation method.
It is very robust to background noise and can handle audio recorded in a variety of situations.
It produces superior results with respect to the realism of the animation 78% of the time.
arXiv Detail & Related papers (2023-03-09T17:05:19Z) - Learning Audio-Driven Viseme Dynamics for 3D Face Animation [17.626644507523963]
We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D animations from the input audio.
Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs.
arXiv Detail & Related papers (2023-01-15T09:55:46Z) - SketchBetween: Video-to-Video Synthesis for Sprite Animation via
Sketches [0.9645196221785693]
2D animation is a common factor in game development, used for characters, effects and background art.
Automated animation approaches exist, but are designed without animators in mind.
We propose a problem formulation that adheres more closely to the standard workflow of animation.
arXiv Detail & Related papers (2022-09-01T02:43:19Z) - MeshTalk: 3D Face Animation from Speech using Cross-Modality
Disentanglement [142.9900055577252]
We propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.
Our approach ensures highly accurate lip motion, while also plausible animation of the parts of the face that are uncorrelated to the audio signal, such as eye blinks and eye brow motion.
arXiv Detail & Related papers (2021-04-16T17:05:40Z) - Self-Supervised Equivariant Scene Synthesis from Video [84.15595573718925]
We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations.
After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components.
We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
arXiv Detail & Related papers (2021-02-01T14:17:31Z) - A Robust Interactive Facial Animation Editing System [0.0]
We propose a new learning-based approach to easily edit a facial animation from a set of intuitive control parameters.
We use a resolution-preserving fully convolutional neural network that maps control parameters to blendshapes coefficients sequences.
The proposed system is robust and can handle coarse, exaggerated edits from non-specialist users.
arXiv Detail & Related papers (2020-07-18T08:31:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.