Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for
Generating Representational Gestures from Speech
- URL: http://arxiv.org/abs/2106.14736v1
- Date: Mon, 28 Jun 2021 14:07:59 GMT
- Title: Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for
Generating Representational Gestures from Speech
- Authors: Taras Kucherenko, Rajmund Nagy, Patrik Jonell, Michael Neff, Hedvig
Kjellström, Gustav Eje Henter
- Abstract summary: We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce semantically rich gestures.
Our approach first predicts whether to gesture, followed by a prediction of the gesture properties.
- Score: 9.859003149671807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new framework for gesture generation, aiming to allow
data-driven approaches to produce more semantically rich gestures. Our approach
first predicts whether to gesture, followed by a prediction of the gesture
properties. Those properties are then used as conditioning for a modern
probabilistic gesture-generation model capable of high-quality output. This
empowers the approach to generate gestures that are both diverse and
representational.
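As an illustration of the pipeline described above, here is a minimal PyTorch sketch of the predict-then-generate idea: a predictor first decides whether to gesture and which properties the gesture should have, and those predictions then condition a motion generator. The module names, feature dimensions, and the property set are assumptions made for this example, and the simple GRU regressor only stands in for the paper's probabilistic gesture-generation model.

```python
# Illustrative sketch only: a two-stage "predict-then-generate" pipeline.
# Module names, dimensions, and the property set are assumptions for the example.
import torch
import torch.nn as nn


class GesturePropertyPredictor(nn.Module):
    """Stages 1-2: decide IF to gesture, then WHICH properties the gesture should have."""

    def __init__(self, speech_dim=128, hidden=256, n_properties=5):
        super().__init__()
        self.encoder = nn.GRU(speech_dim, hidden, batch_first=True, bidirectional=True)
        self.gesture_head = nn.Linear(2 * hidden, 1)               # gesture vs. no gesture
        self.property_head = nn.Linear(2 * hidden, n_properties)   # e.g. a few binary property labels

    def forward(self, speech_feats):                     # (batch, time, speech_dim)
        h, _ = self.encoder(speech_feats)
        p_gesture = torch.sigmoid(self.gesture_head(h))  # (batch, time, 1)
        p_props = torch.sigmoid(self.property_head(h))   # (batch, time, n_properties)
        return p_gesture, p_props


class ConditionedGestureGenerator(nn.Module):
    """Stage 3: a placeholder for a probabilistic motion generator that takes
    speech features *and* the predicted gesture properties as conditioning."""

    def __init__(self, speech_dim=128, n_properties=5, pose_dim=45, hidden=256):
        super().__init__()
        self.net = nn.GRU(speech_dim + n_properties, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, speech_feats, properties):
        cond = torch.cat([speech_feats, properties], dim=-1)
        h, _ = self.net(cond)
        return self.out(h)                               # (batch, time, pose_dim)


if __name__ == "__main__":
    speech = torch.randn(2, 100, 128)                    # dummy speech features
    predictor = GesturePropertyPredictor()
    generator = ConditionedGestureGenerator()
    p_gesture, p_props = predictor(speech)
    poses = generator(speech, p_props)
    # Zero out frames where the predictor decided not to gesture.
    poses = poses * (p_gesture > 0.5).float()
    print(poses.shape)                                   # torch.Size([2, 100, 45])
```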
Related papers
- Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis [25.822870767380685]
We present Semantic Gesticulator, a framework designed to synthesize realistic gestures with strong semantic correspondence.
Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit.
Our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.
arXiv Detail & Related papers (2024-05-16T05:09:01Z)
- DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation [72.85685916829321]
DiffSHEG is a Diffusion-based approach for Speech-driven Holistic 3D Expression and Gesture generation with arbitrary length.
By enabling the real-time generation of expressive and synchronized motions, DiffSHEG showcases its potential for various applications in the development of digital humans and embodied agents.
arXiv Detail & Related papers (2024-01-09T11:38:18Z)
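For context on the diffusion-based entry above, the following is a compact sketch of speech-conditioned DDPM sampling for motion; the denoiser architecture, noise schedule, and tensor shapes are illustrative assumptions rather than DiffSHEG's actual implementation.

```python
# Sketch of speech-conditioned diffusion sampling for motion; architecture,
# schedule, and shapes are illustrative assumptions, not DiffSHEG's actual code.
import torch
import torch.nn as nn

T = 50                                              # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)


class SpeechConditionedDenoiser(nn.Module):
    """Predicts the noise added to a motion sequence, given the timestep and speech features."""

    def __init__(self, motion_dim=45, speech_dim=128, hidden=256):
        super().__init__()
        self.t_embed = nn.Embedding(T, hidden)
        self.net = nn.GRU(motion_dim + speech_dim + hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, motion_dim)

    def forward(self, noisy_motion, t, speech):
        t_emb = self.t_embed(t)[:, None, :].expand(-1, noisy_motion.size(1), -1)
        h, _ = self.net(torch.cat([noisy_motion, speech, t_emb], dim=-1))
        return self.out(h)


@torch.no_grad()
def sample(denoiser, speech, motion_dim=45):
    """Standard DDPM ancestral sampling, conditioned on the speech features."""
    b, n, _ = speech.shape
    x = torch.randn(b, n, motion_dim)
    for t in reversed(range(T)):
        eps = denoiser(x, torch.full((b,), t, dtype=torch.long), speech)
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x


motion = sample(SpeechConditionedDenoiser(), torch.randn(2, 100, 128))
print(motion.shape)                                  # torch.Size([2, 100, 45])
```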
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis [0.0]
We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline.
By learning a mapping into a latent-space representation, rather than mapping directly to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures.
arXiv Detail & Related papers (2023-05-02T07:59:38Z)
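The AQ-GT entry above refers to a quantization pipeline over a learned latent space; the snippet below shows a generic VQ-VAE-style vector-quantization bottleneck to illustrate that general technique. Codebook size, dimensions, and loss weighting are assumptions, not AQ-GT's actual components.

```python
# Generic vector-quantization bottleneck (VQ-VAE style), shown only to illustrate
# the kind of quantized latent space the AQ-GT summary refers to; codebook size,
# dimensions, and loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z):                                  # z: (batch, time, code_dim)
        flat = z.reshape(-1, z.size(-1))                   # (batch*time, code_dim)
        dists = torch.cdist(flat, self.codebook.weight)    # (batch*time, num_codes)
        idx = dists.argmin(dim=-1).reshape(z.shape[:-1])   # nearest code per latent vector
        z_q = self.codebook(idx)                           # quantized latents, same shape as z
        # Codebook loss + commitment loss; straight-through estimator for gradients.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss


vq = VectorQuantizer()
latents = torch.randn(2, 100, 64)                          # e.g. output of a gesture encoder
quantized, codes, vq_loss = vq(latents)
print(quantized.shape, codes.shape, vq_loss.item())
```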
- Robustness of Demonstration-based Learning Under Limited Data Scenario [54.912936555876826]
Demonstration-based learning has shown great potential in stimulating the ability of pretrained language models in limited-data scenarios.
Why such demonstrations are beneficial for the learning process remains unclear since there is no explicit alignment between the demonstrations and the predictions.
In this paper, we design pathological demonstrations by gradually removing intuitively useful information from the standard ones, to take a deep dive into the robustness of demonstration-based sequence labeling.
arXiv Detail & Related papers (2022-10-19T16:15:04Z)
- Single-Stream Multi-Level Alignment for Vision-Language Pretraining [103.09776737512078]
We propose a single stream model that aligns the modalities at multiple levels.
We achieve this using two novel tasks: symmetric cross-modality reconstruction and pseudo-labeled keyword prediction.
We demonstrate top performance on a set of Vision-Language downstream tasks such as zero-shot/fine-tuned image/text retrieval, referring expression, and VQA.
arXiv Detail & Related papers (2022-03-27T21:16:10Z)
- Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [107.10239561664496]
We propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation.
The proposed method renders realistic co-speech gestures and outperforms previous methods by a clear margin.
arXiv Detail & Related papers (2022-03-24T16:33:29Z)
- Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates [30.32106465591015]
Co-speech gesture generation aims to synthesize a gesture sequence that not only looks realistic but also matches the input speech audio.
Our method generates the movements of a complete upper body, including arms, hands, and the head.
arXiv Detail & Related papers (2021-08-18T07:53:36Z)
- Multimodal analysis of the predictability of hand-gesture properties [10.332200713176768]
Embodied conversational agents benefit from being able to accompany their speech with gestures.
We investigate which gesture properties can be predicted from speech text and/or audio using contemporary deep learning.
arXiv Detail & Related papers (2021-08-12T14:16:00Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Gesticulator: A framework for semantically-aware speech-driven gesture generation [17.284154896176553]
We present a model designed to produce arbitrary beat and semantic gestures together.
Our deep-learning based model takes both acoustic and semantic representations of speech as input, and generates gestures as a sequence of joint angle rotations as output.
The resulting gestures can be applied to both virtual agents and humanoid robots.
arXiv Detail & Related papers (2020-01-25T14:42:23Z)
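The Gesticulator entry above describes a model that consumes both acoustic and semantic representations of speech and emits a sequence of joint-angle rotations; the sketch below illustrates only that input/output interface, with the dimensions, fusion strategy, and output parameterization chosen purely for illustration.

```python
# Minimal sketch of a Gesticulator-style interface: acoustic + semantic (text)
# features in, a sequence of joint-angle rotations out. Dimensions, the fusion
# strategy, and the output parameterization are illustrative assumptions.
import torch
import torch.nn as nn


class SpeechToGestureModel(nn.Module):
    def __init__(self, audio_dim=26, text_dim=768, hidden=256, n_joints=15):
        super().__init__()
        self.fuse = nn.Linear(audio_dim + text_dim, hidden)
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_joints * 3)           # e.g. 3 rotation angles per joint

    def forward(self, audio_feats, text_feats):
        x = torch.relu(self.fuse(torch.cat([audio_feats, text_feats], dim=-1)))
        h, _ = self.temporal(x)
        return self.out(h)                                    # (batch, time, n_joints * 3)


model = SpeechToGestureModel()
audio = torch.randn(2, 100, 26)                               # e.g. MFCC-like acoustic frames
text = torch.randn(2, 100, 768)                               # e.g. word embeddings aligned to frames
rotations = model(audio, text)
print(rotations.shape)                                        # torch.Size([2, 100, 45])
```

A rotation sequence of this form is the kind of representation that, as the entry notes, can drive either a virtual agent or a humanoid robot.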