Exploration strategies for articulatory synthesis of complex syllable onsets
- URL: http://arxiv.org/abs/2204.09381v1
- Date: Wed, 20 Apr 2022 10:47:28 GMT
- Title: Exploration strategies for articulatory synthesis of complex syllable onsets
- Authors: Daniel R. van Niekerk, Anqi Xu, Branislav Gerazov, Paul K. Krug, Peter Birkholz, Yi Xu
- Abstract summary: High-quality articulatory speech synthesis has many potential applications in speech science and technology.
We construct an optimisation-based framework as a first step towards learning these mappings without manual intervention.
- Score: 20.422871314256266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-quality articulatory speech synthesis has many potential applications in speech science and technology. However, developing appropriate mappings from linguistic specification to articulatory gestures is difficult and time-consuming. In this paper we construct an optimisation-based framework as a first step towards learning these mappings without manual intervention. We demonstrate the production of syllables with complex onsets and discuss the quality of the articulatory gestures with reference to coarticulation.
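To make the optimisation-based framework concrete, here is a minimal, self-contained sketch of the loop the abstract describes: propose articulatory gesture parameters, render them to audio, score the result against a reference utterance, and keep improvements. The synthesiser and acoustic distance below are toy stand-ins added for illustration, not the paper's actual components, and the simple (1+lambda) evolution strategy is an assumed optimiser, not necessarily the one the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the paper's components): a real setup
# would call an articulatory synthesiser and compare spectral features.
_MIX = rng.standard_normal((64, 10))

def synthesise(gestures: np.ndarray) -> np.ndarray:
    """Map 10 gesture parameters to a 64-sample 'waveform' (toy model)."""
    return np.tanh(_MIX @ gestures)

def acoustic_distance(audio: np.ndarray, reference: np.ndarray) -> float:
    """Squared L2 distance as a stand-in for a spectral/perceptual measure."""
    return float(np.sum((audio - reference) ** 2))

def optimise_gestures(reference, dim=10, iters=300, popsize=8, sigma=0.2):
    """(1+lambda) evolution strategy over gesture parameters."""
    best = rng.uniform(-1.0, 1.0, dim)                # random initial gestures
    best_cost = acoustic_distance(synthesise(best), reference)
    for _ in range(iters):
        # Propose perturbed candidates around the current best gestures.
        cands = best + sigma * rng.standard_normal((popsize, dim))
        costs = [acoustic_distance(synthesise(c), reference) for c in cands]
        i = int(np.argmin(costs))
        if costs[i] < best_cost:                      # keep only improvements
            best, best_cost = cands[i], costs[i]
    return best, best_cost

# Pretend reference utterance: audio produced by some unknown gesture set.
target = synthesise(rng.uniform(-1.0, 1.0, 10))
params, cost = optimise_gestures(target)
print(f"final acoustic distance: {cost:.4f}")
```

In the real setting the reference would be a recorded syllable and the distance a spectral or perceptual measure, so that the loop discovers gestures whose acoustics match the target without hand-crafted linguistic-to-articulatory mappings.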
Related papers
- Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis [25.822870767380685]
We present Semantic Gesticulator, a framework designed to synthesize realistic gestures with strong semantic correspondence.
Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit.
Our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.
arXiv Detail & Related papers (2024-05-16T05:09:01Z)
- Expressivity and Speech Synthesis [51.75420054449122]
We outline the methodological advances that brought us so far and sketch out the ongoing efforts to reach that coveted next level of artificial expressivity.
We also discuss the societal implications coupled with rapidly advancing expressive speech synthesis (ESS) technology.
arXiv Detail & Related papers (2024-04-30T08:47:24Z)
- Unified speech and gesture synthesis using flow matching [24.2094371314481]
This paper presents a novel, unified architecture for jointly synthesising speech acoustics and skeleton-based 3D gesture motion from text.
The proposed architecture is simpler than the previous state of the art, has a smaller memory footprint, and can capture the joint distribution of speech and gestures.
arXiv Detail & Related papers (2023-10-08T14:37:28Z)
- EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis [49.04496602282718]
We introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis.
This dataset includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles.
We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders.
arXiv Detail & Related papers (2023-08-10T17:41:19Z)
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis [19.35266496960533]
We present the first diffusion-based probabilistic model, called Diff-TTSG, that jointly learns to synthesise speech and gestures together.
We describe a set of careful uni- and multi-modal subjective tests for evaluating integrated speech and gesture synthesis systems.
arXiv Detail & Related papers (2023-06-15T18:02:49Z)
- Revisiting Conversation Discourse for Dialogue Disentanglement [88.3386821205896]
We propose enhancing dialogue disentanglement by taking full advantage of the dialogue discourse characteristics.
We develop a structure-aware framework to integrate the rich structural features for better modeling the conversational semantic context.
Our work has great potential to facilitate broader multi-party multi-thread dialogue applications.
arXiv Detail & Related papers (2023-06-06T19:17:47Z)
- Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings [27.352570417976153]
We present a novel co-speech gesture synthesis method that achieves convincing results both on the rhythm and semantics.
For the rhythm, our system contains a robust rhythm-based segmentation pipeline to explicitly ensure temporal coherence between vocalization and gestures.
For the gesture semantics, we devise a mechanism to effectively disentangle both low- and high-level neural embeddings of speech and motion based on linguistic theory.
arXiv Detail & Related papers (2022-10-04T08:19:06Z)
- Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [107.10239561664496]
We propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation.
The proposed method renders realistic co-speech gestures and outperforms previous methods by a clear margin.
arXiv Detail & Related papers (2022-03-24T16:33:29Z)
- Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis [37.93814851450597]
We propose a hierarchical framework to model speaking style from context.
A hierarchical context encoder is proposed to explore a wider range of contextual information.
To encourage this encoder to learn style representation better, we introduce a novel training strategy.
arXiv Detail & Related papers (2022-03-23T05:27:57Z)
- Towards Multi-Scale Style Control for Expressive Speech Synthesis [60.08928435252417]
The proposed method employs a multi-scale reference encoder to extract both the global-scale utterance-level and the local-scale quasi-phoneme-level style features of the target speech.
During training time, the multi-scale style model could be jointly trained with the speech synthesis model in an end-to-end fashion.
arXiv Detail & Related papers (2021-04-08T05:50:09Z)
- Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis [59.623780036359655]
Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators.
This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury.
We propose a solution to this problem based on the theory of multi-view learning (a basic alignment sketch follows this list).
arXiv Detail & Related papers (2020-12-30T15:09:02Z)
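As flagged in the Multi-view Temporal Alignment entry above, the basic obstacle in non-parallel A2A synthesis is that articulatory and acoustic recordings are not time-synchronised. The sketch below is plain dynamic time warping over two feature sequences, a generic baseline for this kind of alignment; it is an illustrative assumption, not the paper's multi-view method, which learns a shared space in which to compare the two views.

```python
import numpy as np

def dtw_align(X: np.ndarray, Y: np.ndarray):
    """Align feature sequences X (n, d) and Y (m, d) by dynamic time warping.

    Returns the cumulative alignment cost and the warping path as
    (i, j) index pairs. Plain DTW, not the paper's multi-view method.
    """
    n, m = len(X), len(Y)
    # Pairwise Euclidean distances between all frame pairs.
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack the optimal warping path from the end of both sequences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return acc[n, m], path[::-1]

# Example: align a short articulatory stream to a longer acoustic one
# (feature names and dimensions are hypothetical).
rng = np.random.default_rng(1)
artic = rng.standard_normal((20, 12))    # e.g. EMA articulator trajectories
acoust = rng.standard_normal((30, 12))   # e.g. MFCC frames
cost, path = dtw_align(artic, acoust)
print(f"DTW cost {cost:.2f}, path length {len(path)}")
```

Note that plain DTW assumes both sequences live in a comparable feature space, which is exactly what non-parallel articulatory and acoustic data lack; closing that gap is what the multi-view learning step is for.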
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.