Simulating Articulatory Trajectories with Phonological Feature Interpolation
- URL: http://arxiv.org/abs/2408.04363v1
- Date: Thu, 8 Aug 2024 10:51:16 GMT
- Title: Simulating Articulatory Trajectories with Phonological Feature Interpolation
- Authors: Angelo Ortiz Tandazo, Thomas Schatz, Thomas Hueber, Emmanuel Dupoux
- Abstract summary: We investigate the forward mapping between pseudo-motor commands and articulatory trajectories.
Two phonological feature sets, based respectively on generative and articulatory phonology, are used to encode a phonetic target sequence.
We discuss the implications of our results for our understanding of the dynamics of biological motion.
- Score: 15.482738311360972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a first step towards a complete computational model of speech learning involving perception-production loops, we investigate the forward mapping between pseudo-motor commands and articulatory trajectories. Two phonological feature sets, based respectively on generative and articulatory phonology, are used to encode a phonetic target sequence. Different interpolation techniques are compared to generate smooth trajectories in these feature spaces, with a potential optimisation of the target value and timing to capture co-articulation effects. We report the Pearson correlation between a linear projection of the generated trajectories and articulatory data derived from a multi-speaker dataset of electromagnetic articulography (EMA) recordings. A correlation of 0.67 is obtained with an extended feature set based on generative phonology and a linear interpolation technique. We discuss the implications of our results for our understanding of the dynamics of biological motion.
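For readers who want to see the shape of this pipeline, the following is a minimal NumPy sketch under invented assumptions: a toy binary feature inventory, midpoint-anchored linear interpolation between phonetic targets, a random matrix standing in for the fitted linear projection, and synthetic EMA channels scored with Pearson correlation. It illustrates the idea only and is not the authors' implementation.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical binary phonological feature targets (illustrative only, not the
# paper's generative- or articulatory-phonology feature sets).
PHONE_TARGETS = {
    "p": np.array([1.0, 0.0, 0.0, 1.0]),  # e.g. [labial, coronal, dorsal, consonantal]
    "a": np.array([0.0, 0.0, 1.0, 0.0]),
    "t": np.array([0.0, 1.0, 0.0, 1.0]),
}

def interpolate_targets(phones, durations_ms, step_ms=10):
    """Linearly interpolate between successive feature targets.

    Each target is anchored at the temporal midpoint of its phone and the
    trajectory is sampled every `step_ms` milliseconds.
    """
    anchors, feats, t = [], [], 0.0
    for phone, dur in zip(phones, durations_ms):
        anchors.append(t + dur / 2.0)
        feats.append(PHONE_TARGETS[phone])
        t += dur
    anchors = np.asarray(anchors)
    feats = np.stack(feats)                                  # (n_phones, n_feats)
    times = np.arange(0.0, t, step_ms)
    traj = np.stack([np.interp(times, anchors, feats[:, j])
                     for j in range(feats.shape[1])], axis=1)
    return times, traj                                       # (n_frames, n_feats)

times, traj = interpolate_targets(["p", "a", "t"], durations_ms=[80, 150, 90])

# Stand-in for the fitted linear projection onto articulator coordinates;
# in the paper this projection is estimated from multi-speaker EMA data.
rng = np.random.default_rng(0)
projection = rng.normal(size=(traj.shape[1], 2))             # 2 toy EMA channels
predicted_ema = traj @ projection

# Synthetic "recorded" EMA, used here only to demonstrate the evaluation metric.
recorded_ema = predicted_ema + 0.1 * rng.normal(size=predicted_ema.shape)
for channel in range(recorded_ema.shape[1]):
    r, _ = pearsonr(predicted_ema[:, channel], recorded_ema[:, channel])
    print(f"channel {channel}: Pearson r = {r:.2f}")
```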
Related papers
- Phononic materials with effectively scale-separated hierarchical features using interpretable machine learning [57.91994916297646]
Architected hierarchical phononic materials have shown promise for the tunability of elastodynamic waves and vibrations over multiple frequency ranges.
In this article, hierarchical unit-cells are obtained, where features at each length scale result in a band gap within a targeted frequency range.
Our approach offers a flexible and efficient method for the exploration of new regions in the hierarchical design space.
arXiv Detail & Related papers (2024-08-15T21:35:06Z)
- Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer [11.91784203088159]
We develop an end-to-end deep learning framework for translating weighting maps to their corresponding audio waveforms.
Our framework is able to synthesize speech audio waveforms from weighting maps, outperforming conventional convolution and transformer models.
arXiv Detail & Related papers (2023-09-26T00:21:17Z)
- Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification [55.624946113550195]
This paper proposes a cross-modal speech co-learning paradigm.
Two cross-modal boosters are introduced based on an audio-visual pseudo-siamese structure to learn the modality-transformed correlation.
Experimental results on the LRSLip3, GridLip, LomGridLip, and VoxLip datasets demonstrate that our proposed method achieves 60% and 20% average relative performance improvement.
arXiv Detail & Related papers (2023-02-22T10:06:37Z)
- Synthesizing audio from tongue motion during speech using tagged MRI via transformer [13.442093381065268]
We present an efficient deformation-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms.
Our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
arXiv Detail & Related papers (2023-02-14T17:27:55Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
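A rough, non-neural illustration of this kind of decomposition is a convolutive matrix factorisation fitted by projected gradient descent; the sketch below uses synthetic data and omits the sparsity constraints and neural parameterisation described in the paper.

```python
import numpy as np

def shift_right(H, t):
    """Delay activation matrix H by t frames (zero-pad on the left)."""
    if t == 0:
        return H
    return np.pad(H, ((0, 0), (t, 0)))[:, :H.shape[1]]

def shift_left(R, t):
    """Advance matrix R by t frames (zero-pad on the right)."""
    if t == 0:
        return R
    return np.pad(R, ((0, 0), (0, t)))[:, t:]

def convolutive_mf(X, n_gestures=3, kernel=5, n_iter=500, lr=1e-4, seed=0):
    """Factor X (articulators x frames) into gesture kernels W and gestural scores H.

    X is approximated by sum_t W[t] @ shift_right(H, t); non-negativity is
    enforced by clipping after each projected-gradient step.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    W = rng.random((kernel, d, n_gestures))      # W[t]: articulators x gestures
    H = rng.random((n_gestures, n))              # gestural scores over time
    for _ in range(n_iter):
        approx = sum(W[t] @ shift_right(H, t) for t in range(kernel))
        R = approx - X                           # residual
        grad_W = np.stack([R @ shift_right(H, t).T for t in range(kernel)])
        grad_H = sum(W[t].T @ shift_left(R, t) for t in range(kernel))
        W = np.clip(W - lr * grad_W, 0.0, None)
        H = np.clip(H - lr * grad_H, 0.0, None)
    return W, H

# Synthetic "articulatory" data for demonstration only.
rng = np.random.default_rng(1)
X = np.abs(rng.normal(size=(6, 200)))
W, H = convolutive_mf(X)
approx = sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))
print("reconstruction error:", np.linalg.norm(approx - X))
```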
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Bidirectional Interaction between Visual and Motor Generative Models using Predictive Coding and Active Inference [68.8204255655161]
We propose a neural architecture comprising a generative model for sensory prediction, and a distinct generative model for motor trajectories.
We highlight how sequences of sensory predictions can act as rails guiding learning, control and online adaptation of motor trajectories.
arXiv Detail & Related papers (2021-04-19T09:41:31Z)
- Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis [59.623780036359655]
Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators.
This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury.
We propose a solution to this problem based on the theory of multi-view learning.
arXiv Detail & Related papers (2020-12-30T15:09:02Z)
- Correlation based Multi-phasal models for improved imagined speech EEG recognition [22.196642357767338]
This work aims to profit from the parallel information contained in multi-phasal EEG data recorded while speaking, imagining and performing articulatory movements corresponding to specific speech units.
A bi-phase common representation learning module using neural networks is designed to model the correlation between an analysis phase and a support phase.
The proposed approach further handles the non-availability of multi-phasal data during decoding.
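The correlation modelling mentioned above can be pictured with a generic per-dimension Pearson correlation loss between two phase embeddings; the toy sketch below is not the paper's bi-phase module, only an illustration of the kind of objective involved.

```python
import numpy as np

def correlation_loss(z_analysis, z_support):
    """Negative mean per-dimension Pearson correlation between two phase embeddings.

    Minimising this value drives the (hypothetical) analysis-phase and
    support-phase encoders toward a correlated common representation.
    """
    za = z_analysis - z_analysis.mean(axis=0)
    zs = z_support - z_support.mean(axis=0)
    num = (za * zs).sum(axis=0)
    den = np.sqrt((za ** 2).sum(axis=0) * (zs ** 2).sum(axis=0)) + 1e-8
    return -np.mean(num / den)

# Toy embeddings standing in for encoder outputs on two EEG recording phases.
rng = np.random.default_rng(0)
z_analysis = rng.normal(size=(32, 16))
z_support = 0.8 * z_analysis + 0.2 * rng.normal(size=(32, 16))
print("correlation loss:", correlation_loss(z_analysis, z_support))
```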
arXiv Detail & Related papers (2020-11-04T09:39:53Z)
- Learning Joint Articulatory-Acoustic Representations with Normalizing Flows [7.183132975698293]
We find a joint latent representation between the articulatory and acoustic domains for vowel sounds via invertible neural network models.
Our approach achieves both articulatory-to-acoustic and acoustic-to-articulatory mapping, demonstrating a successful joint encoding of both domains.
arXiv Detail & Related papers (2020-05-16T04:34:36Z)
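To see why an invertible model gives both mapping directions, the toy sketch below stacks RealNVP-style affine coupling layers with fixed random, untrained parameters: an articulatory-domain flow and an acoustic-domain flow share a latent space, so either direction is obtained by composing one flow with the other's inverse. This illustrates the mechanism only and is not the paper's architecture.

```python
import numpy as np

class AffineCoupling:
    """Minimal affine coupling layer (toy, fixed random weights)."""
    def __init__(self, dim, seed):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        # Tiny one-layer "networks" producing log-scale and translation.
        self.Ws = 0.1 * rng.normal(size=(self.half, dim - self.half))
        self.Wt = 0.1 * rng.normal(size=(self.half, dim - self.half))

    def forward(self, x):
        x1, x2 = x[..., :self.half], x[..., self.half:]
        s, t = np.tanh(x1 @ self.Ws), x1 @ self.Wt
        return np.concatenate([x1, x2 * np.exp(s) + t], axis=-1)

    def inverse(self, y):
        y1, y2 = y[..., :self.half], y[..., self.half:]
        s, t = np.tanh(y1 @ self.Ws), y1 @ self.Wt
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)

class Flow:
    """Stack of coupling layers (real flows also permute dimensions between layers)."""
    def __init__(self, dim, n_layers, seed):
        self.layers = [AffineCoupling(dim, seed + i) for i in range(n_layers)]
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z)
        return z

dim = 8
flow_art = Flow(dim, n_layers=4, seed=0)     # articulatory-domain flow
flow_ac = Flow(dim, n_layers=4, seed=100)    # acoustic-domain flow

articulatory = np.random.default_rng(7).normal(size=(5, dim))
latent = flow_art.forward(articulatory)              # shared latent code
acoustic = flow_ac.inverse(latent)                   # articulatory-to-acoustic
back = flow_art.inverse(flow_ac.forward(acoustic))   # acoustic-to-articulatory
print("round-trip error:", np.max(np.abs(back - articulatory)))
```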
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.