Generación de voces artificiales infantiles en castellano con acento
costarricense (Generation of artificial children's voices in Spanish with a
Costa Rican accent)
- URL: http://arxiv.org/abs/2102.01692v1
- Date: Tue, 2 Feb 2021 02:12:28 GMT
- Title: Generación de voces artificiales infantiles en castellano con acento
costarricense (Generation of artificial children's voices in Spanish with a
Costa Rican accent)
- Authors: Ana Lilia Álvarez-Blanco, Eugenia Córdoba-Warner, Marvin Coto-Jiménez,
Vivian Fallas-López, Maribel Morales Rodríguez
- Abstract summary: This article evaluates a first experience of generating artificial children's voices with a Costa Rican accent.
Results show that the intelligibility of the synthesized voices, evaluated on
isolated words, is lower than that of the recordings made by the
participating children.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This article evaluates a first experience of generating artificial
children's voices with a Costa Rican accent, using the technique of
statistical parametric speech synthesis based on Hidden Markov Models. We
describe the process of recording the voice samples used to train the models,
the fundamentals of the technique, and the subjective evaluation of the
results through a perception test with a group of listeners. The results show
that the intelligibility of the synthesized voices, evaluated on isolated
words, is lower than that of the recordings made by the participating
children. Likewise, detection of the speaker's age and gender is
significantly degraded for the artificial voices relative to the natural
recordings. These results point to the need for larger amounts of training
data, and they also provide a numerical baseline for future developments
based on new data or on refinements of the same technique.
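As a rough illustration of the technique named in the abstract, the sketch
below fits a Hidden Markov Model on MFCC features extracted from several
recordings of one isolated word and then samples a new parameter trajectory
from it. This is a minimal sketch assuming the librosa and hmmlearn libraries;
the stand-in signals, state count, and feature choice are illustrative
assumptions, not the authors' HTS-style setup (which also models excitation
and duration, and renders audio with a vocoder).

```python
import numpy as np
import librosa
from hmmlearn import hmm

def word_features(y, sr=16000, n_mfcc=13):
    """MFCC trajectory (frames x coefficients) for one recording."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Stand-in "recordings" of one isolated word: in the paper these would
# be the children's studio recordings; here, noisy tones keep the
# sketch self-contained and runnable.
rng = np.random.default_rng(0)
recs = [np.sin(2 * np.pi * 220 * np.linspace(0, 0.5, 8000))
        + 0.05 * rng.standard_normal(8000) for _ in range(3)]

feats = [word_features(y) for y in recs]
X = np.vstack(feats)                      # stacked frames
lengths = [f.shape[0] for f in feats]     # per-recording frame counts

# HMM over acoustic frames (5 states and diagonal covariances are
# assumptions; an HTS-style system models much richer parameters).
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
model.fit(X, lengths)

# "Synthesis": sample a new MFCC trajectory from the trained model.
# A real system converts such parameters back to audio with a vocoder.
traj, states = model.sample(80)
print(traj.shape, states[:10])
```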
Related papers
- Speechworthy Instruction-tuned Language Models [71.8586707840169]
We show that both prompting and preference learning increase the speech-suitability of popular instruction-tuned LLMs.
We share lexical, syntactical, and qualitative analyses to showcase how each method contributes to improving the speech-suitability of generated responses.
arXiv Detail & Related papers (2024-09-23T02:34:42Z)
- A comparative analysis between Conformer-Transducer, Whisper, and wav2vec2
for improving the child speech recognition [2.965450563218781]
We show that finetuning Conformer-Transducer models on child speech yields significant improvements in child-speech ASR performance.
We also evaluate the adaptation of Whisper and wav2vec2 on different child speech datasets.
arXiv Detail & Related papers (2023-11-07T19:32:48Z)
- Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer
Learning [3.5032870024762386]
This paper presents a novel approach that leverages the Fastpitch text-to-speech (TTS) model for generating high-quality synthetic child speech.
The approach involved finetuning a multi-speaker TTS model to work with child speech.
We conducted an objective assessment that showed a significant correlation between real and synthetic child voices (a correlation-check sketch appears after this list).
arXiv Detail & Related papers (2023-11-07T19:31:44Z)
- Continuous Modeling of the Denoising Process for Speech Enhancement Based on
Deep Learning [61.787485727134424]
We use a state variable to indicate the denoising process.
A UNet-like neural network learns to estimate every state variable sampled from the continuous denoising process.
Experimental results indicate that preserving a small amount of noise in the clean target benefits speech enhancement (a target-construction sketch appears after this list).
arXiv Detail & Related papers (2023-09-17T13:27:11Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form
Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models [95.97506031821217]
We present a novel way of conditioning a pretrained denoising diffusion speech model to produce speech in the voice of a novel person unseen during training.
The method requires a short (3 seconds) sample from the target person, and generation is steered at inference time, without any training steps.
arXiv Detail & Related papers (2022-06-05T19:45:29Z)
- Self-supervised models of audio effectively explain human cortical responses
to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
These results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic
Speech [67.88748572167309]
We present LDNet, a unified framework for listener-dependent mean opinion score (MOS) prediction (a listener-embedding sketch appears after this list).
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Data augmentation using prosody and false starts to recognize non-native
children's speech [12.911954427107977]
This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition.
The task is to recognize non-native speech from children of various age groups given a limited amount of speech.
arXiv Detail & Related papers (2020-08-29T05:32:32Z)
- Data-driven Detection and Analysis of the Patterns of Creaky Voice [13.829936505895692]
Creaky voice is a voice quality frequently used as a phrase-boundary marker.
The automatic detection and modelling of creaky voice may have implications for speech technology applications.
arXiv Detail & Related papers (2020-05-31T13:34:30Z)
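For the Fastpitch transfer-learning entry above, the following is a minimal
sketch of one way to run an objective real-versus-synthetic comparison:
correlate summary acoustic features of paired real and synthetic utterances
of the same text. The stand-in signals, the use of mean MFCC vectors, and the
Pearson statistic are illustrative assumptions, not necessarily the measures
used in that paper.

```python
import numpy as np
import librosa
from scipy.stats import pearsonr

def mean_mfcc(y, sr=16000, n_mfcc=13):
    """Average MFCC vector of one utterance."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Stand-ins for paired utterances of the same text: a "real" recording
# and a "synthetic" rendition (here, the same tone plus noise).
rng = np.random.default_rng(1)
t = np.linspace(0, 0.5, 8000)
real = [np.sin(2 * np.pi * f * t) for f in (200, 240, 280)]
synth = [y + 0.1 * rng.standard_normal(y.size) for y in real]

real_feats = np.concatenate([mean_mfcc(y) for y in real])
synth_feats = np.concatenate([mean_mfcc(y) for y in synth])

r, p = pearsonr(real_feats, synth_feats)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```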
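For the "Continuous Modeling of the Denoising Process" entry, the sketch below
materializes training targets along a continuous path from noisy to clean,
with a small residual of the original noise preserved in the final target.
The linear interpolation scheme and the noise_floor value are assumptions
made for illustration, not the paper's exact formulation.

```python
import numpy as np

def denoising_state(noisy, clean, t, noise_floor=0.05):
    """State at progress t in [0, 1]; t=1 still keeps a little noise."""
    t = min(t, 1.0 - noise_floor)        # never remove the noise entirely
    return (1.0 - t) * noisy + t * clean

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 8 * np.pi, 16000))   # toy clean signal
noisy = clean + 0.3 * rng.standard_normal(16000)   # toy noisy mixture

# A UNet-like network would be trained to map (noisy, t) -> state(t);
# here we just materialize a few intermediate targets and their errors.
for t in (0.0, 0.5, 1.0):
    state = denoising_state(noisy, clean, t)
    print(t, float(np.mean((state - clean) ** 2)))
```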
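For the LDNet entry, this sketch shows the core idea of listener-dependent
MOS prediction: the predictor receives a listener ID embedding alongside the
utterance features, so the same utterance can be scored differently per
listener. The architecture, layer sizes, and names are assumptions for
illustration, not the paper's actual model or its inference methods.

```python
import torch
import torch.nn as nn

class ListenerDependentMOS(nn.Module):
    def __init__(self, feat_dim=80, n_listeners=50, emb_dim=16):
        super().__init__()
        self.listener_emb = nn.Embedding(n_listeners, emb_dim)
        self.encoder = nn.GRU(feat_dim, 64, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(64 + emb_dim, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, mel, listener_id):
        # mel: (batch, frames, feat_dim); listener_id: (batch,)
        _, h = self.encoder(mel)              # h: (1, batch, 64)
        utt = h.squeeze(0)                    # utterance embedding
        lis = self.listener_emb(listener_id)  # listener embedding
        return self.head(torch.cat([utt, lis], dim=-1)).squeeze(-1)

model = ListenerDependentMOS()
mel = torch.randn(4, 120, 80)          # dummy mel-spectrogram batch
listeners = torch.tensor([0, 1, 2, 3])
print(model(mel, listeners).shape)     # one MOS per (utterance, listener)
```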