Phonetically rich corpus construction for a low-resourced language
- URL: http://arxiv.org/abs/2402.05794v1
- Date: Thu, 8 Feb 2024 16:36:11 GMT
- Title: Phonetically rich corpus construction for a low-resourced language
- Authors: Marcellus Amadeus, William Alberto Cruz Castañeda, Wilmer Lobato, and Niasche Aquino
- Abstract summary: This paper proposes a novel approach to create a \textit{corpus} with broad phonetic coverage for a low-resourced language.
Our methodology includes text dataset collection up to a sentence selection algorithm based on triphone distribution.
Using our algorithm, we achieve a 55.8% higher percentage of distinct triphones for samples of similar size.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech technologies rely on capturing a speaker's voice variability while
obtaining comprehensive language information. Textual prompts and sentence
selection methods have been proposed in the literature to comprise such
adequate phonetic data, referred to as a phonetically rich \textit{corpus}.
However, they are still insufficient for acoustic modeling, especially critical
for languages with limited resources. Hence, this paper proposes a novel
approach and outlines the methodological aspects required to create a
\textit{corpus} with broad phonetic coverage for a low-resourced language,
Brazilian Portuguese. Our methodology includes text dataset collection up to a
sentence selection algorithm based on triphone distribution. Furthermore, we
propose a new phonemic classification according to acoustic-articulatory speech
features since the absolute number of distinct triphones, or low-probability
triphones, does not guarantee an adequate representation of every possible
combination. Using our algorithm, we achieve a 55.8\% higher percentage of
distinct triphones for samples of similar size, relative to a non-phonetically
rich dataset; the currently available phonetically rich corpora, CETUC and
TTS-Portuguese, achieve only 12.6\% and 12.3\% in the same comparison.
Related papers
- Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models [83.7506131809624]
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial task for enhancing content accessibility and searchability in digital media archives.
We present a novel, large-scale dataset derived from the MediaSum corpus, encompassing transcripts from a wide range of media sources.
We propose novel transformer-based models tailored for SpeakerID, leveraging contextual cues within dialogues to accurately attribute speaker names.
arXiv Detail & Related papers (2024-07-16T18:03:58Z) - Controllable Emphasis with zero data for text-to-speech [57.12383531339368]
A simple but effective method to achieve emphasised speech consists of increasing the predicted duration of the emphasised word.
We show that this is significantly better than spectrogram modification techniques, improving naturalness by 7.3% and testers' correct identification of the emphasised word in a sentence by 40% on a reference female en-US voice.
arXiv Detail & Related papers (2023-07-13T21:06:23Z) - Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding [55.989376102986654]
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech problem under the few-shot setting.
We propose a framework that consists of a phoneme-based TTS model and a codebook module to project phonemes from different languages into a learned latent space.
arXiv Detail & Related papers (2022-06-27T11:24:40Z) - Distribution augmentation for low-resource expressive text-to-speech [18.553812159109253]
This paper presents a novel data augmentation technique for text-to-speech (TTS)
It allows generating new (text, audio) training examples without requiring any additional data.
arXiv Detail & Related papers (2022-02-13T21:19:31Z) - Cross-lingual Low Resource Speaker Adaptation Using Phonological
Features [2.8080708404213373]
We train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages.
With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature.
arXiv Detail & Related papers (2021-11-17T12:33:42Z) - Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data [11.18504333789534]
We propose to use low-quality code-switched found data from the non-target speakers to achieve cross-lingual voice cloning for the target speakers.
Experiments show that our proposed method can generate high-quality code-switched speech in the target voices in terms of both naturalness and speaker consistency.
arXiv Detail & Related papers (2021-10-14T08:16:06Z) - Multilingual Byte2Speech Text-To-Speech Models Are Few-shot Spoken
Language Learners [11.190877290770047]
We present a multilingual end-to-end Text-To-Speech framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts.
The framework demonstrates capabilities to adapt to various new languages under extreme low-resource scenarios.
We propose a novel method to extract language-specific sub-networks for a better understanding of the mechanism of multilingual models.
arXiv Detail & Related papers (2021-03-05T08:41:45Z) - A Corpus for Large-Scale Phonetic Typology [112.19288631037055]
We present VoxClamantis v1.0, the first large-scale corpus for phonetic typology.
The corpus provides aligned segments and estimated phoneme-level labels for 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants.
arXiv Detail & Related papers (2020-05-28T13:03:51Z) - Unsupervised Cross-Modal Audio Representation Learning from Unstructured
Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z) - Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.