Non-native English lexicon creation for bilingual speech synthesis
- URL: http://arxiv.org/abs/2106.10870v1
- Date: Mon, 21 Jun 2021 06:07:14 GMT
- Title: Non-native English lexicon creation for bilingual speech synthesis
- Authors: Arun Baby, Pranav Jawale, Saranya Vinnaitherthan, Sumukh Badam,
Nagaraj Adiga, Sharath Adavanne
- Abstract summary: The intelligibility of a bilingual text-to-speech system depends on a lexicon that captures the phoneme sequence used by non-native speakers.
Due to the lack of a non-native English lexicon, existing bilingual TTS systems employ widely available native English lexicons.
We propose a generic approach that obtains rules from letter-to-phoneme alignment to map the native English lexicon to its non-native version.
- Score: 9.533867546985887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bilingual English speakers speak English as one of their languages. Their
English is of a non-native kind, and their conversations are of a code-mixed
fashion. The intelligibility of a bilingual text-to-speech (TTS) system for
such non-native English speakers depends on a lexicon that captures the phoneme
sequence used by non-native speakers. However, due to the lack of a non-native
English lexicon, existing bilingual TTS systems employ widely available native
English lexicons in addition to their native-language lexicon. Due to
the inconsistency between the non-native English pronunciation in the audio and
native English lexicon in the text, the intelligibility of synthesized speech
in such TTS systems is significantly reduced.
This paper is motivated by the knowledge that the native language of the
speaker highly influences non-native English pronunciation. We propose a
generic approach to obtain rules based on letter-to-phoneme alignment to map
the native English lexicon to its non-native version. The effectiveness of such
mapping is studied by comparing bilingual (Indian English and Hindi) TTS
systems trained with and without the proposed rules. The subjective evaluation
shows that the bilingual TTS system trained with the proposed non-native
English lexicon rules obtains a 6% absolute improvement in preference.
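The mapping described in the abstract can be sketched as phoneme-level substitution rules applied to a native-English pronunciation lexicon. The rules and lexicon entries below are hypothetical illustrations (the paper derives its actual rules from letter-to-phoneme alignment), not the authors' rule set.

```python
# Minimal sketch of rule-based lexicon mapping, assuming ARPAbet-style
# phonemes. The lexicon entries and substitution rules are hypothetical
# examples of Indian English realizations, not the paper's learned rules.

# Hypothetical native-English lexicon: word -> phoneme sequence.
NATIVE_LEXICON = {
    "water": ["W", "AO", "T", "ER"],
    "think": ["TH", "IH", "NG", "K"],
}

# Hypothetical mapping rules: native phoneme -> non-native realization
# (a rule may expand one phoneme into several, hence the split below).
RULES = {
    "TH": "T",     # dental fricative realized as a stop
    "ER": "AX R",  # r-colored vowel split into schwa + rhotic
}

def map_to_non_native(phonemes):
    """Apply substitution rules to a native phoneme sequence."""
    out = []
    for p in phonemes:
        out.extend(RULES.get(p, p).split())
    return out

non_native_lexicon = {
    word: map_to_non_native(prons) for word, prons in NATIVE_LEXICON.items()
}
print(non_native_lexicon["think"])  # ['T', 'IH', 'NG', 'K']
```

In the paper the rules are obtained automatically from letter-to-phoneme alignment rather than written by hand; the sketch only shows how such rules, once obtained, rewrite a lexicon.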
Related papers
- Improving Pronunciation and Accent Conversion through Knowledge Distillation And Synthetic Ground-Truth from Native TTS [52.89324095217975]
Previous approaches to accent conversion mainly aimed at making non-native speech sound more native.
We develop a new AC approach that not only performs accent conversion but also improves the pronunciation of non-native accented speakers.
arXiv Detail & Related papers (2024-10-19T06:12:31Z)
- An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios [76.11409260727459]
This paper explores the language adaptation capability of ZMM-TTS, a recent SSL-based multilingual TTS system.
We demonstrate that the similarity in phonetics between the pre-training and target languages, as well as the language category, affects the target language's adaptation performance.
arXiv Detail & Related papers (2024-06-13T08:16:52Z)
- Hindi as a Second Language: Improving Visually Grounded Speech with Semantically Similar Samples [89.16814518860357]
The objective of this work is to explore the learning of visually grounded speech models (VGS) from multilingual perspective.
Our key contribution in this work is to leverage the power of a high-resource language in a bilingual visually grounded speech model to improve the performance of a low-resource language.
arXiv Detail & Related papers (2023-03-30T16:34:10Z)
- Multi-VALUE: A Framework for Cross-Dialectal English NLP [49.55176102659081]
Multi-VALUE is a controllable rule-based translation system spanning 50 English dialects.
Stress tests reveal significant performance disparities for leading models on non-standard dialects.
We partner with native speakers of Chicano and Indian English to release new gold-standard variants of the popular CoQA task.
arXiv Detail & Related papers (2022-12-15T18:17:01Z)
- Improve Bilingual TTS Using Dynamic Language and Phonology Embedding [10.244215079409797]
This paper builds a Mandarin-English TTS system to acquire more standard spoken English speech from a monolingual Chinese speaker.
We specially design an embedding strength modulator to capture the dynamic strength of language and phonology.
arXiv Detail & Related papers (2022-12-07T03:46:18Z)
- Pronunciation Modeling of Foreign Words for Mandarin ASR by Considering the Effect of Language Transfer [4.675953329876724]
The paper focuses on examining the phonetic effect of language transfer in automatic speech recognition.
A set of lexical rules is proposed to convert an English word into Mandarin phonetic representation.
The proposed lexical rules are generalized and they can be directly applied to unseen English words.
arXiv Detail & Related papers (2022-10-07T14:59:44Z)
- Improving Cross-lingual Speech Synthesis with Triplet Training Scheme [5.470211567548067]
Triplet training scheme is proposed to enhance the cross-lingual pronunciation.
The proposed method brings significant improvement in both intelligibility and naturalness of the synthesized cross-lingual speech.
arXiv Detail & Related papers (2022-02-22T08:40:43Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses the recently successful self-supervised learning (SSL) methods to leverage many unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
- Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion [28.830575877307176]
It is not easy to obtain a bilingual corpus from a speaker who achieves native-level fluency in both languages.
A Tacotron2-based cross-lingual voice conversion system is employed to generate the Mandarin speaker's English speech and the English speaker's Mandarin speech.
The obtained bilingual data are then augmented with code-switched utterances synthesized using a Transformer model.
arXiv Detail & Related papers (2020-10-16T03:51:00Z)
- Latent linguistic embedding for cross-lingual text-to-speech and voice conversion [44.700803634034486]
Cross-lingual speech generation is the scenario in which speech utterances are generated with the voices of target speakers in a language not spoken by them originally.
We show that our method not only creates cross-lingual VC with high speaker similarity but also can be seamlessly used for cross-lingual TTS without having to perform any extra steps.
arXiv Detail & Related papers (2020-10-08T01:25:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.