Pronunciation Modeling of Foreign Words for Mandarin ASR by Considering
the Effect of Language Transfer
- URL: http://arxiv.org/abs/2210.03603v1
- Date: Fri, 7 Oct 2022 14:59:44 GMT
- Authors: Lei Wang, Rong Tong
- Abstract summary: The paper examines the phonetic effect of language transfer in automatic speech recognition.
A set of lexical rules is proposed to convert an English word into a Mandarin phonetic representation.
The proposed lexical rules are general and can be applied directly to unseen English words.
- Score: 4.675953329876724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the challenges in automatic speech recognition is the
recognition of foreign words. A speaker's pronunciation of a foreign word is
influenced by knowledge of their native language, a phenomenon known as the
effect of language transfer. This paper examines the phonetic effect of
language transfer in automatic speech recognition. A set of lexical rules is
proposed to convert an English word into a Mandarin phonetic representation.
In this way, a Mandarin lexicon can be augmented with English words, and the
Mandarin ASR system becomes capable of recognizing English words without
retraining or re-estimation of the acoustic model parameters. Using a lexicon
derived from the proposed rules, ASR performance on Mandarin-English mixed
speech is improved without harming the accuracy on Mandarin-only speech. The
proposed lexical rules are general and can be applied directly to unseen
English words.
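The lexicon-augmentation idea described in the abstract can be sketched as a small rule-based converter. The rule table, the phoneme symbols, and the example word below are illustrative assumptions for this sketch, not the paper's actual rule set.

```python
# Hypothetical sketch: map an English phoneme sequence onto phones from
# the Mandarin inventory, reflecting language transfer (speakers
# approximate foreign sounds with native phones, e.g. /v/ -> /w/).
# All rules and symbols here are illustrative, not the paper's rules.
EN_TO_MANDARIN_RULES = {
    "K": "k",
    "AO": "a o",   # an English vowel approximated by a Mandarin vowel pair
    "F": "f",
    "IY": "i",
    "V": "w",      # /v/ does not exist in Mandarin; often realized as /w/
}

def english_to_mandarin_phones(en_phones):
    """Apply the substitution rules to an English phoneme sequence."""
    # Fall back to the identity mapping for phonemes without a rule.
    return " ".join(EN_TO_MANDARIN_RULES.get(p, p) for p in en_phones)

# Augment a Mandarin lexicon with an English entry, so the existing
# acoustic model can recognize the word without any retraining.
lexicon = {"你好": "n i h a o"}
lexicon["COFFEE"] = english_to_mandarin_phones(["K", "AO", "F", "IY"])
print(lexicon["COFFEE"])  # prints "k a o f i"
```

Because the rules operate on phoneme sequences rather than specific words, the same table can be applied to unseen English words, which is the generalization property the abstract claims.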
Related papers
- Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech [3.812148920168377]
We propose a cascaded system consisting of speaker diarization and language identification.
Results show that the proposed system often achieves lower language classification and language diarization error rates, while not negatively affecting speech recognition on monolingual audio.
arXiv Detail & Related papers (2024-06-13T16:27:56Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling [92.55131711064935]
We propose a cross-lingual neural language model, VALL-E X, for cross-lingual speech synthesis.
VALL-E X inherits strong in-context learning capabilities and can be applied for zero-shot cross-lingual text-to-speech synthesis and zero-shot speech-to-speech translation tasks.
It can generate high-quality speech in the target language via just one speech utterance in the source language as a prompt while preserving the unseen speaker's voice, emotion, and acoustic environment.
arXiv Detail & Related papers (2023-03-07T14:31:55Z)
- Improve Bilingual TTS Using Dynamic Language and Phonology Embedding [10.244215079409797]
This paper builds a Mandarin-English TTS system to acquire more standard spoken English speech from a monolingual Chinese speaker.
We specially design an embedding strength modulator to capture the dynamic strength of language and phonology.
arXiv Detail & Related papers (2022-12-07T03:46:18Z)
- VALUE: Understanding Dialect Disparity in NLU [50.35526025326337]
We construct rules for 11 features of African American Vernacular English (AAVE).
We recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments.
Experiments show that these new dialectal features can lead to a drop in model performance.
arXiv Detail & Related papers (2022-04-06T18:30:56Z)
- Improving Cross-lingual Speech Synthesis with Triplet Training Scheme [5.470211567548067]
A triplet training scheme is proposed to enhance cross-lingual pronunciation.
The proposed method brings significant improvement in both intelligibility and naturalness of the synthesized cross-lingual speech.
arXiv Detail & Related papers (2022-02-22T08:40:43Z)
- Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models [55.82292352607321]
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence.
This paper uses the recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS.
arXiv Detail & Related papers (2021-10-07T14:43:35Z)
- Differentiable Allophone Graphs for Language-Universal Speech Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
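The phone-to-phoneme mapping idea in this entry can be illustrated with a minimal sketch: expand a phonemic transcription into per-position distributions over phones. The phoneme inventory, allophone symbols, and weights below are made-up illustrations, not values or code from the paper (where the mappings are learned and differentiable).

```python
# Minimal sketch of deriving phone-level supervision from a phonemic
# transcription plus a per-language phone-to-phoneme mapping. All
# symbols and weights here are illustrative assumptions.

# Each phoneme maps to its allophones with weights; in the paper these
# weights are learned and interpretable.
phoneme_to_phones = {
    "/t/": {"[t]": 0.7, "[th]": 0.3},  # e.g. an aspirated allophone
    "/a/": {"[a]": 1.0},
}

def phone_distributions(phoneme_seq):
    """Expand a phonemic transcription into a per-position
    distribution over phones."""
    return [phoneme_to_phones[p] for p in phoneme_seq]

dists = phone_distributions(["/t/", "/a/"])
# Each position now provides soft supervision for a universal
# phone-based recognizer; here we just take the argmax phone.
best_phones = [max(d, key=d.get) for d in dists]
print(best_phones)  # prints ['[t]', '[a]']
```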
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- Lexical Access Model for Italian -- Modeling human speech processing: identification of words in running speech toward lexical access based on the detection of landmarks and other acoustic cues to features [2.033475676482581]
This work aims to develop a system that imitates humans when identifying words in running speech.
We build a speech recognizer for Italian based on the principles of Stevens' model of Lexical Access.
arXiv Detail & Related papers (2021-06-24T10:54:56Z)
- Non-native English lexicon creation for bilingual speech synthesis [9.533867546985887]
The intelligibility of a bilingual text-to-speech system depends on a lexicon that captures the phoneme sequences used by non-native speakers.
Due to the lack of non-native English lexicons, existing bilingual TTS systems employ native English lexicons, which are widely available.
We propose a generic approach to obtain rules based on letter-to-phoneme alignment that map a native English lexicon to its non-native version.
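The rule-application step of this entry can be sketched as simple phoneme substitutions over a native lexicon. The rule set and lexicon entries below are assumptions chosen for illustration; the paper derives its rules from letter-to-phoneme alignments rather than specifying them by hand.

```python
# Illustrative sketch: rewrite a native English lexicon into a
# non-native variant by applying phoneme-substitution rules. The rules
# and example entries are hypothetical, not taken from the paper.
NON_NATIVE_RULES = {
    "DH": "D",   # "th" as in "the" often realized as /d/
    "Z": "S",    # final /z/ devoiced to /s/
}

def to_non_native(pron):
    """Apply the substitution rules to a space-separated
    phoneme string."""
    return " ".join(NON_NATIVE_RULES.get(p, p) for p in pron.split())

native_lexicon = {"THE": "DH AH", "DOGS": "D AO G Z"}
non_native_lexicon = {w: to_non_native(p) for w, p in native_lexicon.items()}
print(non_native_lexicon["THE"])   # prints "D AH"
print(non_native_lexicon["DOGS"])  # prints "D AO G S"
```

Since the rules are defined over phonemes rather than whole words, the same table generalizes to any entry in the native lexicon.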
arXiv Detail & Related papers (2021-06-21T06:07:14Z)
- UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data [54.733889961024445]
We propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data.
We evaluate the effectiveness of UniSpeech for cross-lingual representation learning on public CommonVoice corpus.
arXiv Detail & Related papers (2021-01-19T12:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.