Neural Machine Translation for Multilingual Grapheme-to-Phoneme
Conversion
- URL: http://arxiv.org/abs/2006.14194v2
- Date: Sun, 28 Jun 2020 23:36:47 GMT
- Title: Neural Machine Translation for Multilingual Grapheme-to-Phoneme
Conversion
- Authors: Alex Sokolov, Tracy Rohlin, Ariya Rastrow
- Abstract summary: We present a single end-to-end trained neural G2P model that shares the same encoder and decoder across multiple languages.
We show a 7.2% average improvement in phoneme error rate on low-resource languages and no degradation on high-resource ones compared to monolingual baselines.
- Score: 13.543705472805431
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech
Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to
generate pronunciations for out-of-vocabulary words that do not exist in the
pronunciation lexicons (mappings like "e c h o" to "E k oU"). Most G2P systems
are monolingual and based on traditional joint-sequence based n-gram models
[1,2]. As an alternative, we present a single end-to-end trained neural G2P
model that shares the same encoder and decoder across multiple languages. This
allows the model to utilize a combination of universal symbol inventories of
Latin-like alphabets and cross-linguistically shared feature representations.
Such a model is especially useful in scenarios involving low resource languages and
code switching/foreign words, where the pronunciations in one language need to
be adapted to other locales or accents. We further experiment with a word-level
language distribution vector as an additional training target in order to
improve system performance by helping the model decouple pronunciations across
a variety of languages in the parameter space. We show a 7.2% average
improvement in phoneme error rate on low-resource languages and no degradation
on high-resource ones compared to monolingual baselines.
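The shared-inventory setup described in the abstract can be sketched in a few lines: pooled lexicons over one grapheme/phoneme symbol inventory, plus a word-level language distribution vector as an auxiliary target. The data layout and helper below are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch (not the authors' code) of pooling monolingual lexicons into
# one multilingual G2P training set with a per-word language distribution
# vector as an extra target. All names and data here are illustrative.

def build_training_set(lexicons):
    """lexicons: {lang: [(grapheme_str, phoneme_str), ...]}"""
    langs = sorted(lexicons)
    # Which languages contain each spelling (drives the distribution target).
    membership = {}
    for lang, entries in lexicons.items():
        for graphemes, _ in entries:
            membership.setdefault(graphemes, set()).add(lang)
    examples = []
    for lang, entries in lexicons.items():
        for graphemes, phonemes in entries:
            n = len(membership[graphemes])
            dist = [1.0 / n if l in membership[graphemes] else 0.0
                    for l in langs]
            examples.append({
                "graphemes": graphemes.split(),  # shared Latin-like alphabet
                "phonemes": phonemes.split(),    # shared phoneme inventory
                "lang_dist": dist,               # auxiliary training target
            })
    return langs, examples

lexicons = {
    "en": [("e c h o", "E k oU")],
    "de": [("e c h o", "E C o:")],  # same spelling, different pronunciation
}
langs, examples = build_training_set(lexicons)
# "e c h o" occurs in both languages, so its distribution target is uniform
# over {de, en}; a language-unique spelling would get a one-hot vector.
```

A spelling shared across locales thus carries a soft language label, which is what lets the auxiliary target help the model decouple pronunciations per language in parameter space.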
Related papers
- A two-stage transliteration approach to improve performance of a multilingual ASR [1.9511556030544333]
This paper presents an approach to build a language-agnostic end-to-end model trained on a grapheme set.
We performed experiments with an end-to-end multilingual speech recognition system for two Indic languages.
arXiv Detail & Related papers (2024-10-09T05:30:33Z)
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is the better-performing and more viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- Multilingual context-based pronunciation learning for Text-to-Speech [13.941800219395757]
Phonetic information and linguistic knowledge are essential components of a Text-to-Speech (TTS) front-end.
We showcase a multilingual unified front-end system that addresses any pronunciation related task, typically handled by separate modules.
We find that the multilingual model is competitive across languages and tasks, although some trade-offs exist compared to equivalent monolingual solutions.
arXiv Detail & Related papers (2023-07-31T14:29:06Z)
- Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings [12.669655363646257]
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation.
We propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings.
arXiv Detail & Related papers (2023-07-31T13:25:38Z)
- Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition [62.997667081978825]
Code-Switching (CS) refers to the phenomenon of alternately using words and phrases from different languages.
We propose a simple yet effective data augmentation in which audio and corresponding labels from different source languages are concatenated.
We show that this augmentation can even improve the model's performance on inter-sentential language switches not seen during training by 5.03% WER.
arXiv Detail & Related papers (2022-10-17T12:15:57Z)
- Efficient Weight factorization for Multilingual Speech Recognition [67.00151881207792]
End-to-end multilingual speech recognition involves training a single model on a composite speech corpus spanning many languages.
Because each language in the training data has different characteristics, the shared network may struggle to optimize for all of them simultaneously.
We propose a novel multilingual architecture that targets the core operation in neural networks: linear transformation functions.
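One common way to realize such a factorization of linear transformations (an assumed formulation for illustration; the paper's exact parameterization may differ) is to modulate a single shared weight matrix with cheap per-language rank-1 factors, so almost all parameters stay shared:

```python
# Sketch of language-adapted linear layers via rank-1 factorization.
# W_lang = W_shared * (s_l r_l^T) + (u_l v_l^T); this specific form is an
# illustrative assumption, not necessarily the paper's exact method.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_langs = 4, 3, 2

W_shared = rng.standard_normal((d_out, d_in))   # shared across all languages
# Per-language rank-1 factors: multiplicative scale and additive offset.
s = rng.standard_normal((n_langs, d_out, 1))
r = rng.standard_normal((n_langs, 1, d_in))
u = rng.standard_normal((n_langs, d_out, 1))
v = rng.standard_normal((n_langs, 1, d_in))

def language_linear(x, lang):
    """Apply the language-adapted linear transform to input vector x."""
    W_lang = W_shared * (s[lang] @ r[lang]) + (u[lang] @ v[lang])
    return W_lang @ x

x = rng.standard_normal(d_in)
y0 = language_linear(x, 0)  # same shared core, language-specific behavior
y1 = language_linear(x, 1)
```

Each extra language then costs only O(d_in + d_out) parameters per layer instead of a full O(d_in * d_out) weight matrix.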
arXiv Detail & Related papers (2021-05-07T00:12:02Z)
- Acoustics Based Intent Recognition Using Discovered Phonetic Units for Low Resource Languages [51.0542215642794]
We propose a novel acoustics based intent recognition system that uses discovered phonetic units for intent classification.
We present results for two language families - Indic languages and Romance languages - for two different intent recognition tasks.
arXiv Detail & Related papers (2020-11-07T00:35:31Z)
- Improved acoustic word embeddings for zero-resource languages using multilingual transfer [37.78342106714364]
We train a single supervised embedding model on labelled data from multiple well-resourced languages and apply it to unseen zero-resource languages.
We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs.
All of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision.
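The Siamese objective mentioned above can be illustrated with a toy margin-based contrastive loss on a pair of word embeddings (the loss form, margin, and vectors here are illustrative, not the paper's exact training setup):

```python
# Toy contrastive loss for a Siamese word-discrimination setup: pull
# embeddings of the same word together, push different words past a margin.
import numpy as np

def contrastive_loss(emb_a, emb_b, same_word, margin=0.5):
    """Cosine-distance contrastive loss for one pair of word embeddings."""
    cos = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    dist = 1.0 - cos
    if same_word:
        return dist                     # pull same-word pairs together
    return max(0.0, margin - dist)      # push different words past the margin

a = np.array([1.0, 0.0])
b = np.array([1.0, 0.0])
c = np.array([0.0, 1.0])
# Identical embeddings of the same word incur no loss; well-separated
# embeddings of different words (distance beyond the margin) also incur none.
```

Trained this way on pooled multilingual data, the embedding function needs no labels in the zero-resource target language at application time.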
arXiv Detail & Related papers (2020-06-02T12:28:34Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z) - Universal Phone Recognition with a Multilingual Allophone System [135.2254086165086]
We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
arXiv Detail & Related papers (2020-02-26T21:28:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.