Universal Phone Recognition with a Multilingual Allophone System
- URL: http://arxiv.org/abs/2002.11800v1
- Date: Wed, 26 Feb 2020 21:28:57 GMT
- Title: Universal Phone Recognition with a Multilingual Allophone System
- Authors: Xinjian Li, Siddharth Dalmia, Juncheng Li, Matthew Lee, Patrick
Littell, Jiali Yao, Antonios Anastasopoulos, David R. Mortensen, Graham
Neubig, Alan W Black, Florian Metze
- Abstract summary: We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
- Score: 135.2254086165086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual models can improve language processing, particularly for low
resource situations, by sharing parameters across languages. Multilingual
acoustic models, however, generally ignore the difference between phonemes
(sounds that can support lexical contrasts in a particular language) and their
corresponding phones (the sounds that are actually spoken, which are language
independent). This can lead to performance degradation when combining a variety
of training languages, as identically annotated phonemes can actually
correspond to several different underlying phonetic realizations. In this work,
we propose a joint model of both language-independent phone and
language-dependent phoneme distributions. In multilingual ASR experiments over
11 languages, we find that this model improves testing performance by 2%
phoneme error rate absolute in low-resource conditions. Additionally, because
we are explicitly modeling language-independent phones, we can build a
(nearly-)universal phone recognizer that, when combined with PHOIBLE, a large,
manually curated database of phone inventories, can be customized into 2,000
language-dependent recognizers. Experiments on two low-resource indigenous
languages, Inuktitut and Tusom, show that our recognizer achieves phone
accuracy improvements of more than 17%, moving a step closer to speech
recognition for all languages in the world.
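To make the proposed joint modeling concrete: a shared acoustic model scores a universal set of phones, and a per-language allophone mapping pools those phone scores into phoneme scores for training and evaluation in that language. The sketch below is a minimal PyTorch illustration of such a mapping layer, not the authors' exact implementation; the max-pooling choice, tensor shapes, and function names are assumptions made for the example.

```python
import torch

def allophone_layer(phone_logits, phone_to_phoneme):
    """Pool language-independent phone logits into language-dependent phoneme logits.

    phone_logits     -- (batch, time, n_phones) scores over a universal phone set
    phone_to_phoneme -- (n_phonemes, n_phones) 0/1 matrix; row p marks the phones
                        that are allophones of phoneme p in a given language
    returns             (batch, time, n_phonemes); each phoneme takes the best
                        score among its allophones
    """
    mask = phone_to_phoneme.bool()                        # (n_phonemes, n_phones)
    expanded = phone_logits.unsqueeze(2).expand(-1, -1, mask.size(0), -1)
    masked = expanded.masked_fill(~mask, float("-inf"))   # drop non-allophones
    return masked.max(dim=-1).values

# Toy example: 3 universal phones; a hypothetical language in which phones 0 and 1
# are allophones of phoneme 0, and phone 2 realizes phoneme 1.
phone_logits = torch.randn(1, 5, 3)                       # (batch=1, time=5, n_phones=3)
mapping = torch.tensor([[1, 1, 0],
                        [0, 0, 1]])
phoneme_logits = allophone_layer(phone_logits, mapping)   # shape (1, 5, 2)
```

Because the pooling is differentiable, a phoneme-level loss (e.g. CTC) in any training language back-propagates into the shared phone layer, while the phone scores themselves remain available at test time and can be restricted to a particular language's inventory (see the sketch after the related-papers list).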
Related papers
- Allophant: Cross-lingual Phoneme Recognition with Articulatory
Attributes [0.0]
Allophant is a multilingual phoneme recognizer.
It requires only a phoneme inventory for cross-lingual transfer to a target language.
Allophoible is an extension of the PHOIBLE database.
arXiv Detail & Related papers (2023-06-07T10:11:09Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech
Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Differentiable Allophone Graphs for Language-Universal Speech
Recognition [77.2981317283029]
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
arXiv Detail & Related papers (2021-07-24T15:09:32Z)
- Phoneme Recognition through Fine Tuning of Phonetic Representations: a
Case Study on Luhya Language Varieties [77.2347265289855]
We focus on phoneme recognition using Allosaurus, a method for multilingual recognition based on phonetic annotation.
To evaluate in a challenging real-world scenario, we curate phone recognition datasets for Bukusu and Saamia, two varieties of the Luhya language cluster of western Kenya and eastern Uganda.
We find that fine-tuning of Allosaurus, even with just 100 utterances, leads to significant improvements in phone error rates.
arXiv Detail & Related papers (2021-04-04T15:07:55Z)
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer
Across Languages [72.9927937955371]
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
arXiv Detail & Related papers (2020-05-16T22:28:09Z)
- Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
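As a companion to the sketch above: the abstract's claim that the universal recognizer can be customized into roughly 2,000 language-dependent recognizers via PHOIBLE can be read as masking the universal phone scores with a per-language phone inventory before decoding. The snippet below illustrates that restriction on a toy scale; the phone symbols, inventory, and greedy decode are invented for the example and stand in for a real PHOIBLE lookup and a real decoder.

```python
import torch

def restrict_to_inventory(phone_log_probs, universal_phones, inventory):
    """Mask out phones that are not in a target language's inventory.

    phone_log_probs  -- (time, n_phones) log-probabilities from a universal recognizer
    universal_phones -- list of IPA symbols, one per column of phone_log_probs
    inventory        -- set of IPA symbols allowed in the target language
                        (e.g. taken from its PHOIBLE inventory)
    """
    keep = torch.tensor([p in inventory for p in universal_phones])
    return phone_log_probs.masked_fill(~keep, float("-inf"))

# Invented universal phone set and a tiny toy "inventory" without voiced stops.
universal_phones = ["a", "i", "u", "p", "b", "t", "d", "k", "g"]
inventory = {"a", "i", "u", "p", "t", "k"}
log_probs = torch.log_softmax(torch.randn(4, len(universal_phones)), dim=-1)
restricted = restrict_to_inventory(log_probs, universal_phones, inventory)
best_path = restricted.argmax(dim=-1)   # greedy decode over the allowed phones only
```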