Allophant: Cross-lingual Phoneme Recognition with Articulatory
Attributes
- URL: http://arxiv.org/abs/2306.04306v2
- Date: Wed, 16 Aug 2023 17:44:59 GMT
- Title: Allophant: Cross-lingual Phoneme Recognition with Articulatory
Attributes
- Authors: Kevin Glocker (1), Aaricia Herygers (1), Munir Georges (1 and 2) ((1)
AImotion Bavaria, Technische Hochschule Ingolstadt, (2) Intel Labs Germany)
- Abstract summary: Allophant is a multilingual phoneme recognizer.
It requires only a phoneme inventory for cross-lingual transfer to a target language.
Allophoible is an extension of the PHOIBLE database.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes Allophant, a multilingual phoneme recognizer. It requires
only a phoneme inventory for cross-lingual transfer to a target language,
allowing for low-resource recognition. The model combines a compositional phone
embedding approach with individually supervised phonetic attribute classifiers in a
multi-task architecture. We also introduce Allophoible, an extension of the PHOIBLE
database. When combined with a distance-based mapping approach for
grapheme-to-phoneme outputs, it allows us to train on PHOIBLE inventories directly.
By training and evaluating on 34 languages, we found that adding multi-task learning
improves the model's ability to generalize to unseen phonemes and phoneme
inventories. On supervised languages, we achieve phoneme error rate (PER)
improvements of 11 percentage points (pp.) compared to a baseline without
multi-task learning. Evaluating zero-shot transfer on 84 languages yielded a PER
decrease of 2.63 pp. over the baseline.
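The multi-task setup described in the abstract can be pictured as a shared acoustic encoder feeding one classifier per articulatory attribute, with per-phone scores composed from those attribute predictions. The PyTorch sketch below only illustrates that idea; the attribute names, layer sizes, and the additive composition rule are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of a multi-task head with individually supervised attribute
# classifiers and compositionally derived phone scores. Attribute names and
# the sum-of-logits composition are illustrative assumptions.
import torch
import torch.nn as nn
from typing import Dict


class AttributeMultiTaskHead(nn.Module):
    def __init__(self, hidden_dim: int, attribute_sizes: Dict[str, int]):
        super().__init__()
        # One independently supervised classifier per phonetic attribute
        # (e.g. voicing, place, manner); the attribute set is hypothetical.
        self.attribute_heads = nn.ModuleDict({
            name: nn.Linear(hidden_dim, n_values)
            for name, n_values in attribute_sizes.items()
        })

    def forward(self, encoder_out: torch.Tensor) -> Dict[str, torch.Tensor]:
        # encoder_out: (batch, time, hidden_dim) from a shared acoustic encoder
        return {name: head(encoder_out) for name, head in self.attribute_heads.items()}


def compose_phone_logits(attr_logits: Dict[str, torch.Tensor],
                         phone_features: Dict[str, torch.Tensor]) -> torch.Tensor:
    """Compose per-phone scores from attribute logits.

    phone_features[name] is a (num_phones,) LongTensor holding each phone's
    value index for that attribute, taken from a feature table such as
    Allophoible. Summing the selected attribute logits is one simple,
    assumed composition rule.
    """
    phone_logits = torch.zeros(1)
    for name, logits in attr_logits.items():                # (batch, time, n_values)
        # Select each phone's attribute value, yielding (batch, time, num_phones).
        phone_logits = phone_logits + logits[..., phone_features[name]]
    return phone_logits
```

Because every phone is described by its attribute values, the composed scores also cover phones that never occurred in training, which is what allows the model to be applied to unseen phonemes and phoneme inventories.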
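The distance-based mapping of grapheme-to-phoneme outputs onto a PHOIBLE inventory can likewise be sketched as a nearest-neighbour search over articulatory feature vectors. The feature representation and the L1 distance below are assumptions made for illustration; the paper's exact metric may differ.

```python
# Hedged sketch: map each phoneme produced by a G2P front end onto the closest
# phoneme in a target-language inventory, measured over articulatory feature
# vectors. Feature encoding and L1 distance are illustrative assumptions.
from typing import Dict, List, Sequence


def map_to_inventory(g2p_phonemes: Sequence[str],
                     inventory: Sequence[str],
                     features: Dict[str, Sequence[int]]) -> List[str]:
    """Replace each G2P phoneme with its nearest phoneme from the inventory."""
    mapped = []
    for phoneme in g2p_phonemes:
        if phoneme in inventory:
            mapped.append(phoneme)  # exact match: keep as-is
            continue
        source = features[phoneme]
        # Nearest neighbour in the target inventory by L1 feature distance.
        nearest = min(
            inventory,
            key=lambda candidate: sum(
                abs(a - b) for a, b in zip(source, features[candidate])
            ),
        )
        mapped.append(nearest)
    return mapped
```

Expressing G2P transcripts entirely in the target language's inventory this way is what makes it possible to train on PHOIBLE inventories directly, as the abstract describes.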
Related papers
- Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding (arXiv, 2022-06-27)
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech problem under the few-shot setting.
We propose a framework that consists of a phoneme-based TTS model and a codebook module to project phonemes from different languages into a learned latent space.
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition (arXiv, 2022-01-26)
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
- Differentiable Allophone Graphs for Language-Universal Speech Recognition (arXiv, 2021-07-24)
Building language-universal speech recognition systems entails producing phonological units of spoken sound that can be shared across languages.
We present a general framework to derive phone-level supervision from only phonemic transcriptions and phone-to-phoneme mappings.
We build a universal phone-based speech recognition model with interpretable probabilistic phone-to-phoneme mappings for each language.
- That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages (arXiv, 2020-05-16)
We use the resources existing in other languages to train a multilingual automatic speech recognition model.
We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting.
Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages.
- Universal Phone Recognition with a Multilingual Allophone System (arXiv, 2020-02-26)
We propose a joint model of language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute.
Our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.
- Towards Zero-shot Learning for Automatic Phonemic Transcription (arXiv, 2020-02-26)
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.