One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme
Conversion With a Transformer Ensemble
- URL: http://arxiv.org/abs/2006.13343v1
- Date: Tue, 23 Jun 2020 21:28:28 GMT
- Title: One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme
Conversion With a Transformer Ensemble
- Authors: Kaili Vesik (1), Muhammad Abdul-Mageed (1), Miikka Silfverberg (1)
((1) The University of British Columbia)
- Abstract summary: We describe a simple approach of exploiting model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages.
Our best models achieve 14.99 word error rate (WER) and 3.30 phoneme error rate (PER), a sizeable improvement over the shared task's competitive baselines.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of grapheme-to-phoneme (G2P) conversion is important for both speech
recognition and synthesis. Similar to other speech and language processing
tasks, in a scenario where only small-sized training data are available,
learning G2P models is challenging. We describe a simple approach of exploiting
model ensembles, based on multilingual Transformers and self-training, to
develop a highly effective G2P solution for 15 languages. Our models are
developed as part of our participation in the SIGMORPHON 2020 Shared Task 1,
focused on G2P. Our best models achieve 14.99 word error rate (WER) and 3.30
phoneme error rate (PER), a sizeable improvement over the shared task's
competitive baselines.
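For readers unfamiliar with the two reported metrics: WER is the percentage of words whose predicted pronunciation differs from the reference in any way, while PER is the phoneme-level edit distance normalized by the number of reference phonemes. The snippet below is a minimal illustrative sketch of that evaluation (not code from the paper), assuming gold and predicted pronunciations are given as whitespace-separated phoneme strings:

```python
# Illustrative sketch of WER/PER computation for G2P output.
# Assumes each pronunciation is a whitespace-separated phoneme string.

def edit_distance(a, b):
    """Levenshtein distance between two sequences of phonemes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def wer(gold, pred):
    """Percentage of words whose predicted pronunciation is not an exact match."""
    wrong = sum(g != p for g, p in zip(gold, pred))
    return 100.0 * wrong / len(gold)

def per(gold, pred):
    """Phoneme-level edit distance normalized by the total gold phoneme count."""
    dist = sum(edit_distance(g.split(), p.split()) for g, p in zip(gold, pred))
    total = sum(len(g.split()) for g in gold)
    return 100.0 * dist / total

if __name__ == "__main__":
    gold = ["k æ t", "d ɔ g"]
    pred = ["k æ t", "d o g"]
    print(f"WER: {wer(gold, pred):.2f}")  # 50.00 -> one of two words is wrong
    print(f"PER: {per(gold, pred):.2f}")  # 16.67 -> 1 substitution / 6 gold phonemes
```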
Related papers
- MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning [65.60607895153692]
  MiniGPT-v2 is a model that can be treated as a unified interface for better handling various vision-language tasks.
  We propose using unique identifiers for different tasks when training the model.
  Our results show that MiniGPT-v2 achieves strong performance on many visual question-answering and visual grounding benchmarks.
  arXiv Detail & Related papers (2023-10-14T03:22:07Z)
- Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings [12.669655363646257]
  The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation.
  We propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings.
  arXiv Detail & Related papers (2023-07-31T13:25:38Z)
- BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer [77.28871523946418]
  BatGPT is a large-scale language model designed and trained jointly by Wuhan University and Shanghai Jiao Tong University.
  It is capable of generating highly natural and fluent text in response to various types of input, including text prompts, images, and audio.
  arXiv Detail & Related papers (2023-07-01T15:10:01Z)
- ByT5 model for massively multilingual grapheme-to-phoneme conversion [13.672109728462663]
  We tackle massively multilingual grapheme-to-phoneme conversion by implementing G2P models based on ByT5.
  We found that ByT5 operating on byte-level inputs significantly outperformed the token-based mT5 model on multilingual G2P.
  arXiv Detail & Related papers (2022-04-06T20:03:38Z)
- r-G2P: Evaluating and Enhancing Robustness of Grapheme to Phoneme Conversion by Controlled noise introducing and Contextual information incorporation [32.75866643254402]
  We show that neural G2P models are extremely sensitive to orthographical variations in graphemes, such as spelling mistakes.
  We propose three controlled noise-introducing methods to synthesize noisy training data.
  We incorporate contextual information with the baseline and propose a robust training strategy to stabilize the training process.
  arXiv Detail & Related papers (2022-02-21T13:29:30Z)
- Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models [35.60380484684335]
  This paper proposes a pre-trained grapheme model called grapheme BERT (GBERT).
  GBERT is built by self-supervised training on a large, language-specific word list with only grapheme information.
  Two approaches are developed to incorporate GBERT into the state-of-the-art Transformer-based G2P model.
  arXiv Detail & Related papers (2022-01-26T02:49:56Z)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
  We propose and develop a family of language models named GLaM (Generalist Language Model).
  GLaM uses a sparsely activated mixture-of-experts architecture to scale model capacity while incurring substantially less training cost than dense variants.
  It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
  arXiv Detail & Related papers (2021-12-13T18:58:19Z)
- Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects [1.3786433185027864]
  Grapheme-to-Phoneme (G2P) models convert words to their phonetic pronunciations.
  Dictionary-based methods usually require significant manual effort to build and have limited adaptivity to unseen words.
  We propose a novel use of a transformer-based attention model that can adapt to unseen dialects of English while using a small dictionary.
  arXiv Detail & Related papers (2021-04-08T21:36:21Z)
- Language Models as Few-Shot Learner for Task-Oriented Dialogue Systems [74.8759568242933]
  Task-oriented dialogue systems use four connected modules: Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy (DP), and Natural Language Generation (NLG).
  A research challenge is to learn each module with the least amount of samples, given the high cost of data collection.
  We evaluate the priming few-shot ability of language models on the NLU, DP, and NLG tasks.
  arXiv Detail & Related papers (2020-08-14T08:23:21Z)
- Language Models are Few-Shot Learners [61.36677350504291]
  We show that scaling up language models greatly improves task-agnostic, few-shot performance.
  We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
  GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
  arXiv Detail & Related papers (2020-05-28T17:29:03Z)
- Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space [109.79957125584252]
  The Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language.
  In this paper, we propose the first large-scale language VAE model, Optimus.
  arXiv Detail & Related papers (2020-04-05T06:20:18Z)