ByT5 model for massively multilingual grapheme-to-phoneme conversion
- URL: http://arxiv.org/abs/2204.03067v1
- Date: Wed, 6 Apr 2022 20:03:38 GMT
- Title: ByT5 model for massively multilingual grapheme-to-phoneme conversion
- Authors: Jian Zhu, Cong Zhang, David Jurgens
- Abstract summary: We tackle massively multilingual grapheme-to-phoneme conversion by implementing G2P models based on ByT5.
We found that ByT5 operating on byte-level inputs significantly outperformed the token-based mT5 model in terms of multilingual G2P.
- Score: 13.672109728462663
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we tackle massively multilingual grapheme-to-phoneme (G2P) conversion by implementing G2P models based on ByT5. We have curated a G2P
dataset from various sources that covers around 100 languages and trained
large-scale multilingual G2P models based on ByT5. We found that ByT5 operating
on byte-level inputs significantly outperformed the token-based mT5 model in
terms of multilingual G2P. Pairwise comparison with monolingual models in these
languages suggests that multilingual ByT5 models generally lower the phone
error rate by jointly learning from a variety of languages. The pretrained
model can further benefit low-resource G2P, either through zero-shot prediction on unseen languages or by providing pretrained weights for finetuning, which helps the model converge to a lower phone error rate than training from randomly initialized weights.
To facilitate future research on multilingual G2P, we make available our code
and pretrained multilingual G2P models at:
https://github.com/lingjzhu/CharsiuG2P.
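For readers who want to try the released checkpoints, here is a minimal inference sketch using Hugging Face transformers. The checkpoint name (charsiu/g2p_multilingual_byT5_small_100) and the "<lang-code>: word" prompt format are assumptions based on the CharsiuG2P repository's conventions and should be confirmed against its README.

```python
# Minimal sketch: multilingual G2P inference with a ByT5-based checkpoint.
# The checkpoint id and the "<lang>: word" prompt format below are assumed;
# confirm both against the CharsiuG2P README before relying on them.
from transformers import AutoTokenizer, T5ForConditionalGeneration

# ByT5 operates directly on UTF-8 bytes, so the stock byt5 tokenizer suffices.
tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
model = T5ForConditionalGeneration.from_pretrained(
    "charsiu/g2p_multilingual_byT5_small_100"  # assumed checkpoint name
)

# Each input word is prefixed with a language tag (assumed format).
words = ["<eng-us>: hello", "<fra>: bonjour"]
inputs = tokenizer(words, padding=True, return_tensors="pt")

# Generate phone sequences and decode the byte-level outputs back to strings.
outputs = model.generate(**inputs, num_beams=1, max_length=50)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

The byte-level tokenizer is what distinguishes this setup from mT5: each character is split into its UTF-8 bytes rather than SentencePiece subwords, which is the property the abstract credits for the stronger multilingual G2P performance.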
Related papers
- A Text-to-Text Model for Multilingual Offensive Language Identification [19.23565690468299]
This study presents the first pre-trained encoder-decoder model for offensive language identification, built on text-to-text transformers (T5).
Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, on multiple English benchmarks.
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5.
arXiv Detail & Related papers (2023-12-06T09:37:27Z) - idT5: Indonesian Version of Multilingual T5 Transformer [0.0]
Indonesian is spoken by almost 200 million people and is the 10th most spoken language in the world.
In this study, the mT5 model was adapted to a single language, Indonesian, resulting in a smaller pre-trained T5 model specific to Indonesian.
A model fine-tuned from our idT5 achieved 77.18% accuracy on sentiment analysis (SA), 8% higher than the mT5-based model, and obtained nearly the same scores as the mT5-based model on question generation (QG) and question answering (QA).
arXiv Detail & Related papers (2023-02-02T03:56:16Z) - mGPT: Few-Shot Learners Go Multilingual [1.4354798873010843]
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages.
We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism.
The resulting models show performance on par with the recently released XGLM models by Facebook.
arXiv Detail & Related papers (2022-04-15T13:02:33Z) - PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion-parameter, densely activated Transformer language model, which we call the Pathways Language Model (PaLM).
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z) - Few-shot Learning with Multilingual Language Models [66.49496434282564]
We train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages.
Our largest model sets new state of the art in few-shot learning in more than 20 representative languages.
We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning.
arXiv Detail & Related papers (2021-12-20T16:52:35Z) - Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z) - Multilingual Translation via Grafting Pre-trained Language Models [12.787188625198459]
We propose Graformer to graft separately pre-trained (masked) language models for machine translation.
With monolingual data for pre-training and parallel data for grafting training, we make maximal use of both types of data.
arXiv Detail & Related papers (2021-09-11T10:57:45Z) - mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z) - One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme
Conversion With a Transformer Ensemble [0.0]
We describe a simple approach of exploiting model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages.
Our best models achieve a 14.99 word error rate (WER) and a 3.30 phoneme error rate (PER), a sizeable improvement over the competitive shared-task baselines (a minimal PER sketch follows below).
arXiv Detail & Related papers (2020-06-23T21:28:28Z)
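For reference on the WER and PER numbers quoted in the G2P abstracts above, here is a minimal sketch of the standard phone error rate computation: the Levenshtein edit distance between predicted and reference phone sequences, normalized by the number of reference phones. The exact normalization and averaging used in the individual papers may differ.

```python
# Minimal sketch of phone error rate (PER): edit distance over phone sequences
# divided by the total number of reference phones. Averaging details are assumed.
def edit_distance(ref, hyp):
    """Levenshtein distance with unit costs for substitution/insertion/deletion."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution (free if equal)
    return dp[-1]

def phone_error_rate(references, hypotheses):
    """Corpus-level PER: total edit distance / total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total = sum(len(r) for r in references)
    return errors / total

# Toy example: one substituted phone out of four reference phones -> PER = 0.25.
refs = [["h", "ə", "l", "oʊ"]]
hyps = [["h", "ɛ", "l", "oʊ"]]
print(phone_error_rate(refs, hyps))
```

Word error rate (WER) is the analogous count at the word level: a prediction counts as an error unless its entire phone sequence matches the reference.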