Transformers for Low-Resource Languages: Is Féidir Linn!
- URL: http://arxiv.org/abs/2403.01985v1
- Date: Mon, 4 Mar 2024 12:29:59 GMT
- Title: Transformers for Low-Resource Languages: Is Féidir Linn!
- Authors: Séamus Lankford, Haithem Afli and Andy Way
- Abstract summary: In general, neural translation models often underperform on language pairs with insufficient training data.
We demonstrate that choosing appropriate parameters leads to considerable performance improvements.
A Transformer optimized model demonstrated a BLEU score improvement of 7.8 points when compared with a baseline RNN model.
- Score: 2.648836772989769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Transformer model is the state-of-the-art in Machine Translation.
However, in general, neural translation models often underperform on language
pairs with insufficient training data. As a consequence, relatively few
experiments have been carried out using this architecture on low-resource
language pairs. In this study, hyperparameter optimization of Transformer
models in translating the low-resource English-Irish language pair is
evaluated. We demonstrate that choosing appropriate parameters leads to
considerable performance improvements. Most importantly, the correct choice of
subword model is shown to be the biggest driver of translation performance.
SentencePiece models using both unigram and BPE approaches were appraised.
Variations on model architectures included modifying the number of layers,
testing various regularisation techniques and evaluating the optimal number of
heads for attention. A generic 55k DGT corpus and an in-domain 88k public admin
corpus were used for evaluation. A Transformer optimized model demonstrated a
BLEU score improvement of 7.8 points when compared with a baseline RNN model.
Improvements were observed across a range of metrics, including TER, indicating
a substantially reduced post-editing effort for Transformer optimized models
with 16k BPE subword models. Benchmarked against Google Translate, our
translation engines demonstrated significant improvements. The question of
whether or not Transformers can be used effectively in a low-resource setting
of English-Irish translation has been addressed. Is féidir linn - yes we can.
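The abstract identifies the choice of subword model as the biggest driver of translation quality, with a 16k BPE SentencePiece model performing best. The sketch below shows, under assumed corpus and file names, how the two SentencePiece variants compared in the paper (BPE and unigram) could be trained; it is illustrative only and is not the authors' preprocessing script.

```python
# Minimal sketch (not the authors' script): training the two SentencePiece
# subword models compared in the paper, both with the 16k vocabulary size
# highlighted in the abstract. Corpus and file names are hypothetical.
import sentencepiece as spm

for model_type in ("bpe", "unigram"):
    spm.SentencePieceTrainer.train(
        input="en_ga_train.txt",            # hypothetical English-Irish training text
        model_prefix=f"enga_{model_type}_16k",
        vocab_size=16000,                   # the 16k setting reported as best with BPE
        model_type=model_type,
        character_coverage=1.0,             # full coverage for Latin-script English/Irish
    )

# Segmenting a sentence with the trained BPE model
sp = spm.SentencePieceProcessor(model_file="enga_bpe_16k.model")
print(sp.encode("Is féidir linn", out_type=str))
```

In practice the trained .model files would then be used to segment the training, validation and test splits before they are passed to the NMT toolkit.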
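The reported gains are measured with BLEU and TER. As a minimal, hedged illustration of how such corpus-level scores can be computed (this is not the authors' evaluation pipeline), sacrebleu's metric classes could be used; the Irish hypothesis and reference strings below are placeholders.

```python
# Minimal sketch (not the authors' evaluation pipeline): computing corpus-level
# BLEU and TER with sacrebleu. Hypothesis and reference strings are placeholders.
from sacrebleu.metrics import BLEU, TER

hypotheses = ["tá an tuarascáil ar fáil anois"]        # system outputs, one per segment
references = [["tá an tuarascáil nua ar fáil anois"]]  # one reference stream

bleu = BLEU().corpus_score(hypotheses, references)
ter = TER().corpus_score(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}, TER = {ter.score:.1f}")  # higher BLEU / lower TER is better
```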
Related papers
- Low-resource neural machine translation with morphological modeling [3.3721926640077804]
Morphological modeling in neural machine translation (NMT) is a promising approach to achieving open-vocabulary machine translation.
We propose a framework-solution for modeling complex morphology in low-resource settings.
We evaluate our proposed solution on Kinyarwanda - English translation using public-domain parallel text.
arXiv Detail & Related papers (2024-04-03T01:31:41Z) - Human Evaluation of English--Irish Transformer-Based NMT [2.648836772989769]
The best-performing Transformer system significantly reduces both accuracy and fluency errors when compared with an RNN-based model.
When benchmarked against Google Translate, our translation engines demonstrated significant improvements.
arXiv Detail & Related papers (2024-03-04T11:45:46Z) - Deep Learning Transformer Architecture for Named Entity Recognition on
Low Resourced Languages: State of the art results [0.0]
This paper reports on the evaluation of Deep Learning (DL) transformer architecture models for Named-Entity Recognition (NER) on ten low-resourced South African (SA) languages.
The findings show that transformer models significantly improve performance when applying discrete fine-tuning parameters per language.
Further research could evaluate the more recent transformer architecture models on other Natural Language Processing tasks and applications.
arXiv Detail & Related papers (2021-11-01T11:02:01Z) - Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT pushes the SOTA neural machine translation performance across 15 translation tasks on 8 language pairs significantly higher.
arXiv Detail & Related papers (2021-09-16T07:58:33Z) - Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z) - Optimizing Transformer for Low-Resource Neural Machine Translation [4.802292434636455]
Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation.
Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of Transformer under low-resource conditions is highly dependent on the hyperparameter settings (a minimal configuration sketch of such settings follows this list).
Using an optimized Transformer for low-resource conditions improves the translation quality up to 7.3 BLEU points compared to using the Transformer default settings.
arXiv Detail & Related papers (2020-11-04T13:12:29Z) - Pre-training Multilingual Neural Machine Translation by Leveraging
Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transferring to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z) - Recipes for Adapting Pre-trained Monolingual and Multilingual Models to
Machine Translation [50.0258495437314]
We investigate the benefits and drawbacks of freezing parameters, and adding new ones, when fine-tuning a pre-trained model on Machine Translation (MT).
For BART we get the best performance by freezing most of the model parameters, and adding extra positional embeddings.
For mBART we match or outperform the performance of naive fine-tuning for most language pairs with the encoder, and most of the decoder, frozen.
arXiv Detail & Related papers (2020-04-30T16:09:22Z) - Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space [109.79957125584252]
Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language.
In this paper, we propose the first large-scale language VAE model, Optimus.
arXiv Detail & Related papers (2020-04-05T06:20:18Z)
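Both the abstract above and the "Optimizing Transformer for Low-Resource Neural Machine Translation" entry stress that Transformer effectiveness in low-resource settings hinges on settings such as encoder/decoder depth, the number of attention heads, and dropout regularisation. The sketch below illustrates that kind of sweep with PyTorch's generic nn.Transformer; the d_model, feed-forward size, and grid values are assumptions, not the reported optima, and this is not the authors' training framework.

```python
# Minimal sketch (not the authors' code): varying the architecture knobs that the
# low-resource optimization studies sweep -- depth, attention heads, and dropout --
# using PyTorch's generic nn.Transformer module. All specific values are assumptions.
import torch
import torch.nn as nn

def build_transformer(num_layers: int, num_heads: int, dropout: float) -> nn.Transformer:
    """Return a Transformer whose depth, heads, and dropout are the swept settings."""
    return nn.Transformer(
        d_model=512,                 # assumed model dimension
        nhead=num_heads,
        num_encoder_layers=num_layers,
        num_decoder_layers=num_layers,
        dim_feedforward=2048,        # assumed feed-forward size
        dropout=dropout,
        batch_first=True,
    )

# Illustrative grid; the values are placeholders, not the reported optima.
for layers in (2, 6):
    for heads in (2, 8):
        for dropout in (0.1, 0.3):
            model = build_transformer(layers, heads, dropout)
            src = torch.rand(4, 10, 512)   # (batch, src_len, d_model)
            tgt = torch.rand(4, 9, 512)    # (batch, tgt_len, d_model)
            out = model(src, tgt)          # forward-pass sanity check
            print(layers, heads, dropout, out.shape)
```

Each configuration would then be trained and scored on the low-resource pair; the sketch only shows where depth, heads and dropout enter the model definition.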