Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5
for Machine Translation
- URL: http://arxiv.org/abs/2302.14220v3
- Date: Fri, 26 Jan 2024 16:02:53 GMT
- Title: Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5
for Machine Translation
- Authors: Lukas Edman, Gabriele Sarti, Antonio Toral, Gertjan van Noord, Arianna
Bisazza
- Abstract summary: We show the effectiveness of character-level modeling in translation, particularly in cases where fine-tuning data is limited.
While evaluating the importance of source texts in driving model predictions, we highlight word-level patterns within ByT5.
We conclude by assessing the efficiency tradeoff of byte models, suggesting their usage in non-time-critical scenarios to boost translation quality.
- Score: 9.736284584478032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained character-level and byte-level language models have been shown to
be competitive with popular subword models across a range of Natural Language
Processing (NLP) tasks. However, there has been little research on their
effectiveness for neural machine translation (NMT), particularly within the
popular pretrain-then-finetune paradigm. This work performs an extensive
comparison across multiple languages and experimental conditions of character-
and subword-level pretrained models (ByT5 and mT5, respectively) on NMT. We
show the effectiveness of character-level modeling in translation, particularly
in cases where fine-tuning data is limited. In our analysis, we show how
character models' gains in translation quality are reflected in better
translations of orthographically similar words and rare words. While evaluating
the importance of source texts in driving model predictions, we highlight
word-level patterns within ByT5, suggesting an ability to modulate word-level
and character-level information during generation. We conclude by assessing the
efficiency tradeoff of byte models, suggesting their usage in non-time-critical
scenarios to boost translation quality.
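As a rough illustration of the character- vs. subword-level distinction and the efficiency tradeoff discussed in the abstract, the sketch below compares the input lengths that the ByT5 and mT5 tokenizers produce for the same sentence and runs one fine-tuning-style forward pass. This is not the authors' code: it only assumes the public Hugging Face checkpoints google/byt5-small and google/mt5-small, and the example sentence pair is invented for illustration.

```python
# Minimal sketch (not the paper's code): contrast byte-level and subword-level
# tokenization, then compute a seq2seq training loss for one example pair.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

sentence = "Character-level models read text one byte at a time."

byt5_tok = AutoTokenizer.from_pretrained("google/byt5-small")  # byte-level
mt5_tok = AutoTokenizer.from_pretrained("google/mt5-small")    # SentencePiece subwords

# ByT5 encodes roughly one token per UTF-8 byte, mT5 one per subword,
# so the byte model works with much longer sequences (the efficiency cost).
print("ByT5 tokens:", len(byt5_tok(sentence).input_ids))
print("mT5 tokens: ", len(mt5_tok(sentence).input_ids))

# Fine-tuning either model for NMT follows the standard seq2seq recipe;
# only the checkpoint (and hence the tokenizer granularity) changes.
# The target sentence here is an invented German gloss, purely illustrative.
model = AutoModelForSeq2SeqLM.from_pretrained("google/byt5-small")
inputs = byt5_tok(sentence, return_tensors="pt")
labels = byt5_tok("Zeichenmodelle lesen Text Byte für Byte.",
                  return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss  # training loss for this one pair
print("loss:", float(loss))
```

Swapping the checkpoint string for google/mt5-small (and tokenizing with mt5_tok) gives the subword-level counterpart, which is the kind of controlled comparison the paper performs at scale.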
Related papers
- Ensembling Finetuned Language Models for Text Classification [55.15643209328513]
Finetuning is a common practice across different communities to adapt pretrained models to particular tasks.
Ensembles of neural networks are typically used to boost performance and provide reliable uncertainty estimates.
We present a metadataset with predictions from five large finetuned models on six datasets and report results of different ensembling strategies.
arXiv Detail & Related papers (2024-10-25T09:15:54Z) - A Text-to-Text Model for Multilingual Offensive Language Identification [19.23565690468299]
This study presents the first pre-trained encoder-decoder model for offensive language identification based on text-to-text transformers (T5).
Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, in multiple English benchmarks.
Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5.
arXiv Detail & Related papers (2023-12-06T09:37:27Z) - Machine Translation for Ge'ez Language [0.0]
Machine translation for low-resource languages such as Ge'ez faces challenges such as out-of-vocabulary words, domain mismatches, and lack of labeled training data.
We develop a multilingual neural machine translation (MNMT) model based on language relatedness.
We also experiment with using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches.
arXiv Detail & Related papers (2023-11-24T14:55:23Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - Evaluating Byte and Wordpiece Level Models for Massively Multilingual
Semantic Parsing [3.431659287330068]
We compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset.
We are able to reduce the gap in exact match accuracy to only 5 points with respect to a model trained on gold data from all the languages.
arXiv Detail & Related papers (2022-12-14T13:48:32Z) - Evaluation of Transfer Learning for Polish with a Text-to-Text Model [54.81823151748415]
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: KLEJ benchmark adapted for text-to-text, en-pl translation, summarization, and question answering.
We present plT5 - a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective.
arXiv Detail & Related papers (2022-05-18T09:17:14Z) - Impact of Tokenization on Language Models: An Analysis for Turkish [2.4660652494309936]
We train tokenizers and pretrain medium-sized language models using the RoBERTa pretraining procedure on the Turkish split of the OSCAR corpus.
Our experiments, supported by statistical tests, reveal that the morphological-level tokenizer is competitive with the de facto tokenizers.
We find that increasing the vocabulary size improves the performance of morphological- and word-level tokenizers more than that of de facto tokenizers.
arXiv Detail & Related papers (2022-04-19T12:01:46Z) - Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained on an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z) - mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z) - Improving Text Generation with Student-Forcing Optimal Transport [122.11881937642401]
We propose using optimal transport (OT) to match the sequences generated in training and testing modes.
An extension is also proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
The effectiveness of the proposed method is validated on machine translation, text summarization, and text generation tasks.
arXiv Detail & Related papers (2020-10-12T19:42:25Z)