Evaluating Byte and Wordpiece Level Models for Massively Multilingual Semantic Parsing
- URL: http://arxiv.org/abs/2212.07223v1
- Date: Wed, 14 Dec 2022 13:48:32 GMT
- Authors: Massimo Nicosia and Francesco Piccinno
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Token-free approaches have been successfully applied to a range of word- and span-level tasks. In this work, we compare a byte-level (ByT5) and a wordpiece-based (mT5) sequence-to-sequence model on the 51 languages of the MASSIVE multilingual semantic parsing dataset. We examine multiple experimental settings: (i) zero-shot, (ii) full gold data, and (iii) zero-shot with synthetic data. By leveraging a state-of-the-art label projection method for machine-translated examples, we are able to reduce the gap in exact match accuracy to only 5 points with respect to a model trained on gold data from all the languages. We additionally provide insights into the cross-lingual transfer of ByT5 and show how the model compares with mT5 across all parameter sizes.
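The byte-level versus wordpiece contrast in the abstract shows up most directly in input length. The sketch below assumes the ByT5 convention of reserving ids 0, 1 and 2 for padding, end-of-sequence and unknown, so each UTF-8 byte b becomes id b + 3; `byte_ids` and `decode` are illustrative helpers, not the Hugging Face tokenizer API, mT5's wordpiece vocabulary is not modeled, and the utterances are made up.

```python
# Illustrative ByT5-style byte encoding (assumed convention: ids 0/1/2 are
# pad/eos/unk, so every UTF-8 byte b becomes id b + 3).
SPECIAL_OFFSET = 3  # number of reserved special ids (assumption, see lead-in)
EOS_ID = 1          # end-of-sequence id under the same assumption


def byte_ids(text: str, add_eos: bool = True) -> list[int]:
    """Map text to one id per UTF-8 byte, optionally appending EOS."""
    ids = [b + SPECIAL_OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode(ids: list[int]) -> str:
    """Invert byte_ids, skipping the reserved special ids."""
    raw = bytes(i - SPECIAL_OFFSET for i in ids if i >= SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Toy utterances: character count vs. id count (including EOS).
    for utt in ["wake me up at five am", "réveille-moi à cinq heures", "朝五時に起こして"]:
        print(f"{len(utt):3d} chars -> {len(byte_ids(utt)):3d} ids: {utt}")
```

Under this encoding each Japanese character costs three ids, so a short request in a non-Latin script can produce a longer input than a longer English one; this is the kind of length overhead that byte-level models accept in exchange for needing no learned vocabulary.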
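The headline result is stated as a gap in exact match (EM) accuracy points. As a minimal sketch, assuming each parse is serialized as a single string and compared after whitespace normalization (the official MASSIVE scorer may differ), EM and the gap between a gold-trained reference and a synthetic-data model could be computed as follows. The parse strings are invented for illustration.

```python
# Minimal EM sketch: an example counts only if the whole serialized parse matches.
def normalize(parse: str) -> str:
    """Collapse runs of whitespace (assumed normalization)."""
    return " ".join(parse.split())


def exact_match(preds: list[str], golds: list[str]) -> float:
    """Percentage of predictions identical to gold after normalization."""
    if len(preds) != len(golds):
        raise ValueError("predictions and gold parses differ in length")
    if not golds:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return 100.0 * hits / len(golds)


def em_gap(reference_em: float, system_em: float) -> float:
    """Gap in EM points, e.g. gold-trained model minus zero-shot/synthetic model."""
    return reference_em - system_em


if __name__ == "__main__":
    # Hypothetical "intent utterance-with-slots" strings, invented for illustration.
    golds = ["alarm_set wake me up at [time : five am]", "weather_query is it raining"]
    preds = ["alarm_set wake me up at [time : five  am]",
             "weather_query is it [weather_descriptor : raining]"]
    print(exact_match(preds, golds))  # 50.0: only the first matches after normalization
    print(em_gap(80.0, 75.0))         # 5.0
```

The 80.0 and 75.0 EM values are hypothetical; they only show how a 5-point gap like the one reported would be read. Because EM is all-or-nothing, one wrong slot label or boundary makes the whole example count as wrong.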
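The abstract attributes the zero-shot improvement to projecting slot labels onto machine-translated examples but does not describe the method here. For orientation only, the sketch below shows a much simpler and commonly used baseline that is NOT the paper's method: each source slot span is mapped through word alignment links to the smallest target span covering all aligned tokens. The `(source_index, target_index)` links are assumed to come from some external word aligner, and `project_span` and `project_slots` are hypothetical helpers.

```python
# Baseline alignment-based slot projection (not the paper's method).
# Assumptions: whitespace tokens, spans are [start, end) token offsets, and
# alignment holds (source_index, target_index) links from an external aligner.
from typing import Optional

Span = tuple[int, int]


def project_span(span: Span, alignment: list[tuple[int, int]]) -> Optional[Span]:
    """Return the smallest target span covering every token aligned to `span`."""
    start, end = span
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None  # nothing aligned: the slot cannot be projected
    return min(targets), max(targets) + 1


def project_slots(slots: dict[str, Span],
                  alignment: list[tuple[int, int]]) -> dict[str, Span]:
    """Project every slot, dropping slots with no aligned target tokens."""
    projected = {}
    for label, span in slots.items():
        target_span = project_span(span, alignment)
        if target_span is not None:
            projected[label] = target_span
    return projected


if __name__ == "__main__":
    src = "wake me up at five am".split()
    tgt = "réveille-moi à cinq heures".split()
    # Hand-written links standing in for an aligner's output.
    links = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3)]
    slots = {"time": (4, 6)}
    for label, (s, e) in project_slots(slots, links).items():
        ss, se = slots[label]
        print(label, src[ss:se], "->", tgt[s:e])
```

With this baseline, alignment errors propagate directly into the projected spans: one spurious link can stretch a slot over unrelated target words.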
Related papers
- MrT5: Dynamic Token Merging for Efficient Byte-level Language Models [50.46453950887946] (2024-10-28T06:14:12Z)
  This work introduces MrT5 (MergeT5), a more efficient variant of ByT5.
  MrT5 integrates a token deletion mechanism in its encoder to dynamically shorten the input sequence length.
  When trained on English text, MrT5 demonstrates the capability to transfer its deletion feature zero-shot across several languages.
- A Text-to-Text Model for Multilingual Offensive Language Identification [19.23565690468299] (2023-12-06T09:37:27Z)
  This study presents the first pre-trained model with an encoder-decoder architecture for offensive language identification with text-to-text transformers (T5).
  Our pre-trained T5 model outperforms other transformer-based models fine-tuned for offensive language detection, such as fBERT and HateBERT, on multiple English benchmarks.
  Following a similar approach, we also train the first multilingual pre-trained model for offensive language identification using mT5.
- mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations [54.42422445568523] (2023-05-23T16:38:01Z)
  mmT5 is a modular multilingual sequence-to-sequence model.
  It disentangles language-specific information from language-agnostic information.
  Compared to mT5, mmT5 raises the rate of generating text in the correct language under zero-shot settings from 7% to 99%.
- Are Character-level Translations Worth the Wait? Comparing ByT5 and mT5 for Machine Translation [9.736284584478032] (2023-02-28T00:50:19Z)
  We show the effectiveness of character-level modeling in translation, particularly in cases where fine-tuning data is limited.
  While evaluating the importance of source texts in driving model predictions, we highlight word-level patterns within ByT5.
  We conclude by assessing the efficiency tradeoff of byte models, suggesting their usage in non-time-critical scenarios to boost translation quality.
- Crosslingual Generalization through Multitask Finetuning [80.8822603322471] (2022-11-03T13:19:32Z)
  Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
  We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
  We find that finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages.
- Evaluation of Transfer Learning for Polish with a Text-to-Text Model [54.81823151748415] (2022-05-18T09:17:14Z)
  We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
  The benchmark consists of diverse tasks and datasets: the KLEJ benchmark adapted for text-to-text, en-pl translation, summarization, and question answering.
  We present plT5, a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective.
- nmT5 -- Is parallel data still relevant for pre-training massively multilingual language models? [9.560948239388662] (2021-06-03T23:12:27Z)
  We investigate the impact of incorporating parallel data into mT5 pre-training.
  We find that multi-tasking language modeling with objectives such as machine translation is a straightforward way to improve performance.
- mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs [51.67970832510462] (2021-04-18T03:24:07Z)
  We improve the multilingual text-to-text transfer Transformer with translation pairs (mT6).
  We explore three cross-lingual text-to-text pre-training tasks, namely machine translation, translation pair span corruption, and translation span corruption.
  Experimental results show that the proposed mT6 improves cross-lingual transferability over mT5.
- mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514] (2020-10-22T17:58:14Z)
  "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
  We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
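Several of the related entries build on T5-style pre-training, and the mT6 entry lists translation pair span corruption among its objectives. The sketch below illustrates the general idea under stated assumptions: T5-style span corruption (each masked span is replaced by one sentinel in the input, and the target lists each sentinel followed by the tokens it hides) applied to an English-French pair concatenated into one sequence. Spans are fixed by hand rather than sampled, tokens are whitespace-split, and the `<extra_id_N>` sentinel names follow the T5/mT5 vocabularies; this is not mT6's actual data pipeline, and `span_corrupt` is a hypothetical helper.

```python
# Toy T5-style span corruption over a concatenated translation pair
# (illustrative; spans are hand-picked, not sampled as in real pre-training).
def span_corrupt(tokens: list[str],
                 spans: list[tuple[int, int]]) -> tuple[str, str]:
    """Replace each [start, end) span with a sentinel; return (input, target)."""
    inp: list[str] = []
    tgt: list[str] = []
    cursor = 0
    for k, (start, end) in enumerate(sorted(spans)):
        if start < cursor or end <= start:
            raise ValueError("spans must be non-empty and non-overlapping")
        sentinel = f"<extra_id_{k}>"
        inp.extend(tokens[cursor:start])  # keep the unmasked tokens
        inp.append(sentinel)              # one sentinel stands in for the whole span
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])     # target lists masked tokens after their sentinel
        cursor = end
    inp.extend(tokens[cursor:])
    tgt.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target, as in T5
    return " ".join(inp), " ".join(tgt)


if __name__ == "__main__":
    english = "wake me up at five am".split()
    french = "réveille-moi à cinq heures".split()
    pair = english + french  # concatenate the translation pair into one sequence
    source, target = span_corrupt(pair, [(0, 3), (8, 10)])
    print(source)  # <extra_id_0> at five am réveille-moi à <extra_id_1>
    print(target)  # <extra_id_0> wake me up <extra_id_1> cinq heures <extra_id_2>
```

Each masked span here keeps its translation visible on the other side of the pair ("réveille-moi" for "wake me up", "five am" for "cinq heures"), which is the intuition behind using translation pairs to encourage cross-lingual transfer; the sketch makes no claim about how large that effect is.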