BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
- URL: http://arxiv.org/abs/2010.12321v2
- Date: Tue, 9 Feb 2021 09:31:57 GMT
- Title: BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
- Authors: Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis
- Abstract summary: We introduce BARThez, the first large-scale pretrained seq2seq model for French.
Being based on BART, BARThez is particularly well-suited for generative tasks.
We show BARThez to be very competitive with state-of-the-art BERT-based French language models.
- Score: 19.508391246171115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inductive transfer learning has taken the entire NLP field by storm, with
models such as BERT and BART setting new state of the art on countless NLU
tasks. However, most of the available models and research have been conducted
for English. In this work, we introduce BARThez, the first large-scale
pretrained seq2seq model for French. Being based on BART, BARThez is
particularly well-suited for generative tasks. We evaluate BARThez on five
discriminative tasks from the FLUE benchmark and two generative tasks from a
novel summarization dataset, OrangeSum, that we created for this research. We
show BARThez to be very competitive with state-of-the-art BERT-based French
language models such as CamemBERT and FlauBERT. We also continue the
pretraining of a multilingual BART on BARThez' corpus, and show our resulting
model, mBARThez, to significantly boost BARThez' generative performance. Code,
data and models are publicly available.
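Since the models are released publicly, one quick way to try BARThez is through the Hugging Face Transformers library. The sketch below is a minimal, hedged example of abstractive summarization with a BARThez checkpoint fine-tuned on OrangeSum; the hub identifier used here is an assumption based on the authors' public release and may differ from the actual checkpoint name.

```python
# Minimal sketch: abstractive summarization with a publicly released BARThez
# checkpoint via Hugging Face Transformers. The hub ID below is an assumption
# based on the authors' public release and may differ from the actual name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "moussaKam/barthez-orangesum-abstract"  # assumed OrangeSum-fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Any French news article would do here; this is a made-up example input.
article = (
    "Le constructeur annonce la fermeture de son usine historique, "
    "entraînant la suppression de plusieurs centaines d'emplois."
)

inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```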
Related papers
- VBART: The Turkish LLM [0.0]
VBART is the first Turkish sequence-to-sequence Large Language Model, pre-trained from scratch on a large corpus.
Fine-tuned VBART models surpass the prior state-of-the-art results in abstractive text summarization, title generation, text paraphrasing, question answering and question generation tasks.
arXiv Detail & Related papers (2024-03-02T20:40:11Z)
- Data-Efficient French Language Modeling with CamemBERTa [0.0]
We introduce CamemBERTa, a French DeBERTa model that builds upon the DeBERTaV3 architecture and training objective.
We evaluate our model's performance on a variety of French downstream tasks and datasets.
arXiv Detail & Related papers (2023-06-02T12:45:34Z)
- GreekBART: The First Pretrained Greek Sequence-to-Sequence Model [13.429669368275318]
We introduce GreekBART, the first Seq2Seq model based on BART-base architecture and pretrained on a large-scale Greek corpus.
We evaluate and compare GreekBART against BART-random, Greek-BERT, and XLM-R on a variety of discriminative tasks.
arXiv Detail & Related papers (2023-04-03T10:48:51Z)
- Masked Autoencoders As The Unified Learners For Pre-Trained Sentence Representation [77.47617360812023]
We extend the recently proposed MAE style pre-training strategy, RetroMAE, to support a wide variety of sentence representation tasks.
The first stage performs RetroMAE over generic corpora, like Wikipedia, BookCorpus, etc., from which the base model is learned.
The second stage takes place on domain-specific data, e.g., MS MARCO and NLI, where the base model is continuingly trained based on RetroMAE and contrastive learning.
arXiv Detail & Related papers (2022-07-30T14:34:55Z)
- Evaluation of Transfer Learning for Polish with a Text-to-Text Model [54.81823151748415]
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: KLEJ benchmark adapted for text-to-text, en-pl translation, summarization, and question answering.
We present plT5 - a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective.
arXiv Detail & Related papers (2022-05-18T09:17:14Z)
- AraBART: a Pretrained Arabic Sequence-to-Sequence Model for Abstractive Summarization [23.540743628126837]
We propose AraBART, the first Arabic model in which the encoder and the decoder are pretrained end-to-end, based on BART.
We show that AraBART achieves the best performance on multiple abstractive summarization datasets.
arXiv Detail & Related papers (2022-03-21T13:11:41Z)
- DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing [117.41016786835452]
This paper presents a new pre-trained language model, DeBERTaV3, which improves the original DeBERTa model.
The authors show that vanilla embedding sharing in ELECTRA hurts training efficiency and model performance.
We propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics.
arXiv Detail & Related papers (2021-11-18T06:48:00Z)
- PAGnol: An Extra-Large French Generative Model [53.40189314359048]
We introduce PAGnol, a collection of French GPT models.
Using scaling laws, we efficiently train PAGnol-XL with the same computational budget as CamemBERT.
arXiv Detail & Related papers (2021-10-16T11:44:23Z)
- GottBERT: a pure German Language Model [0.0]
We introduce GottBERT, the first German single-language RoBERTa model.
We evaluate its performance on the Named Entity Recognition (NER) tasks CoNLL 2003 and GermEval 2014, as well as on the text classification tasks GermEval 2018 (fine and coarse) and GNAD, comparing it with existing German single-language BERT models and two multilingual ones.
GottBERT was successfully pre-trained on a 256 core TPU pod using the RoBERTa BASE architecture.
arXiv Detail & Related papers (2020-12-03T17:45:03Z)
- Revisiting Pre-Trained Models for Chinese Natural Language Processing [73.65780892128389]
We revisit Chinese pre-trained language models to examine their effectiveness in a non-English language.
We also propose a model called MacBERT, which improves upon RoBERTa in several ways.
arXiv Detail & Related papers (2020-04-29T02:08:30Z)
- Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)