BERTGEN: Multi-task Generation through BERT
- URL: http://arxiv.org/abs/2106.03484v1
- Date: Mon, 7 Jun 2021 10:17:45 GMT
- Title: BERTGEN: Multi-task Generation through BERT
- Authors: Faidon Mitzalis, Ozan Caglayan, Pranava Madhyastha, Lucia Specia
- Abstract summary: We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing multimodal and multilingual pretrained models.
With a comprehensive set of evaluations, we show that BERTGEN outperforms many strong baselines across the tasks explored.
We also show BERTGEN's ability to perform zero-shot language generation, where it is competitive with supervised counterparts.
- Score: 30.905286823599976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present BERTGEN, a novel generative, decoder-only model which extends BERT
by fusing multimodal and multilingual pretrained models VL-BERT and M-BERT,
respectively. BERTGEN is auto-regressively trained for language generation
tasks, namely image captioning, machine translation and multimodal machine
translation, under a multitask setting. With a comprehensive set of
evaluations, we show that BERTGEN outperforms many strong baselines across the
tasks explored. We also show BERTGEN's ability for zero-shot language
generation, where it exhibits performance competitive with supervised
counterparts. Finally, we conduct ablation studies which demonstrate that
BERTGEN substantially benefits from multi-tasking and effectively transfers
relevant inductive biases from the pre-trained models.
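The abstract describes BERTGEN as a decoder-only model, built on BERT-style backbones (VL-BERT and M-BERT), that generates auto-regressively. As a rough illustration of that decoding style only, the sketch below produces one token at a time by appending a [MASK] placeholder and reading the masked-LM prediction at that position. The generic bert-base-multilingual-cased checkpoint, greedy selection, and the absence of visual features and multi-task fine-tuning are assumptions made for illustration, not the authors' actual model or training setup.

```python
# Minimal sketch (not the authors' code): auto-regressive generation with a
# BERT-style masked LM, illustrating the decoding loop the abstract describes.
# Assumptions: plain M-BERT checkpoint, greedy decoding, no visual features.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def generate(source: str, max_new_tokens: int = 20) -> str:
    # Encode the source sequence; generated tokens are appended after it.
    ids = tokenizer.encode(source, add_special_tokens=True)
    generated = []
    for _ in range(max_new_tokens):
        # Append a [MASK] slot and let the masked LM fill it in.
        input_ids = torch.tensor([ids + [tokenizer.mask_token_id]])
        with torch.no_grad():
            logits = model(input_ids).logits
        next_id = int(logits[0, -1].argmax())
        if next_id == tokenizer.sep_token_id:
            break
        ids.append(next_id)
        generated.append(next_id)
    return tokenizer.decode(generated)

print(generate("a dog runs on the beach"))
```

Without BERTGEN's multi-task fine-tuning, a vanilla checkpoint will produce poor text; the snippet only shows the mechanics of mask-and-predict decoding.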
Related papers
- Can BERT Refrain from Forgetting on Sequential Tasks? A Probing Study [68.75670223005716]
We find that pre-trained language models like BERT have the potential to learn sequentially, even without any sparse memory replay.
Our experiments reveal that BERT can generate high-quality representations for previously learned tasks over the long term, even under extremely sparse replay or no replay at all.
arXiv Detail & Related papers (2023-03-02T09:03:43Z) - XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders in vision-language tasks, but we alter the learning objective to suit the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z) - Zero-shot Cross-lingual Transfer of Neural Machine Translation with
Multilingual Pretrained Encoders [74.89326277221072]
How to improve the cross-lingual transfer of NMT models with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance than CRISS and m2m-100 on many-to-English test sets.
arXiv Detail & Related papers (2021-04-18T07:42:45Z) - Bertinho: Galician BERT Representations [14.341471404165349]
This paper presents a monolingual BERT model for Galician.
We release two models, built using 6 and 12 transformer layers, respectively.
We show that our models, especially the 12-layer one, outperform mBERT on most tasks.
arXiv Detail & Related papers (2021-03-25T12:51:34Z) - Cross-lingual Information Retrieval with BERT [8.052497255948046]
We explore the use of the popular bidirectional language model, BERT, to model and learn the relevance between English queries and foreign-language documents.
A deep relevance matching model based on BERT is introduced and trained by finetuning a pretrained multilingual BERT model with weak supervision.
Experimental results on retrieving Lithuanian documents with short English queries show that our model is effective and outperforms competitive baseline approaches (a minimal illustrative sketch of such a relevance model appears after this list).
arXiv Detail & Related papers (2020-04-24T23:32:13Z) - A Study of Cross-Lingual Ability and Language-specific Information in
Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z) - lamBERT: Language and Action Learning Using Multimodal BERT [0.1942428068361014]
This study proposes a language and action learning model using multimodal BERT (lamBERT).
Experiments are conducted in a grid environment that requires language understanding for the agent to act properly.
The lamBERT model obtained higher rewards in multitask settings and transfer settings when compared to other models.
arXiv Detail & Related papers (2020-04-15T13:54:55Z) - What BERT Sees: Cross-Modal Transfer for Visual Question Generation [21.640299110619384]
We study the visual capabilities of BERT out of the box, without any additional pre-training on supplementary data.
We introduce BERT-gen, a BERT-based architecture for text generation that can leverage either mono- or multi-modal representations.
arXiv Detail & Related papers (2020-02-25T12:44:36Z) - Incorporating BERT into Neural Machine Translation [251.54280200353674]
We propose a new algorithm named the BERT-fused model, in which we first use BERT to extract representations for an input sequence.
We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets.
arXiv Detail & Related papers (2020-02-17T08:13:36Z) - Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model (a toy sketch of this style of denoising corruption appears after this list).
arXiv Detail & Related papers (2020-01-22T18:59:17Z) - RobBERT: a Dutch RoBERTa-based Language Model [9.797319790710711]
We use RoBERTa to train a Dutch language model called RobBERT.
We measure its performance on various tasks as well as the importance of the fine-tuning dataset size.
RobBERT improves state-of-the-art results on various tasks and, in particular, significantly outperforms other models when dealing with smaller datasets.
arXiv Detail & Related papers (2020-01-17T13:25:44Z)