Incorporating BERT into Parallel Sequence Decoding with Adapters
- URL: http://arxiv.org/abs/2010.06138v1
- Date: Tue, 13 Oct 2020 03:25:15 GMT
- Title: Incorporating BERT into Parallel Sequence Decoding with Adapters
- Authors: Junliang Guo, Zhirui Zhang, Linli Xu, Hao-Ran Wei, Boxing Chen, Enhong Chen
- Abstract summary: We propose to take two different BERT models as the encoder and decoder respectively, and fine-tune them by introducing simple and lightweight adapter modules.
We obtain a flexible and efficient model which is able to jointly leverage the information contained in the source-side and target-side BERT models.
Our framework is based on a parallel sequence decoding algorithm named Mask-Predict, which suits the bidirectional and conditionally independent nature of BERT.
- Score: 82.65608966202396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large scale pre-trained language models such as BERT have achieved
great success on various natural language understanding tasks, how to
efficiently and effectively incorporate them into sequence-to-sequence models
and the corresponding text generation tasks remains a non-trivial problem. In
this paper, we propose to address this problem by taking two different BERT
models as the encoder and decoder respectively, and fine-tuning them by
introducing simple and lightweight adapter modules, which are inserted between
BERT layers and tuned on the task-specific dataset. In this way, we obtain a
flexible and efficient model which is able to jointly leverage the information
contained in the source-side and target-side BERT models, while bypassing the
catastrophic forgetting problem. Each component in the framework can be
considered as a plug-in unit, making the framework flexible and task agnostic.
Our framework is based on a parallel sequence decoding algorithm named
Mask-Predict, which suits the bidirectional and conditionally independent nature
of BERT, and can easily be adapted to traditional autoregressive decoding. We
conduct extensive experiments on neural machine translation tasks where the
proposed method consistently outperforms autoregressive baselines while
reducing the inference latency by half, and achieves $36.49$/$33.57$ BLEU
scores on IWSLT14 German-English/WMT14 German-English translation. When adapted
to autoregressive decoding, the proposed method achieves $30.60$/$43.56$ BLEU
scores on WMT14 English-German/English-French translation, on par with the
state-of-the-art baseline models.
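As an illustration of the adapter idea described in the abstract, the sketch below inserts a lightweight bottleneck adapter after a frozen BERT layer so that only the adapter parameters are tuned on the task-specific dataset. This is a minimal PyTorch sketch, not the authors' released code; the class names, the bottleneck width of 64, and the exact insertion point are assumptions.

```python
# Minimal sketch of the adapter idea (assumptions: names, bottleneck width,
# and insertion point are illustrative, not the authors' released code).
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()
        self.norm = nn.LayerNorm(hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.up(self.act(self.down(hidden_states)))
        return self.norm(x + residual)


class AdaptedBertLayer(nn.Module):
    """Wraps one frozen BERT layer and passes its output through a tunable adapter."""

    def __init__(self, bert_layer: nn.Module, hidden_size: int = 768):
        super().__init__()
        self.bert_layer = bert_layer
        for p in self.bert_layer.parameters():
            p.requires_grad = False           # pre-trained weights stay fixed
        self.adapter = Adapter(hidden_size)   # only these parameters are trained

    def forward(self, hidden_states, attention_mask=None):
        out = self.bert_layer(hidden_states, attention_mask)
        out = out[0] if isinstance(out, tuple) else out
        return self.adapter(out)
```

A second sketch outlines the Mask-Predict refinement loop the framework builds on: all target positions start masked, every position is predicted in parallel, and the least confident predictions are re-masked and re-predicted over a fixed number of iterations. This is a simplified, single-sentence (no batch dimension) sketch; the `decoder` callable is hypothetical, and the linear re-masking schedule follows the original Mask-Predict paper rather than this paper's exact configuration.

```python
# Simplified Mask-Predict loop (no batch dimension; `decoder` is a hypothetical
# callable returning per-position log-probabilities of shape [tgt_len, vocab]).
import torch


@torch.no_grad()
def mask_predict(decoder, encoder_out, tgt_len: int,
                 iterations: int = 10, mask_id: int = 103) -> torch.Tensor:
    tokens = torch.full((tgt_len,), mask_id, dtype=torch.long)
    confidence = torch.zeros(tgt_len)

    for t in range(iterations):
        # Predict every currently masked position in parallel.
        log_probs = decoder(tokens, encoder_out)
        conf, preds = log_probs.max(dim=-1)
        masked = tokens.eq(mask_id)
        tokens[masked] = preds[masked]
        confidence[masked] = conf[masked]

        # Linearly decay how many tokens to re-mask, then re-mask the least
        # confident predictions for the next refinement pass.
        n_mask = int(tgt_len * (iterations - t - 1) / iterations)
        if n_mask == 0:
            break
        worst = confidence.topk(n_mask, largest=False).indices
        tokens[worst] = mask_id
    return tokens
```

In this setup only the adapter modules are updated during fine-tuning, which is what lets the frozen source-side and target-side BERT models be treated as plug-in units while bypassing catastrophic forgetting.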
Related papers
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
BERT models are limited to a maximum input length of 512 tokens, which makes them non-trivial to apply in practical settings with long inputs.
We propose ChunkBERT, a relatively simple extension of the vanilla BERT architecture that allows any pretrained model to be fine-tuned for inference on arbitrarily long text.
arXiv Detail & Related papers (2023-10-31T15:41:08Z)
- Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi [0.0]
We introduce Tri-Distil-BERT, a multilingual model pre-trained on Bangla, English, and Hindi, and Mixed-Distil-BERT, a model fine-tuned on code-mixed data.
Our two-tiered pre-training approach offers efficient alternatives for multilingual and code-mixed language understanding.
arXiv Detail & Related papers (2023-09-19T02:59:41Z)
- AMOM: Adaptive Masking over Masking for Conditional Masked Language Model [81.55294354206923]
A conditional masked language model (CMLM) is one of the most versatile frameworks for non-autoregressive sequence generation.
We introduce a simple yet effective adaptive masking over masking strategy to enhance the refinement capability of the decoder.
Our proposed model yields state-of-the-art performance on neural machine translation.
arXiv Detail & Related papers (2023-03-13T20:34:56Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter modules adjust the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation [38.017030073108735]
We show that a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) achieves state-of-the-art translation performance.
Our best models achieve BLEU scores of 30.45 for En->De and 38.61 for De->En on the IWSLT'14 dataset, and 31.26 for En->De and 34.94 for De->En on the WMT'14 dataset.
arXiv Detail & Related papers (2021-09-09T23:43:41Z)
- POINTER: Constrained Progressive Text Generation via Insertion-based Generative Pre-training [93.79766670391618]
We present POINTER, a novel insertion-based approach for hard-constrained text generation.
The proposed method operates by progressively inserting new tokens between existing tokens in a parallel manner.
The resulting coarse-to-fine hierarchy makes the generation process intuitive and interpretable.
arXiv Detail & Related papers (2020-05-01T18:11:54Z)
- DynaBERT: Dynamic BERT with Adaptive Width and Depth [55.18269622415814]
We propose a novel dynamic BERT model (abbreviated as DynaBERT) that can flexibly adjust its size and latency by selecting an adaptive width and depth.
It consistently outperforms existing BERT compression methods.
arXiv Detail & Related papers (2020-04-08T15:06:28Z)
- Multilingual Denoising Pre-training for Neural Machine Translation [132.66750663226287]
mBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora.
mBART is one of the first methods for pre-training a complete sequence-to-sequence model.
arXiv Detail & Related papers (2020-01-22T18:59:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.