LightMBERT: A Simple Yet Effective Method for Multilingual BERT
Distillation
- URL: http://arxiv.org/abs/2103.06418v1
- Date: Thu, 11 Mar 2021 02:24:41 GMT
- Title: LightMBERT: A Simple Yet Effective Method for Multilingual BERT
Distillation
- Authors: Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin
Li, Fang Wang and Qun Liu
- Abstract summary: Multilingual pre-trained language models have shown impressive performance on cross-lingual natural language understanding tasks.
These models are computationally intensive and difficult to deploy on resource-restricted devices.
We propose a simple yet effective distillation method (LightMBERT) for transferring the cross-lingual generalization ability of the multilingual BERT to a small student model.
- Score: 45.65004479806485
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multilingual pre-trained language models (e.g., mBERT, XLM and XLM-R) have
shown impressive performance on cross-lingual natural language understanding
tasks. However, these models are computationally intensive and difficult to
deploy on resource-restricted devices. In this paper, we propose a simple yet
effective distillation method (LightMBERT) for transferring the cross-lingual
generalization ability of the multilingual BERT to a small student model. The
experimental results empirically demonstrate the efficiency and effectiveness
of LightMBERT, which is significantly better than the baselines and performs
comparably to the teacher mBERT.
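As a rough illustration of the teacher-student setup described in the abstract, the sketch below shows generic logit (soft-label) distillation from mBERT into a smaller multilingual student. It is a minimal sketch only: the student checkpoint, temperature, and KL loss are illustrative assumptions and do not reproduce LightMBERT's exact objectives or layer choices.

```python
# A minimal sketch of logit (soft-label) distillation from mBERT into a smaller
# multilingual student. This only illustrates the general teacher-student setup;
# LightMBERT's actual objectives and layer choices are described in the paper.
# The student checkpoint and temperature below are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

teacher = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")
student = AutoModelForMaskedLM.from_pretrained("distilbert-base-multilingual-cased")
teacher.eval()  # the teacher is frozen; only the student is trained

def distillation_loss(input_ids, attention_mask, temperature=2.0):
    """Student matches the teacher's output distribution over the shared vocabulary."""
    with torch.no_grad():
        t_logits = teacher(input_ids=input_ids, attention_mask=attention_mask).logits
    s_logits = student(input_ids=input_ids, attention_mask=attention_mask).logits
    # KL divergence between temperature-softened distributions (Hinton-style KD)
    return F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

# Example usage on a small multilingual batch (teacher and student share the mBERT vocabulary)
tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
batch = tok(["Hello world", "Bonjour le monde"], padding=True, return_tensors="pt")
loss = distillation_loss(batch["input_ids"], batch["attention_mask"])
loss.backward()  # gradients flow only into the student's parameters
```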
Related papers
- Breaking Language Barriers in Multilingual Mathematical Reasoning:
Insights and Observations [90.73517523001149]
This paper pioneers the exploration and training of powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct.
We propose different training strategies to build powerful xMR LLMs, named MathOctopus, which notably outperform conventional open-source LLMs.
arXiv Detail & Related papers (2023-10-31T08:09:20Z) - AMTSS: An Adaptive Multi-Teacher Single-Student Knowledge Distillation
Framework For Multilingual Language Inference [27.333905128454546]
AMTSS is an adaptive multi-teacher single-student distillation framework.
We first introduce an adaptive learning strategy and teacher importance weights, which enable the student to learn effectively from max-margin teachers.
We present a shared student with different projection layers to support multiple languages, which largely reduces development and machine cost (a rough code sketch of such a setup appears after this list of related papers).
arXiv Detail & Related papers (2023-05-13T14:42:30Z) - Not All Languages Are Created Equal in LLMs: Improving Multilingual
Capability by Cross-Lingual-Thought Prompting [123.16452714740106]
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.
We introduce a simple yet effective method called cross-lingual-thought prompting (XLT).
XLT is a generic template prompt that stimulates cross-lingual and logical reasoning skills to enhance task performance across languages.
arXiv Detail & Related papers (2023-05-11T17:44:17Z) - LERT: A Linguistically-motivated Pre-trained Language Model [67.65651497173998]
We propose LERT, a pre-trained language model that is trained on three types of linguistic features along with the original pre-training task.
We carried out extensive experiments on ten Chinese NLU tasks, and the experimental results show that LERT could bring significant improvements.
arXiv Detail & Related papers (2022-11-10T05:09:16Z) - Multilingual Relation Classification via Efficient and Effective
Prompting [9.119073318043952]
We present the first work on prompt-based multilingual relation classification (RC).
We introduce an efficient and effective method that constructs prompts from relation triples and involves only minimal translation for the class labels.
We evaluate its performance in fully supervised, few-shot and zero-shot scenarios, and analyze its effectiveness across 14 languages.
arXiv Detail & Related papers (2022-10-25T08:40:23Z) - EMS: Efficient and Effective Massively Multilingual Sentence Embedding Learning [38.928786416891924]
We introduce efficient and effective massively multilingual sentence embedding (EMS) using cross-lingual token-level reconstruction (XTR) and sentence-level contrastive learning as training objectives.
Compared with related studies, the proposed model can be efficiently trained using significantly fewer parallel sentences and GPU computation resources.
We release the codes for model training and the EMS pre-trained sentence embedding model, which supports 62 languages.
arXiv Detail & Related papers (2022-05-31T12:29:25Z) - MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided
Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z) - LICHEE: Improving Language Model Pre-training with Multi-grained
Tokenization [19.89228774074371]
We propose a simple yet effective pre-training method named LICHEE to efficiently incorporate multi-grained information of input text.
Our method can be applied to various pre-trained language models and improve their representation capability.
arXiv Detail & Related papers (2021-08-02T12:08:19Z) - Mono vs Multilingual Transformer-based Models: a Comparison across
Several Language Tasks [1.2691047660244335]
BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models.
We make our trained BERT and ALBERT models for Portuguese available.
arXiv Detail & Related papers (2020-07-19T19:13:20Z) - A Study of Cross-Lingual Ability and Language-specific Information in
Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)
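The AMTSS entry above describes a shared student with per-language projection layers learning from multiple weighted teachers. The sketch below outlines one plausible way to wire that up; the class and function names, the softmax weighting of teachers, and the [CLS] pooling are illustrative assumptions, not the framework's actual implementation.

```python
# A rough sketch in the spirit of the AMTSS entry above: one shared student
# encoder, per-language projection heads, and a weighted sum of per-teacher
# distillation losses. Names, softmax weighting, and [CLS] pooling are
# illustrative assumptions, not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedStudent(nn.Module):
    """Shared encoder with one projection (classification) head per language."""
    def __init__(self, encoder, hidden_size, num_labels, languages):
        super().__init__()
        self.encoder = encoder  # any small transformer encoder (e.g., a distilled BERT)
        self.heads = nn.ModuleDict({
            lang: nn.Linear(hidden_size, num_labels) for lang in languages
        })

    def forward(self, lang, **inputs):
        hidden = self.encoder(**inputs).last_hidden_state[:, 0]  # [CLS] representation
        return self.heads[lang](hidden)

def multi_teacher_loss(student_logits, teacher_logits_list, teacher_weights, T=2.0):
    """Distillation loss combined over several teachers with importance weights."""
    weights = F.softmax(torch.tensor(teacher_weights, dtype=torch.float), dim=0)
    loss = torch.zeros(())
    for w, t_logits in zip(weights, teacher_logits_list):
        loss = loss + w * F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(t_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T ** 2)
    return loss

# Hypothetical usage: wrap a small pre-trained encoder from Hugging Face
# from transformers import AutoModel
# student = SharedStudent(AutoModel.from_pretrained("distilbert-base-multilingual-cased"),
#                         hidden_size=768, num_labels=3, languages=["en", "fr", "zh"])
```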
This list is automatically generated from the titles and abstracts of the papers on this site.