Structure-Level Knowledge Distillation For Multilingual Sequence
Labeling
- URL: http://arxiv.org/abs/2004.03846v3
- Date: Mon, 4 May 2020 09:28:07 GMT
- Title: Structure-Level Knowledge Distillation For Multilingual Sequence
Labeling
- Authors: Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu
- Abstract summary: We propose to reduce the gap between monolingual models and the unified multilingual model by distilling the structural knowledge of several monolingual models (teachers) to the unified multilingual model (student).
Our experiments on 4 multilingual tasks with 25 datasets show that our approaches outperform several strong baselines and have stronger zero-shot generalizability than both the baseline model and teacher models.
- Score: 73.40368222437912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual sequence labeling is a task of predicting label sequences using
a single unified model for multiple languages. Compared with relying on
multiple monolingual models, using a multilingual model has the benefits of a
smaller model size, easier online serving, and generalizability to
low-resource languages. However, current multilingual models still underperform
individual monolingual models significantly due to model capacity limitations.
In this paper, we propose to reduce the gap between monolingual models and the
unified multilingual model by distilling the structural knowledge of several
monolingual models (teachers) to the unified multilingual model (student). We
propose two novel KD methods based on structure-level information: (1)
approximately minimizing the distance between the student's and the teachers'
structure-level probability distributions, and (2) aggregating the structure-level
knowledge into local distributions and minimizing the distance between the two local
probability distributions. Our experiments on 4 multilingual tasks with 25
datasets show that our approaches outperform several strong baselines and have
stronger zero-shot generalizability than both the baseline model and teacher
models.
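To make the two objectives concrete, the following is a minimal PyTorch sketch of what such losses could look like; it is an illustration under stated assumptions, not the authors' implementation. The function names, tensor shapes, and the use of cross-entropy as the distance measure are assumptions, and how the k-best sequences and token-level marginals are obtained (e.g., from a CRF layer) is left outside the sketch.

```python
# A minimal sketch (not the authors' released code) of the two distillation
# objectives described in the abstract. It assumes the teacher's k-best label
# sequences and token-level marginals are computed elsewhere (e.g., by a
# linear-chain CRF with k-best Viterbi / forward-backward).
import torch


def topk_distillation_loss(student_seq_logprobs: torch.Tensor,
                           teacher_seq_probs: torch.Tensor) -> torch.Tensor:
    """Approximate structure-level KD (method 1 in the abstract).

    The teacher's distribution over whole label sequences is approximated by its
    k-best sequences (probabilities renormalized over the k candidates), and the
    student is pushed to assign high log-probability to those sequences.

    student_seq_logprobs: (batch, k) student log-probabilities of the teacher's k-best sequences
    teacher_seq_probs:    (batch, k) renormalized teacher probabilities of the same sequences
    """
    return -(teacher_seq_probs * student_seq_logprobs).sum(dim=-1).mean()


def posterior_distillation_loss(student_log_marginals: torch.Tensor,
                                teacher_marginals: torch.Tensor,
                                mask: torch.Tensor) -> torch.Tensor:
    """Local KD (method 2 in the abstract).

    Structure-level knowledge is aggregated into per-token marginal label
    distributions, and the student's marginals are pulled toward the teacher's
    with a cross-entropy term (equivalent to KL up to the teacher's entropy).

    student_log_marginals: (batch, seq_len, num_labels) student log-marginals
    teacher_marginals:     (batch, seq_len, num_labels) teacher marginals
    mask:                  (batch, seq_len) float, 1.0 for real tokens, 0.0 for padding
    """
    token_ce = -(teacher_marginals * student_log_marginals).sum(dim=-1)
    return (token_ce * mask).sum() / mask.sum().clamp(min=1.0)


# Hypothetical usage: add the KD term to the usual supervised loss with a weight.
# loss = supervised_nll + kd_weight * posterior_distillation_loss(log_m_s, m_t, mask)
```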
Related papers
- Cross-Lingual NER for Financial Transaction Data in Low-Resource
Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data.
We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information.
With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z) - PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into the training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training (a toy sketch of such a schedule appears after this list).
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z) - Distilling Efficient Language-Specific Models for Cross-Lingual Transfer [75.32131584449786]
Massively multilingual Transformers (MMTs) are widely used for cross-lingual transfer learning.
MMTs' language coverage makes them unnecessarily expensive to deploy in terms of model size, inference time, energy, and hardware cost.
We propose to extract compressed, language-specific models from MMTs which retain the capacity of the original MMTs for cross-lingual transfer.
arXiv Detail & Related papers (2023-06-02T17:31:52Z) - Cross-Lingual Text Classification with Multilingual Distillation and
Zero-Shot-Aware Training [21.934439663979663]
We propose a multi-branch multilingual language model (MBLM) built on multilingual pre-trained language models (MPLMs).
Our method is based on transferring knowledge from high-performance monolingual models with a teacher-student framework.
Results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
arXiv Detail & Related papers (2022-02-28T09:51:32Z) - Are Multilingual Models the Best Choice for Moderately Under-resourced
Languages? A Comprehensive Assessment for Catalan [0.05277024349608833]
This work focuses on Catalan with the aim of exploring to what extent a medium-sized monolingual language model is competitive with state-of-the-art large multilingual models.
We build a clean, high-quality textual Catalan corpus (CaText), train a Transformer-based language model for Catalan (BERTa), and devise a thorough evaluation in a diversity of settings.
The result is a new benchmark, the Catalan Language Understanding Benchmark (CLUB), which we publish as an open resource.
arXiv Detail & Related papers (2021-07-16T13:52:01Z) - How Good is Your Tokenizer? On the Monolingual Performance of
Multilingual Language Models [96.32118305166412]
We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks.
We find that languages which are adequately represented in the multilingual model's vocabulary exhibit negligible performance decreases over their monolingual counterparts.
arXiv Detail & Related papers (2020-12-31T14:11:00Z) - Evaluating Cross-Lingual Transfer Learning Approaches in Multilingual
Conversational Agent Models [1.52292571922932]
We propose a general multilingual model framework for Natural Language Understanding (NLU) models.
We show that these multilingual models can reach same or better performance compared to monolingual models across language-specific test data.
arXiv Detail & Related papers (2020-12-07T17:14:52Z) - Towards Fully Bilingual Deep Language Modeling [1.3455090151301572]
We consider whether it is possible to pre-train a bilingual model for two remotely related languages without compromising performance in either language.
We create a Finnish-English bilingual BERT model and evaluate its performance on datasets used to evaluate the corresponding monolingual models.
Our bilingual model performs on par with Google's original English BERT on GLUE and nearly matches the performance of monolingual Finnish BERT on a range of Finnish NLP tasks.
arXiv Detail & Related papers (2020-10-22T12:22:50Z) - Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank [46.626315158735615]
Pretrained multilingual contextual representations have shown great success, but due to the limits of their pretraining data, their benefits do not apply equally to all language varieties.
This presents a challenge for language varieties unfamiliar to these models, whose labeled and unlabeled data is too limited to train a monolingual model effectively.
We propose the use of additional language-specific pretraining and vocabulary augmentation to adapt multilingual models to low-resource settings.
arXiv Detail & Related papers (2020-09-29T16:12:52Z) - Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
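As a side note on the curriculum strategy mentioned in the PolyLM entry above, here is a toy Python sketch of a schedule that raises the non-English share of the training mixture from 30% to 60% over the course of pre-training. The linear interpolation, function names, and bucket sampling are illustrative assumptions; the paper's actual staging may differ.

```python
import random


def non_english_fraction(step: int, total_steps: int,
                         start: float = 0.30, end: float = 0.60) -> float:
    """Target share of non-English examples at the current pre-training step.

    Linearly interpolates from `start` (first stage) to `end` (final stage);
    the linear shape is an assumption, not the schedule reported in the paper.
    """
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * progress


def sample_language_bucket(step: int, total_steps: int) -> str:
    """Pick which data bucket the next training example is drawn from."""
    p = non_english_fraction(step, total_steps)
    return "non_english" if random.random() < p else "english"
```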
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.