Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation
- URL: http://arxiv.org/abs/2010.14271v1
- Date: Tue, 27 Oct 2020 13:12:17 GMT
- Title: Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation
- Authors: Junhao Liu, Linjun Shou, Jian Pei, Ming Gong, Min Yang, Daxin Jiang
- Abstract summary: Cross-lingual Machine Reading (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-source languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC)
LBMRC trains multiple machine reading comprehension (MRC) models proficient in individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
- Score: 105.41167108465085
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging
problem due to the lack of large-scale annotated datasets in low-source
languages, such as Arabic, Hindi, and Vietnamese. Many previous approaches use
translation data by translating from a rich-source language, such as English,
to low-source languages as auxiliary supervision. However, how to effectively
leverage translation data and reduce the impact of noise introduced by
translation remains onerous. In this paper, we tackle this challenge and
enhance the cross-lingual transferring performance by a novel augmentation
approach named Language Branch Machine Reading Comprehension (LBMRC). A
language branch is a group of passages in one single language paired with
questions in all target languages. We train multiple machine reading
comprehension (MRC) models proficient in individual language based on LBMRC.
Then, we devise a multilingual distillation approach to amalgamate knowledge
from multiple language branch models to a single model for all target
languages. Combining the LBMRC and multilingual distillation can be more robust
to the data noises, therefore, improving the model's cross-lingual ability.
Meanwhile, the produced single multilingual model is applicable to all target
languages, which saves the cost of training, inference, and maintenance for
multiple models. Extensive experiments on two CLMRC benchmarks clearly show the
effectiveness of our proposed method.
Related papers
- Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model [14.39119862985503]
We aim to create a multilingual ALT system with available datasets.
Inspired by architectures that have been proven effective for English ALT, we adapt these techniques to the multilingual scenario.
We evaluate the performance of the multilingual model in comparison to its monolingual counterparts.
arXiv Detail & Related papers (2024-06-25T15:02:32Z) - Towards the Next 1000 Languages in Multilingual Machine Translation:
Exploring the Synergy Between Supervised and Self-Supervised Learning [48.15259834021655]
We present a pragmatic approach towards building a multilingual machine translation model that covers hundreds of languages.
We use a mixture of supervised and self-supervised objectives, depending on the data availability for different language pairs.
We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting.
arXiv Detail & Related papers (2022-01-09T23:36:44Z) - Breaking Down Multilingual Machine Translation [74.24795388967907]
We show that multilingual training is beneficial to encoders in general, while it only benefits decoders for low-resource languages (LRLs)
Our many-to-one models for high-resource languages and one-to-many models for LRLs outperform the best results reported by Aharoni et al.
arXiv Detail & Related papers (2021-10-15T14:57:12Z) - A Multilingual Modeling Method for Span-Extraction Reading Comprehension [2.4905424368103444]
We propose a multilingual extractive reading comprehension approach called XLRC.
We show that our model outperforms the state-of-the-art baseline (i.e., RoBERTa_Large) on the CMRC 2018 task.
arXiv Detail & Related papers (2021-05-31T11:05:30Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Multilingual Transfer Learning for QA Using Translation as Data
Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z) - Beyond English-Centric Multilingual Machine Translation [74.21727842163068]
We create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages.
We build and open source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining.
Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems of WMT.
arXiv Detail & Related papers (2020-10-21T17:01:23Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z) - XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.