English Intermediate-Task Training Improves Zero-Shot Cross-Lingual
Transfer Too
- URL: http://arxiv.org/abs/2005.13013v2
- Date: Wed, 30 Sep 2020 18:01:49 GMT
- Title: English Intermediate-Task Training Improves Zero-Shot Cross-Lingual
Transfer Too
- Authors: Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun
Liu, Clara Vania, Katharina Kann, Samuel R. Bowman
- Abstract summary: Intermediate-task training improves model performance substantially on language understanding tasks in monolingual English settings.
We evaluate intermediate-task transfer in a zero-shot cross-lingual setting on the XTREME benchmark.
Using our best intermediate-task models for each target task, we obtain a 5.4 point improvement over XLM-R Large.
- Score: 42.95481834479966
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intermediate-task training---fine-tuning a pretrained model on an
intermediate task before fine-tuning again on the target task---often improves
model performance substantially on language understanding tasks in monolingual
English settings. We investigate whether English intermediate-task training is
still helpful on non-English target tasks. Using nine intermediate
language-understanding tasks, we evaluate intermediate-task transfer in a
zero-shot cross-lingual setting on the XTREME benchmark. We see large
improvements from intermediate training on the BUCC and Tatoeba sentence
retrieval tasks and moderate improvements on question-answering target tasks.
MNLI, SQuAD and HellaSwag achieve the best overall results as intermediate
tasks, while multi-task intermediate training offers small additional improvements.
Using our best intermediate-task models for each target task, we obtain a 5.4
point improvement over XLM-R Large on the XTREME benchmark, setting the state
of the art as of June 2020. We also investigate continuing multilingual MLM
during intermediate-task training and using machine-translated
intermediate-task data, but neither consistently outperforms simply performing
English intermediate-task training.
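To make the recipe concrete, here is a minimal sketch of the two-stage pipeline described above: fine-tune XLM-R on an English intermediate task, continue fine-tuning on an English target task, then evaluate zero-shot in another language. The choice of MNLI as the intermediate task and XNLI as a stand-in target task, the subset sizes, and the hyperparameters are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of English intermediate-task training followed by zero-shot
# cross-lingual transfer (assumed setup, not the authors' exact configuration).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

def encode(split):
    # Tokenize premise/hypothesis pairs for sentence-pair classification.
    return split.map(
        lambda b: tokenizer(b["premise"], b["hypothesis"],
                            truncation=True, max_length=128),
        batched=True)

def finetune(model, train_split, output_dir):
    # One fine-tuning stage; passing the tokenizer lets the Trainer pad batches.
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=16, report_to=[])
    Trainer(model=model, args=args, train_dataset=encode(train_split),
            tokenizer=tokenizer).train()
    return model

# Stage 1: English intermediate-task training (MNLI; small subset to keep the sketch light).
mnli_train = load_dataset("glue", "mnli")["train"].select(range(5000))
model = finetune(model, mnli_train, "stage1_mnli")

# Stage 2: English target-task fine-tuning (XNLI English stands in for an XTREME target task).
xnli_en_train = load_dataset("xnli", "en")["train"].select(range(5000))
model = finetune(model, xnli_en_train, "stage2_xnli_en")

# Zero-shot transfer: evaluate the English-trained model directly on another language.
xnli_sw_test = load_dataset("xnli", "sw")["test"]
metrics = Trainer(model=model, tokenizer=tokenizer).evaluate(eval_dataset=encode(xnli_sw_test))
print(metrics)
```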
Related papers
- AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness [16.896143197472114]
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian languages.
We propose using machine translation for data augmentation to address the low-resource challenge of limited training data.
We achieve competitive results in the shared task: our system performs the best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).
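As a rough illustration of the machine-translation augmentation idea, the sketch below translates English training pairs into another language while keeping the original relatedness labels; the MarianMT checkpoint, the target language, and the assumption that labels carry over unchanged are illustrative choices, not details taken from the paper.

```python
# Hedged sketch of data augmentation via machine translation (assumed setup).
from transformers import pipeline

# A generic English-to-French MarianMT checkpoint is used purely as a placeholder;
# the shared task itself targets African and Asian languages.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

# Toy English training pairs with semantic relatedness scores (hypothetical data).
train_pairs = [
    ("A man is playing a guitar.", "Someone strums a guitar.", 0.9),
    ("The cat sleeps on the sofa.", "Stock markets fell sharply today.", 0.1),
]

augmented = []
for s1, s2, score in train_pairs:
    t1 = translator(s1)[0]["translation_text"]
    t2 = translator(s2)[0]["translation_text"]
    augmented.append((t1, t2, score))  # the relatedness label is assumed to transfer

# The translated pairs would be mixed with the original data before fine-tuning.
print(augmented)
```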
arXiv Detail & Related papers (2024-04-01T21:21:15Z)
- Rethinking and Improving Multi-task Learning for End-to-end Speech Translation [51.713683037303035]
We investigate the consistency between different tasks, considering different times and modules.
We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations.
We propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation.
arXiv Detail & Related papers (2023-11-07T08:48:46Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretraining and fine-tuning stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- Does QA-based intermediate training help fine-tuning language models for text classification? [12.023861154677203]
It is found that intermediate training based on high-level inference tasks such as Question Answering (QA) can improve the performance of some language models for target tasks.
In this paper, we experimented on eight tasks for single-sequence classification and eight tasks for sequence-pair classification using two base and two compact language models.
Our experiments show that QA-based intermediate training generates varying transfer performance across different language models, except for similar QA tasks.
arXiv Detail & Related papers (2021-12-30T13:30:25Z)
- Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders [74.89326277221072]
How to improve the cross-lingual transfer of NMT models with a multilingual pretrained encoder is under-explored.
We propose SixT, a simple yet effective model for this task.
Our model achieves better performance on many-to-English testsets than CRISS and m2m-100.
arXiv Detail & Related papers (2021-04-18T07:42:45Z)
- MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation using Augmented Data, Signals, and Transformers [1.869621561196521]
We present our approach for solving the SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)
The goal is to detect whether a given word common to both sentences evokes the same meaning.
We submit systems for both settings: Multilingual and Cross-Lingual.
arXiv Detail & Related papers (2021-04-04T08:49:28Z)
- Cross-lingual Retrieval for Iterative Self-Supervised Training [66.3329263451598]
Cross-lingual alignment can be further improved by training seq2seq models on sentence pairs mined using their own encoder outputs.
We develop a new approach -- cross-lingual retrieval for iterative self-supervised training.
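To illustrate the mining step this approach relies on, the sketch below embeds sentences from two monolingual corpora and pairs each sentence with its nearest neighbour by cosine similarity; LaBSE and a simple similarity threshold stand in for the paper's own seq2seq encoder and scoring, so treat every detail here as an assumption for illustration.

```python
# Hedged sketch of mining pseudo-parallel pairs from monolingual corpora
# (LaBSE + cosine similarity as stand-ins for the paper's encoder and scoring).
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/LaBSE")

english = ["The cat is sleeping.", "Stock prices rose sharply today."]
german = ["Die Aktienkurse stiegen heute stark an.", "Die Katze schläft."]

# Normalized embeddings make the dot product a cosine similarity.
en_emb = encoder.encode(english, normalize_embeddings=True)
de_emb = encoder.encode(german, normalize_embeddings=True)
sims = en_emb @ de_emb.T

# Keep each English sentence's best German neighbour above a confidence threshold.
# In the iterative scheme, such mined pairs would be used as training data for the
# seq2seq model, whose improved encoder then mines better pairs in the next round.
THRESHOLD = 0.6
mined = [(english[i], german[j], float(sims[i, j]))
         for i, j in enumerate(sims.argmax(axis=1))
         if sims[i, j] > THRESHOLD]
print(mined)
```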
arXiv Detail & Related papers (2020-06-16T21:30:51Z)
- Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? [44.88358841370665]
It is poorly understood when and why intermediate-task training is beneficial for a given target task.
We perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations.
We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best.
arXiv Detail & Related papers (2020-05-01T21:49:34Z)