Consistency Regularization for Cross-Lingual Fine-Tuning
- URL: http://arxiv.org/abs/2106.08226v1
- Date: Tue, 15 Jun 2021 15:35:44 GMT
- Title: Consistency Regularization for Cross-Lingual Fine-Tuning
- Authors: Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham
Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
- Abstract summary: We propose to improve cross-lingual fine-tuning with consistency regularization.
Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations.
Experimental results on the XTREME benchmark show that our method significantly improves cross-lingual fine-tuning across various tasks.
- Score: 61.08704789561351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning pre-trained cross-lingual language models can transfer
task-specific supervision from one language to the others. In this work, we
propose to improve cross-lingual fine-tuning with consistency regularization.
Specifically, we use example consistency regularization to penalize the
prediction sensitivity to four types of data augmentations, i.e., subword
sampling, Gaussian noise, code-switch substitution, and machine translation. In
addition, we employ model consistency to regularize the models trained with two
augmented versions of the same training set. Experimental results on the XTREME
benchmark show that our method significantly improves cross-lingual fine-tuning
across various tasks, including text classification, question answering, and
sequence labeling.
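As a concrete illustration of the example consistency term, the minimal PyTorch-style sketch below adds a symmetric KL penalty between a model's predictions on an original example and on an augmented view of it (after subword sampling, Gaussian noise, code-switch substitution, or machine translation). The symmetric KL is one reasonable choice of divergence, and the names `model`, `batch`, `augmented_batch`, and the weight `alpha` are illustrative assumptions for a classification task, not the authors' released implementation.

```python
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    # Symmetric KL divergence between two categorical predictive distributions.
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    kl_pq = F.kl_div(q_log, p_log.exp(), reduction="batchmean")  # KL(p || q)
    kl_qp = F.kl_div(p_log, q_log.exp(), reduction="batchmean")  # KL(q || p)
    return 0.5 * (kl_pq + kl_qp)

def training_step(model, batch, augmented_batch, labels, alpha=1.0):
    """Task loss plus example consistency regularization (illustrative sketch).

    `batch` holds an original training example and `augmented_batch` the same
    example after one of the four augmentations; the HuggingFace-style
    `model(**batch).logits` call and the weight `alpha` are assumptions.
    """
    logits = model(**batch).logits
    logits_aug = model(**augmented_batch).logits
    task_loss = F.cross_entropy(logits, labels)
    consistency = symmetric_kl(logits, logits_aug)
    return task_loss + alpha * consistency
```

Model consistency, the second term described in the abstract, would analogously penalize disagreement between two models trained on differently augmented copies of the same training set.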
Related papers
- VECO 2.0: Cross-lingual Language Model Pre-training with
Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model, VECO 2.0, based on contrastive learning with multi-granularity alignments.
Specifically, a sequence-to-sequence alignment objective maximizes the similarity of parallel sentence pairs and minimizes that of non-parallel pairs.
In addition, a token-to-token alignment objective pulls together synonymous tokens, mined with a thesaurus dictionary, and separates them from the remaining unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
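To make the sequence-level contrastive alignment above concrete, here is a generic InfoNCE-style loss over pooled sentence encodings: each parallel pair in a batch is pulled together while all other in-batch pairings act as non-parallel negatives. The pooling, temperature, and function name are assumptions; this does not reproduce VECO 2.0's exact multi-granularity objective.

```python
import torch
import torch.nn.functional as F

def sequence_contrastive_loss(src_repr, tgt_repr, temperature=0.05):
    """InfoNCE-style alignment over a batch of parallel sentence encodings.

    `src_repr` and `tgt_repr` are (batch, dim) pooled representations of source
    sentences and their translations; row i of each forms a parallel pair and the
    remaining rows serve as non-parallel negatives. Generic sketch, not the exact
    VECO 2.0 objective.
    """
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    sim = src @ tgt.t() / temperature                  # pairwise cosine similarities
    targets = torch.arange(sim.size(0), device=sim.device)
    # Maximize similarity on the diagonal (parallel pairs), minimize it elsewhere.
    return 0.5 * (F.cross_entropy(sim, targets) + F.cross_entropy(sim.t(), targets))
```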
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretraining and fine-tuning stages.
Our approach can narrow the cross-lingual sentence representation distance and improve low-frequency word translation at trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
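The code-switching restore task above can be pictured as a corrupt-and-recover objective: some source words are swapped for dictionary translations and the model learns to restore the original sentence. The helper below builds such (corrupted, original) pairs; its name, the word-level bilingual dictionary, and the switch probability are illustrative assumptions rather than the paper's exact recipe.

```python
import random

def make_code_switch_restore_pair(tokens, bilingual_dict, switch_prob=0.15, seed=None):
    """Build a (corrupted, original) pair for a code-switching restore objective.

    `tokens` is a tokenized source sentence; `bilingual_dict` maps source words to
    candidate translations. Illustrative sketch, not the paper's exact recipe.
    """
    rng = random.Random(seed)
    corrupted = []
    for tok in tokens:
        candidates = bilingual_dict.get(tok)
        if candidates and rng.random() < switch_prob:
            corrupted.append(rng.choice(candidates))  # switch this word
        else:
            corrupted.append(tok)
    return corrupted, tokens

# Toy usage with a hypothetical English-German dictionary.
corrupted, original = make_code_switch_restore_pair(
    ["the", "cat", "sat", "on", "the", "mat"],
    {"cat": ["Katze"], "mat": ["Matte"]},
    switch_prob=0.5,
    seed=0,
)
```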
- Multi-Level Contrastive Learning for Cross-Lingual Alignment [35.33431650608965]
Cross-lingual pre-trained models such as multilingual BERT (mBERT) have achieved strong performance on various cross-lingual downstream NLP tasks.
This paper proposes a multi-level contrastive learning framework to further improve the cross-lingual ability of pre-trained models.
arXiv Detail & Related papers (2022-02-26T07:14:20Z)
- PARADISE: Exploiting Parallel Data for Multilingual Sequence-to-Sequence Pretraining [19.785343302320918]
We present PARADISE (PARAllel & Denoising Integration in SEquence-to-sequence models).
It extends the conventional denoising objective used to train these models by (i) replacing words in the noised sequence according to a multilingual dictionary, and (ii) predicting the reference translation according to a parallel corpus.
Our experiments on machine translation and cross-lingual natural language inference show average improvements of 2.0 BLEU points and 6.7 accuracy points, respectively, from integrating parallel data into pretraining.
arXiv Detail & Related papers (2021-08-04T07:32:56Z)
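The two PARADISE ingredients can be read as two loss terms sharing one sequence-to-sequence model: a denoising term whose input already contains dictionary substitutions (similar in spirit to the code-switch helper sketched earlier) and a translation term over a parallel pair. The sketch below assumes a HuggingFace-style encoder-decoder that returns a loss when `labels` are given; the argument names and the 1:1 weighting of the two terms are assumptions.

```python
def paradise_style_step(model, noised_src_ids, src_ids, parallel_src_ids, parallel_tgt_ids):
    """Combine a denoising loss with a translation loss in one pretraining step.

    (i) Denoising: reconstruct the original sequence from a noised version in
        which some words were replaced via a multilingual dictionary.
    (ii) Translation: predict the reference translation of a parallel sentence.
    Assumes a HuggingFace-style seq2seq model that returns `.loss` when `labels`
    are provided; names and the equal weighting are illustrative.
    """
    denoising_loss = model(input_ids=noised_src_ids, labels=src_ids).loss
    translation_loss = model(input_ids=parallel_src_ids, labels=parallel_tgt_ids).loss
    return denoising_loss + translation_loss
```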
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly follows two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models on downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when fine-tuning it on downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
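The summary above does not say which continual-learning technique is used, so the sketch below shows one generic way to discourage forgetting during fine-tuning: an L2-SP-style penalty that keeps fine-tuned parameters close to their pre-trained values. It is a stand-in for the general idea of preserving pre-trained cross-lingual ability, not the method proposed in that paper.

```python
import torch

def distance_to_pretrained(model, pretrained_state, weight=0.01):
    """L2 penalty keeping fine-tuned parameters near their pre-trained values.

    `pretrained_state` is a snapshot taken before fine-tuning, e.g.
    {name: p.detach().clone() for name, p in model.named_parameters()}.
    An L2-SP-style stand-in for the general idea, not the cited paper's method.
    """
    penalty = 0.0
    for name, param in model.named_parameters():
        anchor = pretrained_state[name].to(param.device)
        penalty = penalty + torch.sum((param - anchor) ** 2)
    return weight * penalty
```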
This list is automatically generated from the titles and abstracts of the papers on this site.