XeroAlign: Zero-Shot Cross-lingual Transformer Alignment
- URL: http://arxiv.org/abs/2105.02472v1
- Date: Thu, 6 May 2021 07:10:00 GMT
- Title: XeroAlign: Zero-Shot Cross-lingual Transformer Alignment
- Authors: Milan Gritta, Ignacio Iacobacci
- Abstract summary: We introduce a method for task-specific alignment of cross-lingual pretrained transformers such as XLM-R.
XeroAlign uses translated task data to encourage the model to generate similar sentence embeddings for different languages.
XLM-RA's text classification accuracy exceeds that of XLM-R trained with labelled data and performs on par with state-of-the-art models on a cross-lingual adversarial paraphrasing task.
- Score: 9.340611077939828
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The introduction of pretrained cross-lingual language models brought decisive
improvements to multilingual NLP tasks. However, the lack of labelled task data
necessitates a variety of methods aiming to close the gap to high-resource
languages. Zero-shot methods in particular, often use translated task data as a
training signal to bridge the performance gap between the source and target
language(s). We introduce XeroAlign, a simple method for task-specific
alignment of cross-lingual pretrained transformers such as XLM-R. XeroAlign
uses translated task data to encourage the model to generate similar sentence
embeddings for different languages. The XeroAligned XLM-R, called XLM-RA, shows
strong improvements over the baseline models to achieve state-of-the-art
zero-shot results on three multilingual natural language understanding tasks.
XLM-RA's text classification accuracy exceeds that of XLM-R trained with
labelled data and performs on par with state-of-the-art models on a
cross-lingual adversarial paraphrasing task.
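As a rough illustration of the alignment objective described in the abstract, the sketch below adds a distance term between the sentence embeddings of a labelled source utterance and its translation to the usual classification loss. This is a hedged sketch, not the authors' implementation: the XLM-R checkpoint name, the first-token pooling, the mean-squared-error distance and the `align_weight` hyperparameter are illustrative assumptions.

```python
# Hedged sketch of a XeroAlign-style auxiliary alignment loss (not the authors' code).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
encoder = AutoModel.from_pretrained("xlm-roberta-base")
classifier = torch.nn.Linear(encoder.config.hidden_size, 3)  # e.g. 3 intent classes (assumed)

def sentence_embedding(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state      # (batch, seq_len, hidden)
    return hidden[:, 0]                              # pooled first-token embedding

def xeroalign_step(src_texts, tgt_texts, labels, align_weight=1.0):
    # src_texts: labelled source-language sentences; tgt_texts: their translations
    # labels: LongTensor of gold classes for the source-language batch
    src_emb = sentence_embedding(src_texts)
    tgt_emb = sentence_embedding(tgt_texts)
    task_loss = F.cross_entropy(classifier(src_emb), labels)
    align_loss = F.mse_loss(src_emb, tgt_emb)        # pull the two embeddings together
    return task_loss + align_weight * align_loss
```

In this reading, the translated task data supplies only the alignment signal; the classification loss is still computed on the labelled source language, and the weight on the alignment term is a tunable hyperparameter.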
Related papers
- Zero-shot Cross-lingual Transfer without Parallel Corpus [6.937772043639308]
We propose a novel approach to conduct zero-shot cross-lingual transfer with a pre-trained model.
It consists of a Bilingual Task Fitting module that applies task-related bilingual information alignment.
A self-training module generates pseudo soft and hard labels for unlabeled data and utilizes them to conduct self-training.
arXiv Detail & Related papers (2023-10-07T07:54:22Z)
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restore task to bridge the gap between the pretraining and fine-tuning stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we observe empirically that there is a training-objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for cross-lingual sequence labeling (xSL), named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which uses contrastive learning to encourage consistency between the representations of parallel input sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- Bilingual Alignment Pre-training for Zero-shot Cross-lingual Transfer [33.680292990007366]
In this paper, we aim to improve the zero-shot cross-lingual transfer performance by aligning the embeddings better.
We propose a pre-training task named Alignment Language Model (AlignLM), which uses statistical alignment information as prior knowledge to guide bilingual word prediction.
The results show AlignLM can improve the zero-shot performance significantly on MLQA and XNLI datasets.
arXiv Detail & Related papers (2021-06-03T10:18:43Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for the translated text in the target language (a sketch of such a loss follows this list).
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
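To make the FILTER entry's self-teaching term more concrete, here is a hedged sketch of a KL-divergence loss on auto-generated soft pseudo-labels. It is an illustrative reading of the summary above, not FILTER's released code; the temperature and the choice of which forward pass supplies the teacher logits are assumptions.

```python
# Hedged sketch of a KL-divergence self-teaching loss on soft pseudo-labels.
import torch
import torch.nn.functional as F

def self_teaching_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student), where the teacher's soft pseudo-labels supervise the student."""
    teacher_probs = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # "batchmean" matches the mathematical definition of KL divergence per example
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Usage (hypothetical): teacher logits from a pass over the labelled source text,
# student logits from the same model on its target-language translation.
teacher_logits = torch.randn(8, 3)   # batch of 8, 3 classes (dummy values)
student_logits = torch.randn(8, 3)
loss = self_teaching_loss(student_logits, teacher_logits)
```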