Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling
- URL: http://arxiv.org/abs/2204.05210v1
- Date: Mon, 11 Apr 2022 15:55:20 GMT
- Title: Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling
- Authors: Nuo Chen, Linjun Shou, Ming Gong, Jian Pei, Daxin Jiang
- Abstract summary: Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling (xSL) tasks.
Despite this success, we observe empirically that there is a training-objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL, named Cross-lingual Language Informative Span Masking (CLISM), to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which uses contrastive learning to encourage consistency between the representations of parallel input sequences.
- Score: 101.74165219364264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale cross-lingual pre-trained language models (xPLMs) have shown
effectiveness in cross-lingual sequence labeling (xSL) tasks, such as
cross-lingual machine reading comprehension (xMRC), by transferring knowledge
from a high-resource language to low-resource languages. Despite this
success, we observe empirically that there is a training-objective gap between
the pre-training and fine-tuning stages: for example, the masked language
modeling objective requires local understanding of the masked token, whereas
the span-extraction objective requires global understanding of, and reasoning
over, the input passage/paragraph and question, leading to a discrepancy
between pre-training and xMRC. In this paper, we first design a pre-training
task tailored for xSL, named Cross-lingual Language Informative Span Masking
(CLISM), to eliminate the objective gap in a self-supervised manner. Second, we
present ContrAstive-Consistency Regularization (CACR), which uses contrastive
learning to encourage consistency between the representations of parallel input
sequences via unsupervised cross-lingual instance-wise training signals during
pre-training. Together, these methods not only bridge the pretrain-finetune gap
but also help PLMs better capture the alignment between languages. Extensive
experiments show that our method achieves clearly superior results on multiple
xSL benchmarks with limited pre-training data, and surpasses previous
state-of-the-art methods by a large margin in few-shot settings, where only a
few hundred training examples are available.
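The listing contains no code, but as a rough illustration of the CACR idea described above, the sketch below implements an instance-wise contrastive consistency loss over sentence representations of parallel sequences. It is a minimal InfoNCE-style sketch with in-batch negatives; the function name, pooling choice, and temperature are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cacr_loss(src_repr: torch.Tensor, tgt_repr: torch.Tensor,
              temperature: float = 0.1) -> torch.Tensor:
    """Instance-wise contrastive consistency loss (illustrative sketch only).

    src_repr, tgt_repr: [batch, hidden] sentence representations of parallel
    sequences (e.g., mean-pooled encoder outputs of a sentence and its
    translation). Each source sentence is pulled toward its own translation
    and pushed away from the other translations in the batch.
    """
    src = F.normalize(src_repr, dim=-1)
    tgt = F.normalize(tgt_repr, dim=-1)
    logits = src @ tgt.t() / temperature                     # [batch, batch] cosine similarities
    labels = torch.arange(src.size(0), device=src.device)    # i-th source matches i-th target
    # Symmetric InfoNCE: source->target and target->source directions
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))

# Toy usage with random stand-ins for encoder outputs
if __name__ == "__main__":
    src = torch.randn(8, 768)   # e.g., English sentences
    tgt = torch.randn(8, 768)   # their translations
    print(cacr_loss(src, tgt).item())
```

Pulling each sentence toward its translation while treating the other in-batch translations as negatives is one common way to realize the "unsupervised cross-lingual instance-wise training signals" mentioned in the abstract.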
Related papers
- VECO 2.0: Cross-lingual Language Model Pre-training with
Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model, VECO 2.0, based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, mined via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual
Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with an extra code-switching restoration task to bridge the gap between the pre-training and fine-tuning stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- XeroAlign: Zero-Shot Cross-lingual Transformer Alignment [9.340611077939828]
We introduce a method for task-specific alignment of cross-lingual pretrained transformers such as XLM-R.
XeroAlign uses translated task data to encourage the model to generate similar sentence embeddings for different languages.
XLM-RA's text classification accuracy exceeds that of XLM-R trained with labelled data, and XLM-RA performs on par with state-of-the-art models on a cross-lingual adversarial paraphrasing task.
arXiv Detail & Related papers (2021-05-06T07:10:00Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation
Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language
Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We further propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language (a toy sketch of such a self-teaching loss appears after this related-papers list).
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- On Learning Universal Representations Across Languages [37.555675157198145]
We extend existing approaches to learn sentence-level representations and show their effectiveness on cross-lingual understanding and generation.
Specifically, we propose a Hierarchical Contrastive Learning (HiCTL) method to learn universal representations for parallel sentences distributed in one or multiple languages.
We conduct evaluations on two challenging cross-lingual tasks, XTREME and machine translation.
arXiv Detail & Related papers (2020-07-31T10:58:39Z)
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language
Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
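As a rough illustration of the self-teaching idea mentioned in the FILTER entry above, the sketch below computes a KL-divergence loss between a model's predictions on translated target-language text and auto-generated soft pseudo-labels. The function name, temperature, and scaling are assumptions for illustration, not the FILTER implementation.

```python
import torch
import torch.nn.functional as F

def self_teaching_kl_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 2.0) -> torch.Tensor:
    """KL-divergence self-teaching loss (illustrative sketch only).

    teacher_logits: predictions used to build soft pseudo-labels (detached so
    no gradient flows through the teacher side).
    student_logits: predictions on the translated target-language text.
    """
    soft_targets = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(soft_targets || student); "batchmean" plus T^2 gives the usual distillation scaling
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature ** 2

# Toy usage with random classification logits
if __name__ == "__main__":
    student = torch.randn(4, 3)
    teacher = torch.randn(4, 3)
    print(self_teaching_kl_loss(student, teacher).item())
```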