Exploring the Relationship between Alignment and Cross-lingual Transfer
in Multilingual Transformers
- URL: http://arxiv.org/abs/2306.02790v1
- Date: Mon, 5 Jun 2023 11:35:40 GMT
- Title: Exploring the Relationship between Alignment and Cross-lingual Transfer
in Multilingual Transformers
- Authors: Félix Gaschi, Patricio Cerda, Parisa Rastin and Yannick Toussaint
- Abstract summary: Multilingual language models can achieve cross-lingual transfer without explicit cross-lingual training data.
One common way to improve this transfer is to perform realignment steps before fine-tuning.
However, realignment methods do not always improve results across languages and tasks.
- Score: 0.6882042556551609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Without any explicit cross-lingual training data, multilingual language
models can achieve cross-lingual transfer. One common way to improve this
transfer is to perform realignment steps before fine-tuning, i.e., to train the
model to build similar representations for pairs of words from translated
sentences. But such realignment methods were found to not always improve
results across languages and tasks, which raises the question of whether
aligned representations are truly beneficial for cross-lingual transfer. We
provide evidence that alignment is actually significantly correlated with
cross-lingual transfer across languages, models and random seeds. We show that
fine-tuning can have a significant impact on alignment, depending mainly on the
downstream task and the model. Finally, we show that realignment can, in some
instances, improve cross-lingual transfer, and we identify conditions in which
realignment methods provide significant improvements. Namely, we find that
realignment works better on tasks for which alignment is correlated with
cross-lingual transfer when generalizing to a distant language and with smaller
models, as well as when using a bilingual dictionary rather than FastAlign to
extract realignment pairs. For example, for POS-tagging, between English and
Arabic, realignment can bring a +15.8 accuracy improvement on distilmBERT, even
outperforming XLM-R Large by 1.7. We thus advocate for further research on
realignment methods for smaller multilingual models as an alternative to
scaling.
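As a concrete illustration of a realignment step, the sketch below pulls together the contextual representations of word pairs taken from a translated sentence pair. It is a minimal sketch only: the distilmBERT checkpoint name, the first-subword pooling, the toy word pairs, and the cosine-based loss are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "distilbert-base-multilingual-cased"  # distilmBERT-style multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Contextual embedding of `word` in `sentence`, taken from its first subword."""
    enc = tokenizer(sentence, return_tensors="pt")
    hidden = model(**enc).last_hidden_state[0]                 # (seq_len, dim)
    first_id = tokenizer(word, add_special_tokens=False)["input_ids"][0]
    position = enc["input_ids"][0].tolist().index(first_id)    # first matching subword
    return hidden[position]

# A translated sentence pair and word pairs, e.g. from a bilingual dictionary (toy example).
src_sent, tgt_sent = "the cat sleeps", "le chat dort"
word_pairs = [("cat", "chat"), ("sleeps", "dort")]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = torch.tensor(0.0)
for src_word, tgt_word in word_pairs:
    src_vec = word_embedding(src_sent, src_word)
    tgt_vec = word_embedding(tgt_sent, tgt_word)
    # Pull aligned words together; 1 - cosine similarity is one simple choice of loss.
    loss = loss + (1.0 - F.cosine_similarity(src_vec, tgt_vec, dim=0))

(loss / len(word_pairs)).backward()
optimizer.step()                                               # one realignment update
```

In the setting described in the abstract, such updates would be applied before fine-tuning on the downstream task, with the word pairs extracted either from a bilingual dictionary or with FastAlign.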
Related papers
- How Transliterations Improve Crosslingual Alignment [48.929677368744606]
Recent studies have shown that post-aligning multilingual pretrained language models (mPLMs) using alignment objectives can improve crosslingual alignment.
This paper attempts to explicitly evaluate the crosslingual alignment and identify the key elements in transliteration-based approaches that contribute to better performance.
arXiv Detail & Related papers (2024-09-25T20:05:45Z)
- PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment [68.20851615263953]
Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining.
The spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing.
We propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining.
arXiv Detail & Related papers (2024-07-23T06:59:53Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport; a minimal Sinkhorn-style sketch of such a divergence appears after this list.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model, VECO 2.0, based on contrastive learning with multi-granularity alignments.
Specifically, sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs.
Token-to-token alignment is integrated to bridge the gap between synonymous tokens, mined via a thesaurus dictionary, and the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
- Improving Zero-Shot Multilingual Translation with Universal Representations and Cross-Mappings [23.910477693942905]
Improved zero-shot translation requires the model to learn universal representations and cross-mapping relationships.
We propose a distance based on optimal transport theory to model the difference between the representations output by the encoder.
We propose an agreement-based training scheme, which can help the model make consistent predictions.
arXiv Detail & Related papers (2022-10-28T02:47:05Z)
- Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training [45.48003947488825]
We study two widely used robust training methods: adversarial training and randomized smoothing.
The experimental results demonstrate that robust training can improve zero-shot cross-lingual transfer for text classification.
arXiv Detail & Related papers (2021-04-17T21:21:53Z)
- Word Alignment by Fine-tuning Embeddings on Parallel Corpora [96.28608163701055]
Word alignment over parallel corpora has a wide variety of applications, including learning translation lexicons, cross-lingual transfer of language processing tools, and automatic evaluation or analysis of translation outputs.
Recently, other work has demonstrated that pre-trained contextualized word embeddings derived from multilingually trained language models (LMs) prove an attractive alternative, achieving competitive results on the word alignment task even in the absence of explicit training on parallel data.
In this paper, we examine methods to marry the two approaches: leveraging pre-trained LMs but fine-tuning them on parallel text with objectives designed to improve alignment quality, and proposing methods to effectively extract alignments from these fine-tuned models.
arXiv Detail & Related papers (2021-01-20T17:54:47Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It effectively avoids the degeneration of predicting masked words conditioned only on the context in the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Cross-lingual Alignment Methods for Multilingual BERT: A Comparative Study [2.101267270902429]
We analyse how different forms of cross-lingual supervision and various alignment methods influence the transfer capability of mBERT in zero-shot setting.
We find that supervision from parallel corpus is generally superior to dictionary alignments.
arXiv Detail & Related papers (2020-09-29T20:56:57Z)
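As referenced in the Optimal Transport Posterior Alignment entry above, the following is a minimal, hypothetical sketch of an entropic-regularized optimal-transport (Sinkhorn) cost between two batches of latent vectors; minimizing such a differentiable cost is one way to reduce cross-lingual divergence between latent variables. It is not the cited paper's implementation; the cost normalization, epsilon, and iteration count are illustrative choices.

```python
import torch

def entropic_ot_cost(x: torch.Tensor, y: torch.Tensor,
                     eps: float = 0.1, n_iters: int = 50) -> torch.Tensor:
    """Entropic OT cost between point clouds x (n, d) and y (m, d) with uniform weights."""
    cost = torch.cdist(x, y, p=2) ** 2          # pairwise squared Euclidean distances
    cost = cost / cost.max()                    # normalize for numerical stability
    K = torch.exp(-cost / eps)                  # Gibbs kernel
    a = torch.full((x.size(0),), 1.0 / x.size(0))
    b = torch.full((y.size(0),), 1.0 / y.size(0))
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(n_iters):                    # Sinkhorn-Knopp iterations
        u = a / (K @ v)
        v = b / (K.t() @ u)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)  # transport plan
    return (plan * cost).sum()

# Toy latent batches for an English input and a (hypothetical) low-resource-language input.
z_en = torch.randn(4, 16, requires_grad=True)
z_xx = torch.randn(6, 16, requires_grad=True)
loss = entropic_ot_cost(z_en, z_xx)
loss.backward()                                 # gradients flow into both sets of latents
```

Minimizing this cost during training pulls the two sets of latent representations toward a common distribution without requiring word-level supervision.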