Cross-lingual Transfer of Monolingual Models
- URL: http://arxiv.org/abs/2109.07348v1
- Date: Wed, 15 Sep 2021 15:00:53 GMT
- Title: Cross-lingual Transfer of Monolingual Models
- Authors: Evangelia Gogoulou, Ariel Ekgren, Tim Isbister, Magnus Sahlgren
- Abstract summary: We introduce a cross-lingual transfer method for monolingual models based on domain adaptation.
We study the effects of such transfer from four different languages to English.
- Score: 2.332247755275824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies in zero-shot cross-lingual learning using multilingual models
have falsified the previous hypothesis that shared vocabulary and joint
pre-training are the keys to cross-lingual generalization. Inspired by this
advancement, we introduce a cross-lingual transfer method for monolingual
models based on domain adaptation. We study the effects of such transfer from
four different languages to English. Our experimental results on GLUE show that
the transferred models outperform the native English model independently of the
source language. After probing the English linguistic knowledge encoded in the
representations before and after transfer, we find that semantic information is
retained from the source language, while syntactic information is learned
during transfer. Additionally, the results of evaluating the transferred models
in source language tasks reveal that their performance in the source domain
deteriorates after transfer.
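To make the adaptation recipe in the abstract concrete, below is a minimal sketch (not the authors' released code) of how such a transfer could be run with the Hugging Face transformers and datasets libraries: continued masked-language-model pretraining of a monolingual source-language checkpoint on English text, followed by fine-tuning on a GLUE task. The source checkpoint, the WikiText-103 corpus, the SST-2 task, and all hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of cross-lingual transfer via domain adaptation:
# (1) continue MLM pretraining of a monolingual (non-English) model on English text,
# (2) fine-tune the adapted encoder on an English GLUE task.
# Checkpoint, corpus, task, and hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical monolingual source-language checkpoint.
source_checkpoint = "KB/bert-base-swedish-cased"

tokenizer = AutoTokenizer.from_pretrained(source_checkpoint)
mlm_model = AutoModelForMaskedLM.from_pretrained(source_checkpoint)

# Step 1: domain-adaptive pretraining on an English corpus with the MLM objective.
english_corpus = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")
tokenized = english_corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=english_corpus.column_names,
)
mlm_trainer = Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="adapted-to-english", per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("adapted-to-english")

# Step 2: fine-tune the adapted encoder on an English GLUE task (SST-2 as an example).
sst2 = load_dataset("glue", "sst2")
sst2 = sst2.map(
    lambda batch: tokenizer(batch["sentence"], truncation=True, max_length=128),
    batched=True,
)
clf_model = AutoModelForSequenceClassification.from_pretrained("adapted-to-english", num_labels=2)
clf_trainer = Trainer(
    model=clf_model,
    args=TrainingArguments(output_dir="sst2-finetuned", per_device_train_batch_size=32),
    train_dataset=sst2["train"],
    eval_dataset=sst2["validation"],
    tokenizer=tokenizer,  # enables dynamic padding of variable-length batches
)
clf_trainer.train()
print(clf_trainer.evaluate())
```

The probing analyses mentioned in the abstract (syntactic and semantic knowledge before and after transfer) would operate on the same adapted checkpoint; only the transfer-then-fine-tune pipeline is sketched here.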
Related papers
- PreAlign: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment [68.20851615263953]
Large language models demonstrate reasonable multilingual abilities, despite predominantly English-centric pretraining.
The spontaneous multilingual alignment in these models is shown to be weak, leading to unsatisfactory cross-lingual transfer and knowledge sharing.
We propose PreAlign, a framework that establishes multilingual alignment prior to language model pretraining.
arXiv Detail & Related papers (2024-07-23T06:59:53Z)
- Self-Translate-Train: Enhancing Cross-Lingual Transfer of Large Language Models via Inherent Capability [31.025371443719404]
Self-Translate-Train is a method that lets a large language model translate training data into the target language and then fine-tunes the model on its own generated data.
By demonstrating that Self-Translate-Train outperforms zero-shot transfer, we encourage further exploration of better methods to elicit cross-lingual capabilities of LLMs.
arXiv Detail & Related papers (2024-06-29T14:40:23Z)
- Unknown Script: Impact of Script on Cross-Lingual Transfer [2.5398014196797605]
Cross-lingual transfer has become an effective way of transferring knowledge between languages.
We consider a case where the target language and its script are not part of the pre-trained model.
Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.
arXiv Detail & Related papers (2024-04-29T15:48:01Z)
- Measuring Cross-lingual Transfer in Bytes [9.011910726620538]
We show that models pretrained on diverse source languages perform similarly on a target language in a cross-lingual setting.
We also found evidence that this transfer is not related to language contamination or language proximity.
Our experiments have opened up new possibilities for measuring how much data represents the language-agnostic representations learned during pretraining.
arXiv Detail & Related papers (2024-04-12T01:44:46Z)
- Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings [10.871587311621974]
We experimentally demonstrate that high capacity multilingual language models applied in a zero-shot setting consistently outperform data-based cross-lingual transfer approaches.
A detailed analysis of our results suggests that this might be due to important differences in language use.
Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
arXiv Detail & Related papers (2022-10-23T05:37:35Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken from throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models [79.38278330678965]
We find that common English pretraining corpora contain significant amounts of non-English text.
This leads to hundreds of millions of foreign language tokens in large-scale datasets.
We then demonstrate that even these small percentages of non-English data facilitate cross-lingual transfer for models trained on them.
arXiv Detail & Related papers (2022-04-17T23:56:54Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.