Oolong: Investigating What Makes Transfer Learning Hard with Controlled
Studies
- URL: http://arxiv.org/abs/2202.12312v2
- Date: Tue, 23 Jan 2024 22:09:07 GMT
- Title: Oolong: Investigating What Makes Transfer Learning Hard with Controlled
Studies
- Authors: Zhengxuan Wu and Alex Tamkin and Isabel Papadimitriou
- Abstract summary: We systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time.
We find that models can largely recover from syntactic-style shifts, but cannot recover from vocabulary misalignment.
Our experiments provide insights into the factors of cross-lingual transfer that researchers should most focus on when designing language transfer scenarios.
- Score: 21.350999136803843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When we transfer a pretrained language model to a new language, there are
many axes of variation that change at once. To disentangle the impact of
different factors like syntactic similarity and vocabulary similarity, we
propose a set of controlled transfer studies: we systematically transform the
language of the GLUE benchmark, altering one axis of crosslingual variation at
a time, and then measure the resulting drops in a pretrained model's downstream
performance. We find that models can largely recover from syntactic-style
shifts, but cannot recover from vocabulary misalignment and embedding matrix
re-initialization, even with continued pretraining on 15 million tokens.
Moreover, good-quality
tokenizers in the transfer language do not make vocabulary alignment easier.
Our experiments provide insights into the factors of cross-lingual transfer
that researchers should most focus on when designing language transfer
scenarios.
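As an illustrative sketch (not the paper's released code), one of the controlled transformations described above, vocabulary misalignment, can be simulated by remapping every token id through a fixed random permutation. This keeps sentence structure and token statistics intact while making every pretrained embedding row point at the wrong word:

```python
import random

def permute_vocabulary(token_ids, vocab_size, seed=0):
    """Remap each token id through a fixed random permutation of the vocabulary.

    Syntax (token order, repetitions) is preserved, but pretrained
    lexical knowledge tied to specific embedding rows becomes unusable.
    """
    rng = random.Random(seed)
    perm = list(range(vocab_size))
    rng.shuffle(perm)
    return [perm[t] for t in token_ids]

# Repeated tokens stay consistent under the remapping, so the
# transformed corpus is still a well-formed "language".
original = [5, 12, 5, 99]
shifted = permute_vocabulary(original, vocab_size=100, seed=42)
assert shifted[0] == shifted[2]
```

Applying such a transform to an entire GLUE training set, then fine-tuning (with or without continued pretraining), isolates the vocabulary axis from syntactic and embedding-initialization axes.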
Related papers
- When Meanings Meet: Investigating the Emergence and Quality of Shared Concept Spaces during Multilingual Language Model Training [57.230355403478995]
We investigate the development of language-agnostic concept spaces during pretraining of EuroLLM.
We find that shared concept spaces emerge early and continue to refine, but that alignment with them is language-dependent.
In contrast to prior work, our fine-grained manual analysis reveals that some apparent gains in translation quality reflect shifts in behavior.
arXiv Detail & Related papers (2026-01-30T11:23:01Z)
- LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs [67.09110757873142]
We present LiveCLKTBench, an automated generation pipeline designed to isolate and measure cross-lingual knowledge transfer.
Our pipeline identifies self-contained, time-sensitive knowledge entities from real-world domains.
The documents of these valid entities are then used to generate factual questions, which are translated into multiple languages.
arXiv Detail & Related papers (2025-11-03T17:06:49Z)
- Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics [56.145578792496714]
Large language models (LLMs) struggle with cross-lingual knowledge transfer.
We study the causes and dynamics of this phenomenon by training small Transformer models from scratch on synthetic multilingual datasets.
arXiv Detail & Related papers (2025-08-14T18:44:13Z)
- Unknown Script: Impact of Script on Cross-Lingual Transfer [2.5398014196797605]
Cross-lingual transfer has become an effective way of transferring knowledge between languages.
We consider a case where the target language and its script are not part of the pre-trained model.
Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.
arXiv Detail & Related papers (2024-04-29T15:48:01Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Cross-lingual Transfer of Monolingual Models [2.332247755275824]
We introduce a cross-lingual transfer method for monolingual models based on domain adaptation.
We study the effects of such transfer from four different languages to English.
arXiv Detail & Related papers (2021-09-15T15:00:53Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Improving Zero-Shot Cross-Lingual Transfer Learning via Robust Training [45.48003947488825]
We study two widely used robust training methods: adversarial training and randomized smoothing.
The experimental results demonstrate that robust training can improve zero-shot cross-lingual transfer for text classification.
arXiv Detail & Related papers (2021-04-17T21:21:53Z)
- Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks [6.7155846430379285]
In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training.
Recently introduced cross-lingual language model (XLM) pretraining encourages neural parameter sharing in Transformer-style networks.
In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining.
arXiv Detail & Related papers (2021-01-26T09:21:25Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.