Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks
- URL: http://arxiv.org/abs/2101.10649v1
- Date: Tue, 26 Jan 2021 09:21:25 GMT
- Title: Analyzing Zero-shot Cross-lingual Transfer in Supervised NLP Tasks
- Authors: Hyunjin Choi, Judong Kim, Seongho Joe, Seungjai Min, Youngjune Gwon
- Abstract summary: In zero-shot cross-lingual transfer, a supervised NLP task trained on a corpus in one language is directly applicable to another language without any additional training.
Recently introduced cross-lingual language model (XLM) pretraining brings out neural parameter sharing in Transformer-style networks.
In this paper, we aim to validate the hypothetically strong cross-lingual transfer properties induced by XLM pretraining.
- Score: 6.7155846430379285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In zero-shot cross-lingual transfer, a supervised NLP task trained on a
corpus in one language is directly applicable to another language without any
additional training. A source of cross-lingual transfer can be as
straightforward as lexical overlap between languages (e.g., use of the same
scripts, shared subwords) that naturally forces text embeddings to occupy a
similar representation space. Recently introduced cross-lingual language model
(XLM) pretraining brings out neural parameter sharing in Transformer-style
networks as the most important factor for the transfer. In this paper, we aim
to validate the hypothetically strong cross-lingual transfer properties induced
by XLM pretraining. Specifically, we use XLM-RoBERTa (XLMR) in experiments that
extend semantic textual similarity (STS), SQuAD and KorQuAD for machine reading
comprehension (MRC), sentiment analysis, and alignment of sentence embeddings
under various cross-lingual settings. Our results indicate that cross-lingual
transfer is most pronounced in STS, followed by sentiment analysis, with MRC
last. That is, the more complex a downstream task, the weaker the degree of
cross-lingual transfer. All of our results are empirically observed and
measured, and we make our code and data publicly available.
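The sentence-embedding alignment described in the abstract can be illustrated with a minimal sketch: mean-pool token-level representations into a sentence vector and score pairs with cosine similarity. The random matrices below are placeholders standing in for XLM-R hidden states (a hypothetical setup for illustration only; the actual experiments use the authors' released code and data).

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Collapse a (tokens, dim) matrix of token embeddings into one sentence vector."""
    return token_embeddings.mean(axis=0)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, the usual scoring function for STS-style comparisons."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder "hidden states" for a sentence and its translation; in practice
# both would come from the same multilingual encoder (e.g., XLM-R), which is
# what makes the zero-shot cross-lingual comparison possible.
rng = np.random.default_rng(0)
src_tokens = rng.normal(size=(7, 8))   # e.g., an English sentence, 7 tokens
tgt_tokens = rng.normal(size=(9, 8))   # e.g., a Korean translation, 9 tokens

score = cosine_similarity(mean_pool(src_tokens), mean_pool(tgt_tokens))
print(f"cross-lingual similarity: {score:.3f}")
```

If the encoder's representation spaces are well aligned across languages, translation pairs score high under this metric even though the task head was trained on only one language.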
Related papers
- Probing the Emergence of Cross-lingual Alignment during LLM Training [10.053333786023089]
Multilingual Large Language Models (LLMs) achieve remarkable levels of zero-shot cross-lingual transfer performance.
We study how such cross-lingual alignment emerges during pre-training of LLMs.
We observe a high correlation between neuron overlap and downstream performance.
arXiv Detail & Related papers (2024-06-19T05:31:59Z)
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer [4.554080966463776]
Multi-lingual language models (LM) have been remarkably successful in enabling natural language tasks in low-resource languages.
We try to better understand how such models, specifically mT5, transfer *any* linguistic and semantic knowledge across languages.
A key finding of this work is that similarity of syntax, morphology and phonology are good predictors of cross-lingual transfer.
arXiv Detail & Related papers (2022-12-04T07:22:21Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Syntax-augmented Multilingual BERT for Cross-lingual Transfer [37.99210035238424]
This work shows that explicitly providing language syntax and training mBERT helps cross-lingual transfer.
Experiment results show that syntax-augmented mBERT improves cross-lingual transfer on popular benchmarks.
arXiv Detail & Related papers (2021-06-03T21:12:50Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
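The cross-attention module described above can be sketched as scaled dot-product attention where queries come from one language's token states and keys/values from the other's, so each token explicitly conditions on the other language. This is a minimal numpy illustration of the general mechanism, not VECO's actual implementation; the random matrices are placeholders for encoder states.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: queries from one language's tokens,
    keys/values from the other's, building explicit cross-lingual interdependence."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)   # (n_q, n_kv) attention logits
    return softmax(scores) @ values          # each query token mixes the other language's states

rng = np.random.default_rng(1)
lang_a = rng.normal(size=(5, 16))  # e.g., source-language token states, 5 tokens
lang_b = rng.normal(size=(6, 16))  # e.g., target-language token states, 6 tokens

mixed = cross_attention(lang_a, lang_b, lang_b)
print(mixed.shape)  # (5, 16): one updated vector per source token
```

Conditioning masked-word prediction on such mixed states is what lets the model avoid relying only on same-language context.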
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
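The KL-divergence self-teaching loss can be sketched as follows: a teacher's predictions on the auto-translated target-language text serve as soft pseudo-labels, and the student is penalized by the KL divergence between those pseudo-labels and its own distribution. This is a minimal numpy illustration with assumed logits, not FILTER's actual training code.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q): how far the student's distribution q drifts from the pseudo-labels p."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

# Soft pseudo-labels: teacher predictions on translated target-language text
# (hypothetical 3-class logits for illustration).
teacher_logits = np.array([2.0, 0.5, -1.0])
student_logits = np.array([1.5, 0.8, -0.5])

pseudo_labels = softmax(teacher_logits)
student_probs = softmax(student_logits)

loss = kl_divergence(pseudo_labels, student_probs)
print(f"self-teaching KL loss: {loss:.4f}")
```

Because the pseudo-labels are soft distributions rather than hard classes, the loss transfers the teacher's full uncertainty to the student, which is the point of self-teaching on translated text that lacks gold labels.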
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers [62.637055980148816]
Massively multilingual transformers pretrained with language modeling objectives have become a de facto default transfer paradigm for NLP.
We show that cross-lingual transfer via massively multilingual transformers is substantially less effective in resource-lean scenarios and for distant languages.
arXiv Detail & Related papers (2020-05-01T22:04:58Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.