Zero-shot cross-lingual transfer language selection using linguistic
similarity
- URL: http://arxiv.org/abs/2301.13720v1
- Date: Tue, 31 Jan 2023 15:56:40 GMT
- Authors: Juuso Eronen, Michal Ptaszynski, Fumito Masui
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study the selection of transfer languages for different Natural Language
Processing tasks, specifically sentiment analysis, named entity recognition and
dependency parsing. In order to select an optimal transfer language, we propose
to utilize different linguistic similarity metrics to measure the distance
between languages and make the choice of transfer language based on this
information instead of relying on intuition. We demonstrate that linguistic
similarity correlates with cross-lingual transfer performance for all of the
proposed tasks. We also show that there is a statistically significant
difference in choosing the optimal language as the transfer source instead of
English. This allows us to select a more suitable transfer language which can
be used to better leverage knowledge from high-resource languages in order to
improve the performance of language applications lacking data. For the study,
we used datasets from eight different languages from three language families.
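The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors are hypothetical toy values (real studies typically draw typological features from resources such as URIEL/lang2vec), and cosine distance stands in for whichever linguistic similarity metric is used.

```python
import math

# Hypothetical binary typological feature vectors, for illustration only.
FEATURES = {
    "english": [1, 0, 1, 1, 0, 1],
    "german":  [1, 0, 1, 0, 0, 1],
    "spanish": [0, 1, 1, 1, 1, 0],
    "italian": [0, 1, 1, 1, 1, 1],
}

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def select_transfer_language(target, candidates):
    """Choose the candidate language linguistically closest to the target."""
    return min(candidates,
               key=lambda c: cosine_distance(FEATURES[target], FEATURES[c]))

print(select_transfer_language("italian", ["english", "german", "spanish"]))
# prints "spanish"
```

Rather than defaulting to English as the source, the target's nearest high-resource neighbor under the chosen metric is used for zero-shot transfer.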
Related papers
- Unknown Script: Impact of Script on Cross-Lingual Transfer [2.5398014196797605]
Cross-lingual transfer has become an effective way of transferring knowledge between languages.
We consider a case where the target language and its script are not part of the pre-trained model.
Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.
arXiv Detail & Related papers (2024-04-29T15:48:01Z)
- CORI: CJKV Benchmark with Romanization Integration -- A step towards Cross-lingual Transfer Beyond Textual Scripts [50.44270798959864]
Some languages are more well-connected than others, and target languages can benefit from transferring from closely related languages.
We study the impact of source language for cross-lingual transfer, demonstrating the importance of selecting source languages that have high contact with the target language.
arXiv Detail & Related papers (2024-04-19T04:02:50Z)
- Cross-Lingual Transfer for Natural Language Inference via Multilingual Prompt Translator [104.63314132355221]
Cross-lingual transfer with prompt learning has shown promising effectiveness.
We propose a novel framework, Multilingual Prompt Translator (MPT)
MPT is more prominent compared with vanilla prompting when transferring to languages quite distinct from source language.
arXiv Detail & Related papers (2024-03-19T03:35:18Z)
- GradSim: Gradient-Based Language Grouping for Effective Multilingual Training [13.730907708289331]
We propose GradSim, a language grouping method based on gradient similarity.
Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains.
Besides linguistic features, the topics of the datasets play an important role for language grouping.
arXiv Detail & Related papers (2023-10-23T18:13:37Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Transfer Language Selection for Zero-Shot Cross-Lingual Abusive Language Detection [2.2998722397348335]
Instead of preparing a dataset for every language, we demonstrate the effectiveness of cross-lingual transfer learning for zero-shot abusive language detection.
Our datasets are from seven different languages from three language families.
arXiv Detail & Related papers (2022-06-02T09:53:15Z)
- Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language [32.170748231414365]
We show that training on even just a single related language gives the largest gain.
We also find that adding data from unrelated languages generally doesn't hurt performance.
arXiv Detail & Related papers (2021-06-24T08:37:05Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit into the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.