GradSim: Gradient-Based Language Grouping for Effective Multilingual
Training
- URL: http://arxiv.org/abs/2310.15269v1
- Date: Mon, 23 Oct 2023 18:13:37 GMT
- Title: GradSim: Gradient-Based Language Grouping for Effective Multilingual
Training
- Authors: Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich
Schütze
- Abstract summary: We propose GradSim, a language grouping method based on gradient similarity.
Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains.
Besides linguistic features, the topics of the datasets play an important role for language grouping.
- Score: 13.730907708289331
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most languages of the world pose low-resource challenges to natural language
processing models. With multilingual training, knowledge can be shared among
languages. However, not all languages positively influence each other, and it is
an open research question how to select the most suitable set of languages for
multilingual training and avoid negative interference among languages whose
characteristics or data distributions are not compatible. In this paper, we
propose GradSim, a language grouping method based on gradient similarity. Our
experiments on three diverse multilingual benchmark datasets show that it leads
to the largest performance gains compared to other similarity measures and it
is better correlated with cross-lingual model performance. As a result, we set
the new state of the art on AfriSenti, a benchmark dataset for sentiment
analysis on low-resource African languages. In our extensive analysis, we
further reveal that besides linguistic features, the topics of the datasets
play an important role for language grouping and that lower layers of
transformer models encode language-specific features while higher layers
capture task-specific information.
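The core idea of gradient-based grouping can be sketched as follows: compute one gradient vector per language, measure pairwise cosine similarity, and group languages whose gradients align. This is a minimal illustrative sketch under assumed inputs (flattened per-language gradient vectors and a simple greedy threshold grouping); the paper's actual model, clustering procedure, and hyperparameters may differ.

```python
import numpy as np

def cosine_similarity_matrix(grads):
    """Pairwise cosine similarity between per-language gradient vectors.

    grads: dict mapping language code -> flattened gradient vector
    (e.g. the gradient of the task loss w.r.t. shared parameters,
    averaged over a batch of that language's data).
    """
    langs = list(grads)
    G = np.stack([grads[l] / np.linalg.norm(grads[l]) for l in langs])
    return langs, G @ G.T  # entries in [-1, 1]

def group_languages(grads, threshold=0.5):
    """Greedily group languages whose gradient similarity exceeds threshold."""
    langs, S = cosine_similarity_matrix(grads)
    groups, assigned = [], set()
    for i, li in enumerate(langs):
        if li in assigned:
            continue
        group = [li]
        assigned.add(li)
        for j in range(i + 1, len(langs)):
            if langs[j] not in assigned and S[i, j] > threshold:
                group.append(langs[j])
                assigned.add(langs[j])
        groups.append(group)
    return groups

# Toy example: two languages with aligned gradients, one with conflicting ones.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
grads = {
    "ha": base + 0.1 * rng.normal(size=1000),   # aligned
    "yo": base + 0.1 * rng.normal(size=1000),   # aligned
    "zz": -base + 0.1 * rng.normal(size=1000),  # conflicting gradients
}
print(group_languages(grads))  # [['ha', 'yo'], ['zz']]
```

Languages with conflicting gradients end up in separate groups, which is the mechanism for avoiding the negative interference described above.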
Related papers
- Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model [14.39119862985503]
We aim to create a multilingual ALT system with available datasets.
Inspired by architectures that have been proven effective for English ALT, we adapt these techniques to the multilingual scenario.
We evaluate the performance of the multilingual model in comparison to its monolingual counterparts.
arXiv Detail & Related papers (2024-06-25T15:02:32Z)
- Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon [78.12363425794214]
We focus on zero-shot sentiment analysis tasks across 34 languages, including 6 high/medium-resource languages, 25 low-resource languages, and 3 code-switching datasets.
We demonstrate that pretraining using multilingual lexicons, without using any sentence-level sentiment data, achieves superior zero-shot performance compared to models fine-tuned on English sentiment datasets.
arXiv Detail & Related papers (2024-02-03T10:41:05Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models [4.731313022026271]
In this study, we focus on a single low-resource language and perform extensive evaluation and probing experiments using cross-lingual post-training (XPT).
Results show that XPT not only outperforms or performs on par with monolingual models trained with orders of magnitude more data but is also highly efficient in the transfer process.
arXiv Detail & Related papers (2022-09-14T05:20:52Z)
- Multilingual Text Classification for Dravidian Languages [4.264592074410622]
We propose a multilingual text classification framework for the Dravidian languages.
The framework uses the LaBSE pre-trained model as its base.
To address the problem that the model cannot adequately recognize and exploit correlations among languages, we further propose a language-specific representation module.
arXiv Detail & Related papers (2021-12-03T04:26:49Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
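A KL-divergence loss against soft pseudo-labels can be sketched as follows. This is a generic illustration assuming raw logit inputs and batch averaging, not FILTER's exact implementation; the teacher logits stand in for the auto-generated soft pseudo-labels and are treated as fixed targets.

```python
import numpy as np

def kl_self_teaching_loss(teacher_logits, student_logits):
    """KL(softmax(teacher) || softmax(student)), averaged over the batch.

    teacher_logits: soft pseudo-labels for the translated text,
    treated as fixed targets (no gradient flows through them).
    """
    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # numerical stability
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    log_p = log_softmax(teacher_logits)  # teacher distribution
    log_q = log_softmax(student_logits)  # student distribution
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

t = np.array([[2.0, 0.5, -1.0]])
print(kl_self_teaching_loss(t, t))               # 0.0 (identical predictions)
print(kl_self_teaching_loss(t, t[:, ::-1]) > 0)  # True (divergent predictions)
```

The loss is zero when student and teacher agree and grows as their predicted distributions diverge, which pushes predictions for the target-language text and its translation toward consistency.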
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
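Singular vector canonical correlation analysis, as used in the last entry above, can be sketched as follows. This is a generic SVCCA sketch (SVD-based dimensionality reduction followed by CCA via QR and SVD), with an assumed energy threshold of 0.99; the paper's actual representations and analysis setup may differ.

```python
import numpy as np

def svcca(X, Y, keep=0.99):
    """Singular Vector CCA between two views of the same samples.

    X, Y: (n_samples, dim) matrices, e.g. language representations
    induced from two different sources. Returns canonical correlations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)

    def top_directions(M):
        # SVD step: keep singular directions covering `keep` of the energy
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        energy = np.cumsum(s**2) / np.sum(s**2)
        k = int(np.searchsorted(energy, keep)) + 1
        return U[:, :k] * s[:k]

    # CCA step: after orthonormalizing each reduced view, the canonical
    # correlations are the singular values of Qx^T Qy.
    Qx, _ = np.linalg.qr(top_directions(X))
    Qy, _ = np.linalg.qr(top_directions(Y))
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
print(np.allclose(svcca(A, A), 1.0))  # identical views correlate perfectly
```

High canonical correlations indicate that the two sources induce largely overlapping information about the languages, which is the kind of analysis the entry above performs.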
This list is automatically generated from the titles and abstracts of the papers in this site.