ZGUL: Zero-shot Generalization to Unseen Languages using Multi-source
Ensembling of Language Adapters
- URL: http://arxiv.org/abs/2310.16393v1
- Date: Wed, 25 Oct 2023 06:22:29 GMT
- Title: ZGUL: Zero-shot Generalization to Unseen Languages using Multi-source
Ensembling of Language Adapters
- Authors: Vipul Rathore, Rajdeep Dhingra, Parag Singla, Mausam
- Abstract summary: We tackle the problem of zero-shot cross-lingual transfer in NLP tasks via the use of language adapters (LAs).
Training a target LA requires unlabeled data, which may not be readily available for low-resource unseen languages.
We extend ZGUL to settings where either (1) some unlabeled data or (2) few-shot training examples are available for the target language.
- Score: 29.211715245603234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We tackle the problem of zero-shot cross-lingual transfer in NLP tasks via
the use of language adapters (LAs). Most earlier works explore training with
the LA of a single source language (often English), and testing with either the
target LA or the LA of a related language. Training a target LA requires
unlabeled data, which may not be readily available for low-resource unseen
languages: those that are neither seen by the underlying multilingual language
model (e.g., mBERT) nor covered by any (labeled or unlabeled) data. We posit
that for more effective cross-lingual transfer, instead of just one source LA,
we should leverage the LAs of multiple (linguistically or geographically
related) source languages, both at train and test time, which
we investigate via our novel neural architecture, ZGUL. Extensive
experimentation across four language groups, covering 15 unseen target
languages, demonstrates improvements of up to 3.2 average F1 points over
standard fine-tuning and other strong baselines on POS tagging and NER tasks.
We also extend ZGUL to settings where either (1) some unlabeled data or (2)
few-shot training examples are available for the target language. We find that
ZGUL continues to outperform baselines in these settings too.
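The abstract describes combining multiple source-language adapters at both train and test time but does not spell out the fusion mechanism. As a rough illustration only, the following PyTorch sketch fuses the outputs of several bottleneck adapters with token-level attention weights inside one transformer layer; the adapter shape, the attention scoring, and the class names (BottleneckAdapter, MultiSourceAdapterFusion) are illustrative assumptions, not ZGUL's actual architecture.

```python
# Minimal sketch (not ZGUL's exact architecture): attention-weighted fusion of
# several source-language adapters applied to one transformer layer's output.
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Standard down-project / nonlinearity / up-project adapter with a residual."""

    def __init__(self, hidden: int = 768, bottleneck: int = 48):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))


class MultiSourceAdapterFusion(nn.Module):
    """Ensemble the outputs of several source-language adapters.

    A learned query scores each source adapter per token; the fused
    representation is the attention-weighted sum of adapter outputs.
    """

    def __init__(self, num_sources: int, hidden: int = 768):
        super().__init__()
        self.adapters = nn.ModuleList(
            BottleneckAdapter(hidden) for _ in range(num_sources)
        )
        self.query = nn.Linear(hidden, hidden)
        self.key = nn.Linear(hidden, hidden)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden) output of a transformer layer
        outs = torch.stack([a(h) for a in self.adapters], dim=2)  # (B, T, S, H)
        scores = torch.einsum("bth,btsh->bts", self.query(h), self.key(outs))
        weights = scores.softmax(dim=-1)                          # (B, T, S)
        return torch.einsum("bts,btsh->bth", weights, outs)


# Usage: fuse four related source-language adapters for an unseen target language.
fusion = MultiSourceAdapterFusion(num_sources=4)
hidden_states = torch.randn(2, 16, 768)  # dummy mBERT-sized layer output
fused = fusion(hidden_states)            # (2, 16, 768)
```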
Related papers
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z)
- An Efficient Approach for Studying Cross-Lingual Transfer in Multilingual Language Models [26.72394783468532]
We propose an efficient method to study the influence of the transfer language on zero-shot performance in a target language.
Our findings suggest that some languages have little effect on others, while other languages, especially ones unseen during pre-training, can be highly beneficial or detrimental for different target languages.
arXiv Detail & Related papers (2024-03-29T09:52:18Z)
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel and large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Meta-X$_{NLG}$: A Meta-Learning Approach Based on Language Clustering for Zero-Shot Cross-Lingual Transfer and Generation [11.155430893354769]
This paper proposes a novel meta-learning framework to learn shareable structures from typologically diverse languages.
We first cluster the languages based on language representations and identify the centroid language of each cluster.
A meta-learning algorithm is trained with all centroid languages and evaluated on the other languages in the zero-shot setting (a toy sketch of the clustering-and-centroid step appears after this list).
arXiv Detail & Related papers (2022-03-19T05:22:07Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To provide supervision for the translated text in the target language, we propose an additional KL-divergence self-teaching loss based on auto-generated soft pseudo-labels for that text (a hedged sketch of such a loss appears after this list).
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
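The Meta-X$_{NLG}$ entry above clusters languages by their representations and meta-trains on the centroid language of each cluster. The toy sketch below shows only that selection step, assuming each language is represented by a fixed vector; the language codes and random vectors are placeholders, not the paper's actual representations.

```python
# Toy sketch of the clustering-and-centroid step described in the Meta-X_NLG
# entry: cluster language vectors and keep, per cluster, the language closest
# to the cluster centroid. Vectors here are random placeholders.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
languages = ["en", "de", "hi", "bn", "sw", "yo", "fi", "et"]  # illustrative set
lang_vecs = rng.normal(size=(len(languages), 32))             # placeholder vectors

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(lang_vecs)

centroid_langs = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    # Pick the member language nearest to the cluster centroid.
    dists = np.linalg.norm(lang_vecs[members] - km.cluster_centers_[c], axis=1)
    centroid_langs.append(languages[members[dists.argmin()]])

print("meta-train on centroid languages:", centroid_langs)
# Remaining languages would be held out for zero-shot evaluation.
```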
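The FILTER entry mentions a KL-divergence self-teaching loss against auto-generated soft pseudo-labels for translated target-language text. The sketch below is one plausible form of such a loss, assuming the soft labels come from a detached teacher pass; the temperature and the helper name kl_self_teaching_loss are illustrative, not FILTER's exact recipe.

```python
# Hedged sketch of a KL self-teaching loss: the student is trained on
# translated target-language text against soft pseudo-labels produced by a
# teacher pass (e.g., the same model on the source-language input). The
# temperature and detached teacher are illustrative choices.
import torch
import torch.nn.functional as F


def kl_self_teaching_loss(student_logits: torch.Tensor,
                          teacher_logits: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student), averaged with 'batchmean' reduction.

    student_logits, teacher_logits: (batch, seq_len, num_labels)
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")


# Usage with dummy token-classification logits.
student = torch.randn(2, 16, 9, requires_grad=True)
teacher = torch.randn(2, 16, 9)
loss = kl_self_teaching_loss(student, teacher)
loss.backward()
```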