AdaMergeX: Cross-Lingual Transfer with Large Language Models via
Adaptive Adapter Merging
- URL: http://arxiv.org/abs/2402.18913v1
- Date: Thu, 29 Feb 2024 07:11:24 GMT
- Title: AdaMergeX: Cross-Lingual Transfer with Large Language Models via
Adaptive Adapter Merging
- Authors: Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing
- Abstract summary: Cross-lingual transfer is an effective alternative to the direct fine-tuning on target tasks in specific languages.
We propose a new cross-lingual transfer method called $\texttt{AdaMergeX}$ that utilizes adaptive adapter merging.
Our empirical results demonstrate that our approach yields new and effective cross-lingual transfer, outperforming existing methods across all settings.
- Score: 96.39773974044041
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As an effective alternative to the direct fine-tuning on target tasks in
specific languages, cross-lingual transfer addresses the challenges of limited
training data by decoupling "task ability" and "language ability": fine-tuning on
the target task in the source language and on another selected task in the target
language, respectively. However, such methods fail to fully separate the
task ability from the source language or the language ability from the chosen
task. In this paper, we acknowledge the mutual reliance between task ability
and language ability and direct our attention toward the gap between the target
language and the source language on tasks. As the gap removes the impact of
tasks, we assume that it remains consistent across tasks. Based on this
assumption, we propose a new cross-lingual transfer method called
$\texttt{AdaMergeX}$ that utilizes adaptive adapter merging. By introducing a
reference task, we can determine that the divergence of adapters fine-tuned on
the reference task in both languages follows the same distribution as the
divergence of adapters fine-tuned on the target task in both languages. Hence,
we can obtain target adapters by combining the other three adapters.
Furthermore, we propose a structure-adaptive adapter merging method. Our
empirical results demonstrate that our approach yields new and effective
cross-lingual transfer, outperforming existing methods across all settings.
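The merging rule described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes adapters are available as dictionaries of tensors, and that additive adapters (e.g. LoRA-style deltas) are merged by addition/subtraction while multiplicative adapters are merged by element-wise multiplication/division, which is one plausible reading of "structure-adaptive" merging. All function and variable names below are hypothetical.

```python
import torch

def merge_adapters(task_src, ref_tgt, ref_src, composition="additive", eps=1e-8):
    """Estimate the adapter for (target task, target language).

    task_src    : adapter fine-tuned on the target task in the source language
    ref_tgt     : adapter fine-tuned on the reference task in the target language
    ref_src     : adapter fine-tuned on the reference task in the source language
    composition : "additive" (e.g. LoRA deltas) or "multiplicative" scalings
    """
    merged = {}
    for name, weight in task_src.items():
        if composition == "additive":
            # Language gap = ref_tgt - ref_src; add it onto the task adapter.
            merged[name] = weight + (ref_tgt[name] - ref_src[name])
        else:
            # Language gap = ref_tgt / ref_src; scale the task adapter by it.
            merged[name] = weight * (ref_tgt[name] / (ref_src[name] + eps))
    return merged

# Toy usage with random tensors standing in for adapter weights.
task_src = {"layer0.lora_A": torch.randn(8, 16)}
ref_tgt  = {"layer0.lora_A": torch.randn(8, 16)}
ref_src  = {"layer0.lora_A": torch.randn(8, 16)}
target_adapter = merge_adapters(task_src, ref_tgt, ref_src, composition="additive")
```

In this reading, the language gap estimated on the reference task is assumed to be consistent across tasks and is transferred onto the task adapter trained in the source language, yielding the target adapter without any fine-tuning in the target language on the target task.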
Related papers
- AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness [16.896143197472114]
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian languages.
We propose using machine translation for data augmentation to address the low-resource challenge of limited training data.
We achieve competitive results in the shared task: our system performs best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).
arXiv Detail & Related papers (2024-04-01T21:21:15Z) - The Impact of Language Adapters in Cross-Lingual Transfer for NLU [0.8702432681310401]
We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets.
Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models.
Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
arXiv Detail & Related papers (2024-01-31T20:07:43Z) - Cross-Lingual Transfer with Target Language-Ready Task Adapters [66.5336029324059]
BAD-X, an extension of the MAD-X framework, achieves improved transfer at the cost of MAD-X's modularity.
We aim to take the best of both worlds by fine-tuning task adapters adapted to the target language.
arXiv Detail & Related papers (2023-06-05T10:46:33Z) - Multilingual Domain Adaptation for NMT: Decoupling Language and Domain
Information with Adapters [66.7986513246294]
We study the compositionality of language and domain adapters in the context of Machine Translation.
We find that in the partial-resource scenario, a naive combination of domain-specific and language-specific adapters often results in 'catastrophic forgetting' of the missing languages.
arXiv Detail & Related papers (2021-10-18T18:55:23Z) - Efficient Test Time Adapter Ensembling for Low-resource Language
Varieties [115.12997212870962]
Specialized language and task adapters have been proposed to facilitate cross-lingual transfer of multilingual pretrained models.
An intuitive solution is to use a related language adapter for the new language variety, but we observe that this solution can lead to sub-optimal performance.
In this paper, we aim to improve the robustness of language adapters to uncovered languages without training new adapters.
arXiv Detail & Related papers (2021-09-10T13:44:46Z) - MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual
Word-in-Context Disambiguation using Augmented Data, Signals, and
Transformers [1.869621561196521]
We present our approach for solving the SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)
The goal is to detect whether a word common to both sentences evokes the same meaning.
We submit systems for both the settings - Multilingual and Cross-Lingual.
arXiv Detail & Related papers (2021-04-04T08:49:28Z) - Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual
Transfer [43.92142759245696]
Orthoadapters are trained to encode language- and task-specific information that is complementary to the knowledge already stored in the pretrained transformer's parameters.
Our zero-shot cross-lingual transfer experiments, involving three tasks (POS-tagging, NER, NLI) and a set of 10 diverse languages, 1) point to the usefulness of orthoadapters in cross-lingual transfer, especially for the most complex NLI task, but also 2) indicate that the optimal adapter configuration highly depends on the task and the target language.
arXiv Detail & Related papers (2020-12-11T16:32:41Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively prevents the model from degenerating into predicting masked words conditioned only on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer [136.09386219006123]
We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages.
MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and causal commonsense reasoning.
arXiv Detail & Related papers (2020-04-30T18:54:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.