AdaMergeX: Cross-Lingual Transfer with Large Language Models via
Adaptive Adapter Merging
- URL: http://arxiv.org/abs/2402.18913v1
- Date: Thu, 29 Feb 2024 07:11:24 GMT
- Title: AdaMergeX: Cross-Lingual Transfer with Large Language Models via
Adaptive Adapter Merging
- Authors: Yiran Zhao, Wenxuan Zhang, Huiming Wang, Kenji Kawaguchi, Lidong Bing
- Abstract summary: Cross-lingual transfer is an effective alternative to the direct fine-tuning on target tasks in specific languages.
We propose a new cross-lingual transfer method called $\texttt{AdaMergeX}$ that utilizes adaptive adapter merging.
Our empirical results demonstrate that our approach yields new and effective cross-lingual transfer, outperforming existing methods across all settings.
- Score: 96.39773974044041
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: As an effective alternative to the direct fine-tuning on target tasks in
specific languages, cross-lingual transfer addresses the challenges of limited
training data by decoupling "task ability" and "language ability": fine-tuning on
the target task in the source language and on another selected task in the target
language, respectively. However, such methods fail to fully separate the
task ability from the source language or the language ability from the chosen
task. In this paper, we acknowledge the mutual reliance between task ability
and language ability and direct our attention toward the gap between the target
language and the source language on tasks. As the gap removes the impact of
tasks, we assume that it remains consistent across tasks. Based on this
assumption, we propose a new cross-lingual transfer method called
$\texttt{AdaMergeX}$ that utilizes adaptive adapter merging. By introducing a
reference task, we can determine that the divergence of adapters fine-tuned on
the reference task in both languages follows the same distribution as the
divergence of adapters fine-tuned on the target task in both languages. Hence,
we can obtain target adapters by combining the other three adapters.
Furthermore, we propose a structure-adaptive adapter merging method. Our
empirical results demonstrate that our approach yields new and effective
cross-lingual transfer, outperforming existing methods across all settings.
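The merging rule described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes adapters are available as dictionaries of tensors, and that additive adapters (e.g. LoRA-style deltas) are merged by addition/subtraction while multiplicative adapters are merged by element-wise multiplication/division, which is one plausible reading of "structure-adaptive" merging. All function and variable names below are hypothetical.

```python
import torch

def merge_adapters(task_src, ref_tgt, ref_src, composition="additive", eps=1e-8):
    """Estimate the adapter for (target task, target language).

    task_src    : adapter fine-tuned on the target task in the source language
    ref_tgt     : adapter fine-tuned on the reference task in the target language
    ref_src     : adapter fine-tuned on the reference task in the source language
    composition : "additive" (e.g. LoRA deltas) or "multiplicative" scalings
    """
    merged = {}
    for name, weight in task_src.items():
        if composition == "additive":
            # Language gap = ref_tgt - ref_src; add it onto the task adapter.
            merged[name] = weight + (ref_tgt[name] - ref_src[name])
        else:
            # Language gap = ref_tgt / ref_src; scale the task adapter by it.
            merged[name] = weight * (ref_tgt[name] / (ref_src[name] + eps))
    return merged

# Toy usage with random tensors standing in for adapter weights.
task_src = {"layer0.lora_A": torch.randn(8, 16)}
ref_tgt  = {"layer0.lora_A": torch.randn(8, 16)}
ref_src  = {"layer0.lora_A": torch.randn(8, 16)}
target_adapter = merge_adapters(task_src, ref_tgt, ref_src, composition="additive")
```

In this reading, the language gap estimated on the reference task is assumed to be consistent across tasks and is transferred onto the task adapter trained in the source language, yielding the target adapter without any fine-tuning in the target language on the target task.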
Related papers
- AAdaM at SemEval-2024 Task 1: Augmentation and Adaptation for Multilingual Semantic Textual Relatedness [16.896143197472114]
This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness for African and Asian languages.
We propose using machine translation for data augmentation to address the low-resource challenge of limited training data.
We achieve competitive results in the shared task: our system performs best among all ranked teams in both subtask A (supervised learning) and subtask C (cross-lingual transfer).
arXiv Detail & Related papers (2024-04-01T21:21:15Z) - The Impact of Language Adapters in Cross-Lingual Transfer for NLU [0.8702432681310401]
We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets.
Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models.
Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.
arXiv Detail & Related papers (2024-01-31T20:07:43Z) - Cross-Lingual Transfer with Target Language-Ready Task Adapters [66.5336029324059]
BAD-X, an extension of the MAD-X framework, achieves improved transfer at the cost of MAD-X's modularity.
We aim to take the best of both worlds by fine-tuning task adapters adapted to the target language.
arXiv Detail & Related papers (2023-06-05T10:46:33Z) - Multilingual Domain Adaptation for NMT: Decoupling Language and Domain
Information with Adapters [66.7986513246294]
We study the compositionality of language and domain adapters in the context of Machine Translation.
We find that in the partial-resource scenario, a naive combination of domain-specific and language-specific adapters often results in 'catastrophic forgetting' of the missing languages.
arXiv Detail & Related papers (2021-10-18T18:55:23Z) - Efficient Test Time Adapter Ensembling for Low-resource Language
Varieties [115.12997212870962]
Specialized language and task adapters have been proposed to facilitate cross-lingual transfer of multilingual pretrained models.
An intuitive solution is to use a related language adapter for the new language variety, but we observe that this solution can lead to sub-optimal performance.
In this paper, we aim to improve the robustness of language adapters to uncovered languages without training new adapters.
arXiv Detail & Related papers (2021-09-10T13:44:46Z) - MCL@IITK at SemEval-2021 Task 2: Multilingual and Cross-lingual
Word-in-Context Disambiguation using Augmented Data, Signals, and
Transformers [1.869621561196521]
We present our approach for solving the SemEval 2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)
The goal is to detect whether a word common to both sentences evokes the same meaning.
We submit systems for both the settings - Multilingual and Cross-Lingual.
arXiv Detail & Related papers (2021-04-04T08:49:28Z) - Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual
Transfer [43.92142759245696]
Orthoadapters are trained to encode language- and task-specific information that is complementary to the knowledge already stored in the pretrained transformer's parameters.
Our zero-shot cross-lingual transfer experiments, involving three tasks (POS-tagging, NER, NLI) and a set of 10 diverse languages, 1) point to the usefulness of orthoadapters in cross-lingual transfer, especially for the most complex NLI task, but also 2) indicate that the optimal adapter configuration highly depends on the task and the target language.
arXiv Detail & Related papers (2020-12-11T16:32:41Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This effectively prevents the model from degenerating into predicting masked words conditioned only on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer [136.09386219006123]
We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages.
MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and causal commonsense reasoning.
arXiv Detail & Related papers (2020-04-30T18:54:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.