How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters
- URL: http://arxiv.org/abs/2501.06025v1
- Date: Fri, 10 Jan 2025 15:01:51 GMT
- Title: How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters
- Authors: Romina Oji, Jenny Kunz
- Abstract summary: This paper investigates the optimal use of the multilingual encoder model mDeBERTa for tasks in three Germanic languages. We compare full fine-tuning with the parameter-efficient fine-tuning (PEFT) methods LoRA and Pfeiffer bottleneck adapters. While PEFT tends to work better for question answering, full fine-tuning is preferable for named entity recognition.
- Score: 0.7366405857677227
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the optimal use of the multilingual encoder model mDeBERTa for tasks in three Germanic languages -- German, Swedish, and Icelandic -- representing varying levels of presence and likely data quality in mDeBERTa's pre-training data. We compare full fine-tuning with the parameter-efficient fine-tuning (PEFT) methods LoRA and Pfeiffer bottleneck adapters, finding that PEFT is more effective for the higher-resource language, German. However, results for Swedish and Icelandic are less consistent. We also observe differences between tasks: While PEFT tends to work better for question answering, full fine-tuning is preferable for named entity recognition. Inspired by previous research on modular approaches that combine task and language adapters, we evaluate the impact of adding PEFT modules trained on unstructured text, finding that this approach is not beneficial.
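For orientation, the sketch below shows how one of the compared PEFT setups, LoRA on mDeBERTa, can be wired up for a token-classification task such as NER with the Hugging Face peft library. The rank, scaling factor, dropout, target modules, and the 9-label head are illustrative assumptions rather than the authors' configuration; the Pfeiffer bottleneck adapters studied in the paper would be added through the AdapterHub adapters library instead.

```python
# Minimal sketch (not the authors' code): attaching LoRA to mDeBERTa for
# token classification (e.g. NER) with Hugging Face `peft`.
# Hyperparameters and the label count are illustrative assumptions.
from transformers import AutoModelForTokenClassification, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "microsoft/mdeberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=9)

lora_config = LoraConfig(
    task_type=TaskType.TOKEN_CLS,
    r=8,                      # rank of the low-rank update matrices (assumed)
    lora_alpha=16,            # scaling factor (assumed)
    lora_dropout=0.1,
    target_modules=["query_proj", "value_proj"],  # DeBERTa-v3 attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only LoRA matrices and the classifier head train
```

Full fine-tuning corresponds to skipping the `get_peft_model` call and updating all of the base model's parameters directly.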
Related papers
- Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT [0.8702432681310401]
We aim to enhance the generation performance of an LLM by specialising it using unstructured text corpora. We find that increasing the number of trainable parameters leads to better and more robust language adaptation. Although improvements are consistent in 0-shot summarisation, some adapted models struggle with longer context lengths.
arXiv Detail & Related papers (2024-12-17T08:44:00Z)
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) or isolated capability-specific tasks.
We present a pipeline for selecting available and reasonable benchmarks from massive ones, addressing the oversight in previous work regarding the utility of these benchmarks.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- A Parameter-efficient Language Extension Framework for Multilingual ASR [25.758826304861948]
We propose an architecture-based framework for language extension.
It is designed to be parameter-efficient, incrementally incorporating an add-on module to adapt to a new language.
Experiments are carried out on 5 new languages with a wide range of low-performing data sizes.
arXiv Detail & Related papers (2024-06-10T14:46:07Z)
- Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation [3.558269947618352]
We evaluate the performance of 8 PEFT methods with a total of 15 architectures using the SacreBLEU score.
We show that 6 PEFT architectures outperform the baseline for both in-domain and out-of-domain tests.
The Houlsby+Inversion adapter has the best performance overall, proving the effectiveness of PEFT methods.
arXiv Detail & Related papers (2024-04-05T16:42:28Z)
- Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR [44.949146169903074]
The heterogeneous nature and imbalanced data abundance of different languages may cause performance degradation.
Our proposed method brings 12.2% word error rate reduction on average and up to 37.5% on a single locale.
arXiv Detail & Related papers (2024-01-17T06:01:16Z)
- Few-shot learning for automated content analysis: Efficient coding of arguments and claims in the debate on arms deliveries to Ukraine [0.9576975587953563]
Pre-trained language models (PLM) based on transformer neural networks offer great opportunities to improve automatic content analysis in communication science.
Three characteristics have so far impeded the widespread adoption of these methods in the applied disciplines: the dominance of English-language models in NLP research, the necessary computing resources, and the effort required to produce training data to fine-tune PLMs.
We test our approach on a realistic use case from communication science to automatically detect claims and arguments together with their stance in the German news debate on arms deliveries to Ukraine.
arXiv Detail & Related papers (2023-12-28T11:39:08Z)
- Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization [126.96113831681338]
In this paper, we propose to improve zero-shot cross-lingual transfer by composing language or task specialized parameters.
Our method composes language and task PEFT modules via element-wise arithmetic operations to leverage unlabeled data and English labeled data.
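A minimal, hedged sketch of what element-wise composition of two PEFT modules can look like in general; the parameter names and tensors below are hypothetical, and the paper's exact composition scheme may differ.

```python
# Illustrative sketch only (hypothetical parameter names, not the paper's code):
# composing a language PEFT module and a task PEFT module element-wise.
import torch

def compose_modules(lang_params: dict, task_params: dict, alpha: float = 1.0) -> dict:
    """Element-wise addition of two PEFT parameter sets that share the same keys."""
    assert lang_params.keys() == task_params.keys()
    return {name: task_params[name] + alpha * lang_params[name] for name in task_params}

# Toy usage with random tensors standing in for adapter weights.
keys = ["layer.0.lora_A", "layer.0.lora_B"]
lang = {k: torch.randn(8, 16) for k in keys}   # e.g. trained on unlabeled target-language text
task = {k: torch.randn(8, 16) for k in keys}   # e.g. trained on English labeled data
combined = compose_modules(lang, task, alpha=0.5)
```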
arXiv Detail & Related papers (2023-11-15T20:04:58Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency [62.0887259003594]
This work investigates three aspects of structured pruning on multilingual pre-trained language models: settings, algorithms, and efficiency.
Experiments on nine downstream tasks show several counter-intuitive phenomena.
We present Dynamic Sparsification, a simple approach that allows training the model once and adapting to different model sizes at inference.
arXiv Detail & Related papers (2022-04-06T06:29:52Z)
- Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [66.69799641522133]
State-of-the-art neural (re)rankers are notoriously data-hungry.
Current approaches typically transfer rankers trained on English data to other languages and cross-lingual setups by means of multilingual encoders.
We show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot transfer.
arXiv Detail & Related papers (2022-04-05T15:44:27Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
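For orientation, a generic distributionally robust objective over per-language losses is sketched below; the constraint set and the iterated best response scheme used in the paper may differ, so treat this as a hedged illustration rather than the authors' formulation.

```python
# Illustrative sketch (not the paper's implementation): a worst-case objective
# over per-language losses. With an unconstrained probability simplex, the inner
# maximization over language weights concentrates on the largest loss.
import torch

def dro_objective(per_language_losses: torch.Tensor) -> torch.Tensor:
    """min_theta max_{lambda in simplex} sum_i lambda_i * L_i(theta)
    reduces to the worst-case per-language loss for a simplex-constrained inner max."""
    return per_language_losses.max()

# Toy usage: current losses for four language pairs.
losses = torch.tensor([1.2, 0.8, 2.1, 1.5], requires_grad=True)
robust_loss = dro_objective(losses)  # optimize this instead of the average loss
```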
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)