MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models
- URL: http://arxiv.org/abs/2401.07598v3
- Date: Mon, 22 Jul 2024 11:13:54 GMT
- Title: MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models
- Authors: Divyanshu Aggarwal, Ashutosh Sathe, Ishaan Watts, Sunayana Sitaram,
- Abstract summary: Finetuning can improve the performance of language models without requiring massive resources and compute.
We finetune LLama-2-7B and Mistral-7B models on two synthetic multilingual instruction tuning datasets to determine its effect on model performance.
We find that PEFT of smaller open-source models sometimes bridges the gap between the performance of these models and the larger ones, however, English performance can take a hit.
- Score: 7.321459642283822
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter Efficient Finetuning (PEFT) has emerged as a viable solution for improving the performance of Large Language Models (LLMs) without requiring massive resources and compute. Prior work on multilingual evaluation has shown that there is a large gap between the performance of LLMs on English and other languages. Further, there is also a large gap between the performance of smaller open-source models and larger LLMs. Finetuning can be an effective way to bridge this gap and make language models more equitable. In this work, we finetune the LLama-2-7B and Mistral-7B models on two synthetic multilingual instruction tuning datasets to determine its effect on model performance on six downstream tasks covering forty languages in all. Additionally, we experiment with various parameters, such as rank for low-rank adaptation and values of quantisation to determine their effects on downstream performance and find that higher rank and higher quantisation values benefit low-resource languages. We find that PEFT of smaller open-source models sometimes bridges the gap between the performance of these models and the larger ones, however, English performance can take a hit. We also find that finetuning sometimes improves performance on low-resource languages, while degrading performance on high-resource languages.
Related papers
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT)
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data.
arXiv Detail & Related papers (2024-07-04T15:14:17Z) - ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z) - Tele-FLM Technical Report [96.19923831660266]
We introduce Tele-FLM (aka FLM-2), a 52B open-sourced multilingual large language model.
It features a stable, efficient pre-training paradigm and enhanced factual judgment capabilities.
It is comparable to strong open-sourced models that involve larger pre-training FLOPs, such as Llama2-70B and DeepSeek-67B.
arXiv Detail & Related papers (2024-04-25T14:34:47Z) - Multilingual Sentence-T5: Scalable Sentence Encoders for Multilingual Applications [4.240899165468488]
We introduce Multilingual Sentence T5 (m-ST5) as a larger model of NLI-based multilingual sentence embedding.
By employing the low-rank adaptation (LoRA) technique, we have achieved a successful scaling of the model's size to 5.7 billion parameters.
It was particularly noteworthy that languages with fewer resources or those with less linguistic similarity to English benefited more from the parameter increase.
arXiv Detail & Related papers (2024-03-26T09:31:55Z) - On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based
Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z) - Tokenizer Choice For LLM Training: Negligible or Crucial? [30.33170936148845]
We study the influence of tokenizer choice on Large Language Models (LLMs) downstream performance by training 24 mono- and multilingual LLMs.
We find that the tokenizer choice can significantly impact the model's downstream performance and training costs.
We show that multilingual tokenizers trained on the five most frequent European languages require vocabulary size increases of factor three in comparison to English.
arXiv Detail & Related papers (2023-10-12T22:44:19Z) - PaLM: Scaling Language Modeling with Pathways [180.69584031908113]
We trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model PaLM.
We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods.
We demonstrate continued benefits of scaling by achieving state-of-the-art few-shot learning results on hundreds of language understanding and generation benchmarks.
arXiv Detail & Related papers (2022-04-05T16:11:45Z) - Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.