Bactrian-X: Multilingual Replicable Instruction-Following Models with
Low-Rank Adaptation
- URL: http://arxiv.org/abs/2305.15011v2
- Date: Tue, 10 Oct 2023 07:46:44 GMT
- Title: Bactrian-X: Multilingual Replicable Instruction-Following Models with
Low-Rank Adaptation
- Authors: Haonan Li and Fajri Koto and Minghao Wu and Alham Fikri Aji and
Timothy Baldwin
- Abstract summary: Bactrian-X is a comprehensive multilingual parallel dataset of 3.4 million instruction-response pairs across 52 languages.
We train a set of adapters using low-rank adaptation (LoRA); these lightweight components integrate seamlessly with large language models.
Experiments in various multilingual evaluation settings demonstrate that models derived from LoRA-based training over Bactrian-X outperform both the vanilla models and existing instruction-tuned models.
- Score: 40.695782736177264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction tuning has shown great promise in improving the performance of
large language models. However, research on multilingual instruction tuning has
been limited due to the scarcity of high-quality instruction-response datasets
across different languages. To bridge this gap, we present Bactrian-X, a
comprehensive multilingual parallel dataset of 3.4 million instruction-response
pairs across 52 languages. Leveraging this dataset, we train a set of adapters
using low-rank adaptation (LoRA): lightweight components that integrate
seamlessly with large language models. These adapters have a substantially
lower parameter count than the base model, making them easy to replace and
use as plug-ins for different languages or language groups.
Extensive experiments in various multilingual evaluation settings demonstrate
that models derived from LoRA-based training over Bactrian-X outperform both
the vanilla models and existing instruction-tuned models. The code and models
are publicly available at https://github.com/mbzuai-nlp/bactrian-x
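As a rough illustration of the plug-in workflow described in the abstract, the sketch below shows how a per-language LoRA adapter could be attached to a base causal language model with the Hugging Face transformers and peft libraries. The model and adapter identifiers and the prompt template are hypothetical placeholders, not the official Bactrian-X release names; the actual names and templates are documented in the repository linked above.

# Minimal sketch: attaching a per-language LoRA adapter to a frozen base LLM.
# Assumes the Hugging Face transformers and peft libraries; the identifiers
# and prompt format below are placeholders, not the official Bactrian-X ones.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_ID = "example-org/base-llm-7b"         # placeholder base model
ADAPTER_ID = "example-org/bactrian-lora-adapter"  # placeholder LoRA adapter

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)

# The adapter stores only small low-rank update matrices, so it is cheap to
# download and can be swapped out for a different language or language group.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

prompt = "### Instruction:\nExplain low-rank adaptation in one sentence.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Because the base weights stay frozen during LoRA training, only the adapter weights differ between languages, which is what makes this per-language plug-in setup lightweight.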
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining [4.38070902806635]
We set up a benchmark for the closely related languages Croatian, Serbian, Bosnian, and Montenegrin.
We show that comparable performance to dedicated from-scratch models can be obtained by additionally pretraining available multilingual models.
We also show that neighboring languages, in our case Slovenian, can be included in the additional pretraining with little to no loss in the performance of the final model.
arXiv Detail & Related papers (2024-04-08T11:55:44Z)
- XSemPLR: Cross-Lingual Semantic Parsing in Multiple Natural Languages and Meaning Representations [25.50509874992198]
Cross-Lingual Semantic Parsing aims to translate queries in multiple natural languages into meaning representations.
Existing CLSP models have been proposed and evaluated separately, on datasets covering a limited range of tasks and applications.
We present XSemPLR, a unified benchmark for cross-lingual semantic parsing featured with 22 natural languages and 8 meaning representations.
arXiv Detail & Related papers (2023-06-07T01:09:37Z)
- Language-Family Adapters for Low-Resource Multilingual Neural Machine Translation [129.99918589405675]
Large multilingual models trained with self-supervision achieve state-of-the-art results in a wide range of natural language processing tasks.
Multilingual fine-tuning improves performance on low-resource languages but requires modifying the entire model and can be prohibitively expensive.
We propose training language-family adapters on top of mBART-50 to facilitate cross-lingual transfer.
arXiv Detail & Related papers (2022-09-30T05:02:42Z)
- mGPT: Few-Shot Learners Go Multilingual [1.4354798873010843]
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages.
We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism.
The resulting models show performance on par with the recently released XGLM models by Facebook.
arXiv Detail & Related papers (2022-04-15T13:02:33Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Multilingual Translation with Extensible Multilingual Pretraining and Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z)
- Learning to Scale Multilingual Representations for Vision-Language Tasks [51.27839182889422]
The effectiveness of the proposed SMALR approach is demonstrated on ten diverse languages, over twice the number supported in vision-language tasks to date.
We evaluate on multilingual image-sentence retrieval and outperform prior work by 3-4% while using less than one-fifth of the training parameters of other word-embedding methods.
arXiv Detail & Related papers (2020-04-09T01:03:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.