sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting
- URL: http://arxiv.org/abs/2407.09879v3
- Date: Wed, 16 Oct 2024 12:57:56 GMT
- Title: sPhinX: Sample Efficient Multilingual Instruction Fine-Tuning Through N-shot Guided Prompting
- Authors: Sanchit Ahuja, Kumar Tanmay, Hardik Hansrajbhai Chauhan, Barun Patra, Kriti Aggarwal, Luciano Del Corro, Arindam Mitra, Tejas Indulal Dhamecha, Ahmed Awadallah, Monojit Choudhary, Vishrav Chaudhary, Sunayana Sitaram
- Abstract summary: We introduce a novel recipe for creating a multilingual synthetic instruction tuning dataset, sPhinX.
sPhinX is created by selectively translating instruction-response pairs from English into 50 languages.
We test the effectiveness of sPhinX by using it to fine-tune two state-of-the-art models, Mistral-7B and Phi-Small.
- Score: 29.63634707674839
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the remarkable success of LLMs in English, there is a significant gap in performance in non-English languages. To address this, we introduce a novel recipe for creating a multilingual synthetic instruction tuning dataset, sPhinX, which is created by selectively translating instruction-response pairs from English into 50 languages. We test the effectiveness of sPhinX by using it to fine-tune two state-of-the-art models, Mistral-7B and Phi-Small, and then evaluating them across a comprehensive suite of multilingual benchmarks that test reasoning, question answering, reading comprehension and machine translation. Our results show that Mistral-7B and Phi-Small fine-tuned with sPhinX perform better by 5%pt on average compared to the base variants of these models. We also devise a strategy to incorporate N-shot examples in each fine-tuning sample, which further boosts the performance of these models by 9%pt and 4%pt respectively compared to vanilla fine-tuning. To show the efficacy of our data curation approach, we also directly translate our original dataset into the target languages and observe gains of 7%pt and 4%pt on the two models, respectively. sPhinX outperforms other multilingual instruction tuning datasets in both efficiency and diversity, reducing dataset creation costs. It also maintains strong performance on standard English LLM benchmarks, with minimal regression.
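For intuition, the sketch below shows one way an N-shot fine-tuning sample could be assembled by prepending N exemplar instruction-response pairs to the target pair. The field names, prompt template, and helper function are illustrative assumptions, not the paper's actual data pipeline.

```python
import random

# Hypothetical pool of instruction-response pairs; in sPhinX such pairs are
# selectively translated from English into the target language.
pool = [
    {"instruction": "Summarize: 'The cat sat on the mat.'", "response": "A cat sat on a mat."},
    {"instruction": "Translate to French: 'Good morning'", "response": "Bonjour"},
    {"instruction": "What is 2 + 3?", "response": "5"},
]

def build_n_shot_sample(target: dict, exemplars: list, n: int = 2, seed: int = 0) -> str:
    """Prepend n in-context exemplars to a fine-tuning sample.

    A minimal sketch of N-shot guided prompting, not the exact template used in the paper.
    """
    rng = random.Random(seed)
    shots = rng.sample([e for e in exemplars if e is not target], k=n)
    blocks = [
        f"### Instruction:\n{ex['instruction']}\n### Response:\n{ex['response']}"
        for ex in shots + [target]
    ]
    return "\n\n".join(blocks)

print(build_n_shot_sample(pool[0], pool, n=2))
```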
Related papers
- Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs). We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models. Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z)
- Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance [0.0]
We present our latest Hindi-English bilingual LLM, Mantra-14B, which achieves a 3% average improvement in benchmark scores across both languages.
We instruction tuned models such as Qwen-2.5-14B-Instruct and Phi-4 to improve performance over both English and Hindi.
Our results indicate that modest fine-tuning with culturally and locally informed data can bridge performance gaps without incurring significant computational overhead.
arXiv Detail & Related papers (2025-04-13T23:10:13Z)
- Language Fusion for Parameter-Efficient Cross-lingual Transfer [21.96231169571248]
Fusion for Language Representations (FLARE) is a novel method that enhances representation quality and downstream performance for languages other than English. FLARE integrates source and target language representations within low-rank (LoRA) adapters using lightweight linear transformations. A series of experiments across representative cross-lingual natural language understanding tasks, including natural language inference, question answering and sentiment analysis, demonstrates FLARE's effectiveness.
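As a rough illustration of the fusion idea described above, the snippet below combines a source-language and a target-language hidden state through a low-rank, LoRA-style linear bottleneck. Module names, shapes, and the concatenation scheme are assumptions for illustration, not FLARE's actual implementation.

```python
import torch
import torch.nn as nn

class FusionAdapter(nn.Module):
    """Residual low-rank adapter that fuses a source-language representation
    into the target-language forward pass (illustrative sketch only)."""

    def __init__(self, hidden_size: int = 768, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(2 * hidden_size, rank, bias=False)  # lightweight linear fusion
        self.up = nn.Linear(rank, hidden_size, bias=False)        # project back to model width
        nn.init.zeros_(self.up.weight)                            # start as a no-op, LoRA-style

    def forward(self, h_target: torch.Tensor, h_source: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([h_target, h_source], dim=-1)           # combine both language views
        return h_target + self.up(self.down(fused))               # residual low-rank update

# Example: batch of 2 sequences, 16 tokens, hidden size 768.
h_src, h_tgt = torch.randn(2, 16, 768), torch.randn(2, 16, 768)
print(FusionAdapter()(h_tgt, h_src).shape)  # torch.Size([2, 16, 768])
```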
arXiv Detail & Related papers (2025-01-12T18:02:29Z)
- Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model [66.17354128553244]
Most Large Vision-Language Models (LVLMs) to date are trained predominantly on English data. We investigate how different training mixes tip the scale for different groups of languages. We train Centurio, a 100-language LVLM, offering state-of-the-art performance in an evaluation covering 14 tasks and 56 languages.
arXiv Detail & Related papers (2025-01-09T10:26:14Z)
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
Large language models (LLMs) showcase varied multilingual capabilities across tasks like translation, code generation, and reasoning.
Previous assessments often limited their scope to fundamental natural language processing (NLP) or isolated capability-specific tasks.
We present a pipeline for selecting available and reasonable benchmarks from a massive pool, addressing the oversight in previous work regarding the utility of these benchmarks.
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experimental results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- GeMQuAD: Generating Multilingual Question Answering Datasets from Large Language Models using Few Shot Learning [4.8838210812204235]
In this paper, we propose GeMQuAD, a semi-supervised learning approach applied to a dataset generated through in-context learning (ICL) with just one example in the target language.
We iteratively identify high-quality data to enhance model performance, especially in low-resource multilingual settings.
Our framework outperforms the machine translation-augmented model by 0.22/1.68 F1/EM points for Hindi and 0.82/1.37 F1/EM points for Spanish on the MLQA dataset.
arXiv Detail & Related papers (2024-04-14T06:55:42Z)
- TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes [9.254047358707014]
We introduce the Multilingual Instruction-Tuning dataset (MITS), comprising Alpaca-52K, Dolly-15K, and Vicuna Benchmark translations into 132 languages.
Secondly, we propose a new method called TaCo: Translation-Assisted Cross-Linguality, which utilizes translations in a chain-of-thought process to instruction-tune LLMs on new languages through a curriculum-learning process.
Our results indicate that TaCo achieves a GPT-4 evaluation score of 82% for a low-resource language on the Vicuna Benchmark, doubling the performance obtained with instruction tuning alone.
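To make the idea concrete, here is one plausible shape of a translation-assisted chain-of-thought training sample, in which the instruction is translated into English, the reasoning is carried out in English, and the answer is given in the target language. The template and field names are illustrative assumptions; the exact structure of TaCo's prompts may differ.

```python
# Hypothetical template for a translation-assisted chain-of-thought sample.
TACO_STYLE_TEMPLATE = (
    "### Instruction ({language}):\n{instruction}\n\n"
    "### Response:\n"
    "English translation of the instruction: {instruction_en}\n"
    "Reasoning (in English): {reasoning_en}\n"
    "Final answer ({language}): {answer}\n"
)

sample = TACO_STYLE_TEMPLATE.format(
    language="<low-resource language>",
    instruction="<instruction text in the target language>",
    instruction_en="<the same instruction in English>",
    reasoning_en="<step-by-step reasoning in English>",
    answer="<final answer in the target language>",
)
print(sample)
```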
arXiv Detail & Related papers (2023-11-17T06:55:32Z)
- Improving Domain-Specific Retrieval by NLI Fine-Tuning [64.79760042717822]
This article investigates the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking.
We employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data.
Our results show that NLI fine-tuning improves model performance on both tasks and in both languages, with the potential to improve both mono- and multilingual models.
arXiv Detail & Related papers (2023-08-06T12:40:58Z)
- LLM-powered Data Augmentation for Enhanced Cross-lingual Performance [24.20730298894794]
This paper explores the potential of leveraging Large Language Models (LLMs) for data augmentation in commonsense reasoning datasets.
To achieve this, we utilise several LLMs, namely Dolly-v2, StableVicuna, ChatGPT, and GPT-4, to augment three datasets: XCOPA, XWinograd, and XStoryCloze.
We evaluate the effectiveness of fine-tuning smaller multilingual models, mBERT and XLMR, using the synthesised data.
arXiv Detail & Related papers (2023-05-23T17:33:27Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD, a parallel, large-scale multilingual conversation dataset, for cross-lingual alignment pretraining.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Multi-level Distillation of Semantic Knowledge for Pre-training Multilingual Language Model [15.839724725094916]
Multi-level Multilingual Knowledge Distillation (MMKD) is a novel method for improving multilingual language models.
We employ a teacher-student framework to transfer the rich semantic representation knowledge of English BERT.
We conduct experiments on cross-lingual evaluation benchmarks including XNLI, PAWS-X, and XQuAD.
arXiv Detail & Related papers (2022-11-02T15:23:13Z)
- Multilingual Relation Classification via Efficient and Effective Prompting [9.119073318043952]
We present the first work on prompt-based multilingual relation classification (RC).
We introduce an efficient and effective method that constructs prompts from relation triples and involves only minimal translation for the class labels.
We evaluate its performance in fully supervised, few-shot and zero-shot scenarios, and analyze its effectiveness across 14 languages.
arXiv Detail & Related papers (2022-10-25T08:40:23Z)
- Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models [95.32691891392903]
In this paper, we conduct a cross-lingual evaluation on various NLU tasks using prompt-tuning and compare it with fine-tuning.
The results show that prompt tuning achieves much better cross-lingual transfer than fine-tuning across datasets.
arXiv Detail & Related papers (2022-10-22T05:48:02Z)
- OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval [91.76575626229824]
We present OneAligner, an alignment model specially designed for sentence retrieval tasks.
When trained with all language pairs of a large-scale parallel multilingual corpus (OPUS-100), this model achieves state-of-the-art results.
We conclude through empirical results and analyses that the performance of the sentence alignment task depends mostly on the monolingual and parallel data size.
arXiv Detail & Related papers (2022-05-17T19:52:42Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
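For intuition, a generic distributionally robust objective over $N$ language pairs takes the min-max form below; this is a schematic textbook formulation under an assumed divergence constraint, not necessarily the exact objective of the cited paper.

```latex
\min_{\theta}\;
\max_{\lambda \in \Delta_N,\; D(\lambda \,\|\, \hat{p}) \le \rho}\;
\sum_{i=1}^{N} \lambda_i \, \mathcal{L}_i(\theta)
```

Here $\mathcal{L}_i(\theta)$ is the translation loss on language pair $i$, $\hat{p}$ is the empirical distribution over pairs, and the inner maximization (approximated in practice, e.g. by an iterated best-response scheme) upweights the worst-performing pairs within a divergence ball of radius $\rho$.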
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
- Multilingual BERT Post-Pretraining Alignment [26.62198329830013]
We propose a simple method to align multilingual contextual embeddings as a post-pretraining step.
Using parallel data, our method aligns embeddings on the word level through the recently proposed Translation Language Modeling objective.
We also perform sentence-level code-switching with English when fine-tuning on downstream tasks.
arXiv Detail & Related papers (2020-10-23T17:14:41Z)
- Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 ROUGE-1 points (English to Chinese) and 1.15 ROUGE-1 points (Chinese to English) over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.