xCoT: Cross-lingual Instruction Tuning for Cross-lingual
Chain-of-Thought Reasoning
- URL: http://arxiv.org/abs/2401.07037v1
- Date: Sat, 13 Jan 2024 10:53:53 GMT
- Title: xCoT: Cross-lingual Instruction Tuning for Cross-lingual
Chain-of-Thought Reasoning
- Authors: Linzheng Chai, Jian Yang, Tao Sun, Hongcheng Guo, Jiaheng Liu, Bing
Wang, Xiannian Liang, Jiaqi Bai, Tongliang Li, Qiyao Peng, Zhoujun Li
- Abstract summary: Chain-of-thought (CoT) has emerged as a powerful technique to elicit reasoning in large language models.
We propose a cross-lingual instruction fine-tuning framework (xCOT) to transfer knowledge from high-resource languages to low-resource languages.
- Score: 36.34986831526529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-thought (CoT) has emerged as a powerful technique to elicit
reasoning in large language models and improve a variety of downstream tasks.
CoT mainly demonstrates excellent performance in English, but its usage in
low-resource languages is constrained due to poor language generalization. To
bridge the gap among different languages, we propose a cross-lingual
instruction fine-tuning framework (xCOT) to transfer knowledge from
high-resource languages to low-resource languages. Specifically, the
multilingual instruction training data (xCOT-INSTRUCT) is created to encourage
the semantic alignment of multiple languages. We introduce cross-lingual
in-context few-shot learning (xICL)) to accelerate multilingual agreement in
instruction tuning, where some fragments of source languages in examples are
randomly substituted by their counterpart translations of target languages.
During multilingual instruction tuning, we adopt the randomly online CoT
strategy to enhance the multilingual reasoning ability of the large language
model by first translating the query to another language and then answering in
English. To further facilitate the language transfer, we leverage the
high-resource CoT to supervise the training of low-resource languages with
cross-lingual distillation. Experimental results on previous benchmarks
demonstrate the superior performance of xCoT in reducing the gap among
different languages, highlighting its potential to reduce the cross-lingual
gap.
Related papers
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs)
It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models [21.616940026409818]
Large language models (LLMs) with Chain-of-thought (CoT) have recently emerged as a powerful technique for eliciting reasoning to improve downstream tasks.
We study multilingual reasoning consistency across multiple languages, using popular open-source LLMs.
We introduce multilingual CoT instruction tuning to boost reasoning capability across languages, thereby improving model consistency.
arXiv Detail & Related papers (2024-06-04T13:30:45Z) - LlamaTurk: Adapting Open-Source Generative Large Language Models for Low-Resource Language [2.9914612342004503]
This study explores an alternative solution by adapting large language models, primarily trained on English, to low-resource languages.
We assess various strategies, including continual training, instruction fine-tuning, task-specific fine-tuning, and vocabulary extension.
The results show that continual training improves language comprehension, as reflected in perplexity scores, and task-specific tuning generally enhances performance of downstream tasks.
arXiv Detail & Related papers (2024-05-13T13:41:59Z) - Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z) - Language Chameleon: Transformation analysis between languages using
Cross-lingual Post-training based on Pre-trained language models [4.731313022026271]
In this study, we focus on a single low-resource language and perform extensive evaluation and probing experiments using cross-lingual post-training (XPT)
Results show that XPT not only outperforms or performs on par with monolingual models trained with orders of magnitudes more data but also is highly efficient in the transfer process.
arXiv Detail & Related papers (2022-09-14T05:20:52Z) - Meta-X$_{NLG}$: A Meta-Learning Approach Based on Language Clustering
for Zero-Shot Cross-Lingual Transfer and Generation [11.155430893354769]
This paper proposes a novel meta-learning framework to learn shareable structures from typologically diverse languages.
We first cluster the languages based on language representations and identify the centroid language of each cluster.
A meta-learning algorithm is trained with all centroid languages and evaluated on the other languages in the zero-shot setting.
arXiv Detail & Related papers (2022-03-19T05:22:07Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-source languages.
We propose a novel augmentation approach named Language Branch Machine Reading (LBMRC)
LBMRC trains multiple machine reading comprehension (MRC) models proficient in individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z) - XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.