Breaking Language Barriers in Multilingual Mathematical Reasoning:
Insights and Observations
- URL: http://arxiv.org/abs/2310.20246v4
- Date: Tue, 28 Nov 2023 05:25:14 GMT
- Title: Breaking Language Barriers in Multilingual Mathematical Reasoning:
Insights and Observations
- Authors: Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Yangqiu Song, Dongmei
Zhang, Jia Li
- Abstract summary: This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct.
We propose different training strategies to build powerful xMR LLMs, named MathOctopus, which notably outperform conventional open-source LLMs.
- Score: 90.73517523001149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing research predominantly focuses on developing powerful large
language models (LLMs) for mathematical reasoning in monolingual settings,
with few explorations of preserving efficacy in a multilingual context. To
bridge this gap, this paper pioneers exploring and training powerful
Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we
construct the first multilingual math reasoning instruction dataset,
MGSM8KInstruct, encompassing ten distinct languages, thus addressing the issue
of training data scarcity in xMR tasks. Based on the collected dataset, we
propose different training strategies to build powerful xMR LLMs, named
MathOctopus, which notably outperform conventional open-source LLMs and
surpass ChatGPT in few-shot scenarios. For example, MathOctopus-13B reaches
47.6% accuracy, exceeding ChatGPT's 46.3% on the MGSM test set. Beyond these
strong results, we unearth several pivotal observations and insights from
extensive experiments: (1) Extending the rejection sampling strategy to the
multilingual context improves model performance, albeit to a limited extent.
(2) Employing parallel corpora for math Supervised Fine-Tuning (SFT) across
multiple languages not only significantly enhances multilingual performance
but also elevates monolingual performance. This indicates that crafting
multilingual corpora is a vital strategy for enhancing model performance in a
specific language, especially in mathematical reasoning tasks. For instance,
MathOctopus-7B improves over its English-only-trained counterpart, raising
accuracy from 42.2% to 50.8% on the GSM8K test set.
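The two observations above are concrete enough to sketch. The snippet below is a minimal illustration, not the authors' released code: it assumes an MGSM8KInstruct-style record layout, a placeholder `translate` helper standing in for whatever machine-translation system is used, and a naive answer-matching heuristic for the rejection-sampling filter.

```python
# Minimal sketch of parallel multilingual math-SFT data construction and a toy
# multilingual rejection-sampling filter. Field names, the language list, and
# the translate() stub are illustrative assumptions, not the paper's code.
from dataclasses import dataclass

LANGUAGES = ["en", "es", "zh", "ja", "ru", "de", "fr", "th", "sw", "bn"]  # ten languages, as in MGSM8KInstruct

@dataclass
class MathExample:
    question: str   # problem statement (English seed, e.g. from GSM8K)
    solution: str   # chain-of-thought solution ending in the final answer
    answer: str     # gold final answer, usable for rejection sampling

def translate(text: str, target_lang: str) -> str:
    """Stand-in for a real machine-translation call; here it only tags the text."""
    return f"[{target_lang}] {text}"

def build_parallel_sft_corpus(seed: list[MathExample]) -> list[dict]:
    """Pair every problem with a solution in each target language, so the model
    is fine-tuned on the same reasoning content across all languages."""
    corpus = []
    for ex in seed:
        for lang in LANGUAGES:
            q = ex.question if lang == "en" else translate(ex.question, lang)
            s = ex.solution if lang == "en" else translate(ex.solution, lang)
            corpus.append({"language": lang, "instruction": q, "response": s})
    return corpus

def rejection_sample(candidates: list[str], gold_answer: str) -> list[str]:
    """Keep only sampled solutions whose text ends with the gold answer,
    regardless of the language they were generated in."""
    return [c for c in candidates if c.strip().endswith(gold_answer)]
```

Training on such language-parallel pairs is what the abstract calls "parallel corpora" for SFT: the same English seed problem contributes one training instance per language, which is consistent with the reported gains in both multilingual and English-only evaluation.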
Related papers
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks [12.665447518524187]
This study aims to perform a thorough evaluation of the non-English capabilities of SoTA LLMs by comparing them on the same set of multilingual datasets.
Our benchmark comprises 22 datasets covering 83 languages, including low-resource African languages.
We also perform a study on data contamination and find that several models are likely to be contaminated with multilingual evaluation benchmarks.
arXiv Detail & Related papers (2023-11-13T16:45:37Z) - The Skipped Beat: A Study of Sociopragmatic Understanding in LLMs for 64
Languages [17.055109973224265]
We present SPARROW, an extensive benchmark specifically designed for cross-lingual sociopragmatic meaning (SM) understanding.
SPARROW comprises 169 datasets covering 13 task types across six primary categories (e.g., anti-social language detection, emotion recognition).
We evaluate the performance of various multilingual pretrained language models (e.g., mT5) and instruction-tuned LLMs (e.g., BLOOMZ, ChatGPT) on SPARROW through fine-tuning, zero-shot, and/or few-shot learning.
arXiv Detail & Related papers (2023-10-23T04:22:44Z) - Baichuan 2: Open Large-scale Language Models [51.56361715162972]
We present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens.
Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.
arXiv Detail & Related papers (2023-09-19T04:13:22Z) - Extrapolating Large Language Models to Non-English by Aligning Languages [109.09051737966178]
Existing large language models show disparate capability across different languages.
In this paper, we empower pre-trained LLMs on non-English languages by building semantic alignment across languages.
arXiv Detail & Related papers (2023-08-09T13:32:06Z) - PolyLM: An Open Source Polyglot Large Language Model [57.64420154135178]
We present PolyLM, a multilingual large language model (LLM) trained on 640 billion (B) tokens, available in two model sizes: 1.7B and 13B.
To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training.
Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning.
arXiv Detail & Related papers (2023-07-12T09:00:37Z) - Massively Multilingual Shallow Fusion with Large Language Models [62.76735265311028]
We train a single multilingual language model (LM) for shallow fusion in multiple languages.
Compared to a dense LM of similar computation during inference, GLaM reduces the WER of an English long-tail test set by 4.4% relative.
In a multilingual shallow fusion task, GLaM improves 41 out of 50 languages with an average relative WER reduction of 3.85%, and a maximum reduction of 10%.
arXiv Detail & Related papers (2023-02-17T14:46:38Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.