Language Models are Multilingual Chain-of-Thought Reasoners
- URL: http://arxiv.org/abs/2210.03057v1
- Date: Thu, 6 Oct 2022 17:03:34 GMT
- Title: Language Models are Multilingual Chain-of-Thought Reasoners
- Authors: Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats,
Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou,
Dipanjan Das, Jason Wei
- Abstract summary: We introduce the Multilingual Grade School Math (MGSM) benchmark, created by manually translating 250 grade-school math problems into ten typologically diverse languages.
We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale.
We show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.
- Score: 83.37148309771378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We evaluate the reasoning abilities of large language models in multilingual
settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by
manually translating 250 grade-school math problems from the GSM8K dataset
(Cobbe et al., 2021) into ten typologically diverse languages. We find that the
ability to solve MGSM problems via chain-of-thought prompting emerges with
increasing model scale, and that models have strikingly strong multilingual
reasoning abilities, even in underrepresented languages such as Bengali and
Swahili. Finally, we show that the multilingual reasoning abilities of language
models extend to other tasks such as commonsense reasoning and word-in-context
semantic judgment. The MGSM benchmark is publicly available at
https://github.com/google-research/url-nlp.
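For context, a minimal sketch of how few-shot chain-of-thought prompting on an MGSM-style benchmark can be wired up: exemplar problems with worked solutions are prepended to the target question, and the last number in the model's completion is compared against the gold answer. The `generate` callable, field names, and regex below are illustrative assumptions, not the paper's released evaluation code.

```python
import re
from typing import Callable, Optional, Sequence


def build_cot_prompt(exemplars: Sequence[dict], question: str) -> str:
    """Prepend few-shot exemplars, each with a worked solution, to the target question."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Step-by-step answer: {ex['rationale']} The answer is {ex['answer']}.\n"
        )
    parts.append(f"Question: {question}\nStep-by-step answer:")
    return "\n".join(parts)


def extract_final_number(completion: str) -> Optional[str]:
    """MGSM targets are numeric, so take the last number in the completion."""
    numbers = re.findall(r"-?\d[\d,.]*", completion)
    return numbers[-1].rstrip(".").replace(",", "") if numbers else None


def mgsm_accuracy(generate: Callable[[str], str],
                  exemplars: Sequence[dict],
                  test_items: Sequence[dict]) -> float:
    """Exact-match accuracy of extracted final answers against gold labels."""
    correct = sum(
        extract_final_number(generate(build_cot_prompt(exemplars, item["question"])))
        == str(item["answer"])
        for item in test_items
    )
    return correct / len(test_items)
```

Whether the exemplar solutions are written in the question's own language or in English is one of the prompting choices the paper compares.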
Related papers
- mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models [21.616940026409818]
Chain-of-thought (CoT) prompting has recently emerged as a powerful technique for eliciting reasoning from large language models (LLMs) and improving downstream tasks.
We study reasoning consistency across multiple languages, using popular open-source LLMs.
We introduce multilingual CoT instruction tuning to boost reasoning capability across languages, thereby improving model consistency.
arXiv Detail & Related papers (2024-06-04T13:30:45Z)
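As a rough sketch of the kind of training record multilingual CoT instruction tuning might use (the field names, the German example, and the serialization below are illustrative assumptions, not the mCoT paper's actual data format):

```python
# Hypothetical record for multilingual CoT instruction tuning; field names
# and content are illustrative, not the mCoT paper's released format.
cot_record = {
    "language": "de",
    "instruction": "Löse die folgende Aufgabe Schritt für Schritt.",
    "question": "Ali hat 5 Äpfel und kauft 7 weitere. Wie viele Äpfel hat er jetzt?",
    "rationale": "Ali beginnt mit 5 Äpfeln. 5 + 7 = 12.",
    "answer": "12",
}


def to_training_text(record: dict) -> tuple:
    """Serialize one record into a (prompt, target) pair for supervised
    finetuning: the model learns to emit the rationale before the answer."""
    prompt = f"{record['instruction']}\n{record['question']}\n"
    target = f"{record['rationale']} Antwort: {record['answer']}"
    return prompt, target
```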
- Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks, including a sentence planning benchmark with different granularities.
Comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum of tasks on which large language models fall behind, are comparable to, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z)
- Baichuan 2: Open Large-scale Language Models [51.56361715162972]
We present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens.
Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.
arXiv Detail & Related papers (2023-09-19T04:13:22Z)
- Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability [11.000499414131324]
We conduct experiments across four types of reasoning tasks.
We find that the multilingual pretrained model does not always outperform an English-centric model.
English appears to be a less suitable source language, and the choice of source language becomes less important when the English-centric model scales up.
arXiv Detail & Related papers (2023-06-11T14:03:09Z)
- Crosslingual Generalization through Multitask Finetuning [80.8822603322471]
Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages.
arXiv Detail & Related papers (2022-11-03T13:19:32Z)
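The finetuned BLOOMZ checkpoints described in the entry above are published on the Hugging Face Hub and can be queried with the `transformers` library; a minimal illustrative sketch follows, where the small model id and toy prompt are assumptions chosen for demonstration rather than the paper's evaluation setup.

```python
# Illustrative sketch: zero-shot prompting of a small public BLOOMZ checkpoint.
# Assumes the Hugging Face transformers library is installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-560m"  # smallest released BLOOMZ variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The model was finetuned on English-prompted tasks, yet it can follow
# instructions over non-English inputs zero-shot.
prompt = "Translate to English: Je t'aime."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```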
- Multilingual BERT has an accent: Evaluating English influences on fluency in multilingual models [23.62852626011989]
We show that grammatical structures in higher-resource languages bleed into lower-resource languages.
We show this bias via a novel method for comparing the fluency of multilingual models to the fluency of monolingual Spanish and Greek models.
arXiv Detail & Related papers (2022-10-11T17:06:38Z)
- English2Gbe: A multilingual machine translation model for {Fon/Ewe} Gbe [0.0]
This paper introduces English2Gbe, a multilingual neural machine translation model capable of translating from English to Ewe or Fon.
We show that English2Gbe outperforms bilingual models (English to Ewe and English to Fon) and gives state-of-the-art results on the JW300 benchmark for Fon.
arXiv Detail & Related papers (2021-12-13T10:35:09Z)
- xGQA: Cross-Lingual Visual Question Answering [100.35229218735938]
xGQA is a new multilingual evaluation benchmark for the visual question answering task.
We extend the established English GQA dataset to 7 typologically diverse languages.
We propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual.
arXiv Detail & Related papers (2021-09-13T15:58:21Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
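To make the clustering idea from the last entry concrete, here is a rough sketch (not the paper's pipeline): embed comparable sentences in several languages with an off-the-shelf multilingual encoder, mean-pool per language, and group the language vectors with k-means. The encoder, sentences, and cluster count are illustrative assumptions.

```python
# Rough sketch: cluster languages by their representations in a multilingual
# encoder. Assumes transformers, torch, and scikit-learn are installed.
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

sentences = {
    "en": "The cat sleeps on the sofa.",
    "de": "Die Katze schläft auf dem Sofa.",
    "fr": "Le chat dort sur le canapé.",
    "es": "El gato duerme en el sofá.",
}

languages = list(sentences)
embeddings = []
with torch.no_grad():
    for lang in languages:
        inputs = tokenizer(sentences[lang], return_tensors="pt")
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, dim)
        embeddings.append(hidden.mean(dim=1).squeeze(0).numpy())

# Languages assigned the same label form one "representation sprachbund".
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
print(dict(zip(languages, labels)))
```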