Demystifying Multilingual Chain-of-Thought in Process Reward Modeling
- URL: http://arxiv.org/abs/2502.12663v1
- Date: Tue, 18 Feb 2025 09:11:44 GMT
- Title: Demystifying Multilingual Chain-of-Thought in Process Reward Modeling
- Authors: Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch,
- Abstract summary: We tackle the challenge of extending process reward models (PRMs) to multilingual settings.
We train multilingual PRMs on a dataset spanning seven languages, which is translated from English.
Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
- Score: 71.12193680015622
- License:
- Abstract: Large language models (LLMs) are designed to perform a wide range of tasks. To improve their ability to solve complex problems requiring multi-step reasoning, recent research leverages process reward modeling to provide fine-grained feedback at each step of the reasoning process for reinforcement learning (RL), but it predominantly focuses on English. In this paper, we tackle the critical challenge of extending process reward models (PRMs) to multilingual settings. To achieve this, we train multilingual PRMs on a dataset spanning seven languages, which is translated from English. Through comprehensive evaluations on two widely used reasoning benchmarks across 11 languages, we demonstrate that multilingual PRMs not only improve average accuracy but also reduce early-stage reasoning errors. Furthermore, our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data, while also uncovering the benefits arising from more candidate responses and trainable parameters. This work opens promising avenues for robust multilingual applications in complex, multi-step reasoning tasks. In addition, we release the code to foster research along this line.
Related papers
- AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought [19.692743208974296]
We introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning.
AdaCoT dynamically routing thought processes through intermediary "thinking languages" before generating target-language responses.
arXiv Detail & Related papers (2025-01-27T15:48:57Z) - Bactrainus: Optimizing Large Language Models for Multi-hop Complex Question Answering Tasks [5.439505575097552]
We evaluate the ability of large language models in performing domain-specific tasks using the HotpotQA dataset.
This task serves as a challenging benchmark for assessing the language comprehension capabilities of these models.
The results of the study show that the integration of large language models with these techniques can lead to up to a 4% improvement in F1 score for finding answers.
arXiv Detail & Related papers (2025-01-10T18:44:06Z) - The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z) - No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z) - Eliciting Better Multilingual Structured Reasoning from LLMs through Code [17.870002864331322]
We introduce a multilingual structured reasoning and explanation dataset, termed xSTREET, that covers four tasks across six languages.
xSTREET exposes a gap in base LLM performance between English and non-English reasoning tasks.
We propose two methods to remedy this gap, building on the insight that LLMs trained on code are better reasoners.
arXiv Detail & Related papers (2024-03-05T00:48:56Z) - Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
We construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z) - Enhancing Answer Boundary Detection for Multilingual Machine Reading
Comprehension [86.1617182312817]
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision.
A mixed Machine Reading task, which translates the question or passage to other languages and builds cross-lingual question-passage pairs.
A language-agnostic knowledge masking task by leveraging knowledge phrases mined from web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z) - XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating
Cross-lingual Generalization [128.37244072182506]
Cross-lingual TRansfer Evaluation of Multilinguals XTREME is a benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.