Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
- URL: http://arxiv.org/abs/2509.15811v1
- Date: Fri, 19 Sep 2025 09:38:54 GMT
- Title: Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
- Authors: Sara Rajaee, Rochelle Choenni, Ekaterina Shutova, Christof Monz
- Abstract summary: We train a reward model to rank generated responses for a given question across languages.
Our results show that our cross-lingual reward model substantially improves mathematical reasoning performance.
Our findings reveal new opportunities to improve multilingual reasoning by leveraging the complementary strengths of diverse languages.
- Score: 32.924257962911575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While the reasoning abilities of large language models (LLMs) continue to advance, it remains unclear how such ability varies across languages in multilingual LLMs and whether different languages produce reasoning paths that complement each other. To investigate this question, we train a reward model to rank generated responses for a given question across languages. Our results show that our cross-lingual reward model substantially improves mathematical reasoning performance compared to using reward modeling within a single language, benefiting even high-resource languages. While English often exhibits the highest performance in multilingual models, we find that cross-lingual sampling particularly benefits English under low sampling budgets. Our findings reveal new opportunities to improve multilingual reasoning by leveraging the complementary strengths of diverse languages.
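To make the selection procedure concrete, below is a minimal sketch of a Best-of-L loop under stated assumptions: `generate` and `reward_model` are hypothetical stand-ins for a multilingual sampler and the trained cross-lingual reward model, neither of which is specified in the abstract.

```python
# Minimal Best-of-L sketch: sample candidates in several languages, score
# them all with one cross-lingual reward model, keep the top-scoring answer.
# `generate` and `reward_model` are hypothetical placeholders, not the
# paper's actual interface.

def best_of_l(question, languages, k, generate, reward_model):
    """Sample k candidates per language; return (language, best answer)."""
    candidates = []
    for lang in languages:
        for _ in range(k):
            answer = generate(question, language=lang)  # one LLM sample
            score = reward_model(question, answer)      # cross-lingual score
            candidates.append((score, lang, answer))
    best = max(candidates, key=lambda c: c[0])  # highest reward wins
    return best[1], best[2]
```

Note that the total sampling budget here is `len(languages) * k`; the abstract's observation is that spreading a small budget across languages can help even English compared to spending it all on one language.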
Related papers
- Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning [71.4175109189942]
We present Pivot-Aligned Self-Feedback Multilingual Reasoning (PASMR).
This approach designates the model's primary language as the pivot language.
It establishes a cross-lingual self-feedback mechanism without relying on external correct answers or reward models.
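The summary leaves the mechanism underspecified; purely as a hedged illustration of pivot-based self-feedback (every function name here is an assumption, not PASMR's actual code), agreement between a target-language answer and a pivot-language answer can serve as the feedback signal, with no gold answers or external reward model involved:

```python
# Loose illustration of pivot-based self-feedback (hypothetical interface):
# re-derive the answer in the pivot language and use cross-lingual agreement
# as the feedback signal -- no gold answers or reward model required.

def pivot_self_feedback(question, target_lang, pivot_lang,
                        generate, extract_answer, max_rounds=3):
    target_ans = generate(question, language=target_lang)
    for _ in range(max_rounds):
        pivot_ans = generate(question, language=pivot_lang)
        if extract_answer(target_ans) == extract_answer(pivot_ans):
            return target_ans  # answers agree across languages: accept
        target_ans = generate(question, language=target_lang)  # retry
    return target_ans  # no agreement reached: return last attempt
```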
arXiv Detail & Related papers (2026-01-25T03:20:00Z) - Aligning Multilingual Reasoning with Verifiable Semantics from a High-Resource Expert Model [13.788758077632432]
We introduce Pivot-Based Reinforcement Learning with Semantically Verifiable Rewards.
This framework enhances multilingual reasoning by circumventing the need for human-annotated data in target languages.
We show that our method significantly narrows the performance gap between English and other languages.
arXiv Detail & Related papers (2025-09-29T22:03:11Z) - Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs).
We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models.
Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
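As a rough sketch of what bilingual-pair probing could look like (the paper's actual pipeline is not described in this summary; `ask` and `is_correct` are placeholders), one flags pairs where the model succeeds in exactly one of the two languages:

```python
# Hypothetical probe: flag bilingual question pairs where the model is right
# in one language but wrong in the other (a cross-lingual weakness).

def find_weak_pairs(pairs, ask, is_correct):
    """pairs: iterable of (q_lang_a, q_lang_b, gold_answer) triples."""
    weak = []
    for q_a, q_b, gold in pairs:
        ok_a = is_correct(ask(q_a), gold)
        ok_b = is_correct(ask(q_b), gold)
        if ok_a != ok_b:  # succeeds in exactly one language
            weak.append((q_a, q_b, ok_a, ok_b))
    return weak
```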
arXiv Detail & Related papers (2025-05-24T12:31:27Z) - When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance.
Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z) - Crosslingual Reasoning through Test-Time Scaling [51.55526326294275]
We find that scaling up inference compute for English-centric reasoning language models (RLMs) improves multilingual mathematical reasoning across many languages.
While English-centric RLMs' CoTs are naturally predominantly English, they consistently follow a quote-and-think pattern to reason about quoted non-English inputs.
We observe poor out-of-domain reasoning generalization, in particular from STEM to cultural commonsense knowledge, even for English.
arXiv Detail & Related papers (2025-05-08T16:50:06Z) - Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings.
We train multilingual PRMs on a dataset spanning seven languages, which is translated from English.
Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
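For readers unfamiliar with PRMs, a generic (not paper-specific) scoring sketch is shown below: the PRM scores each intermediate reasoning step, and a solution-level score is aggregated from the step scores, with the minimum being a common choice; `prm_score` is a placeholder for the trained multilingual PRM.

```python
# Generic process reward model (PRM) scoring sketch: score each reasoning
# step, then aggregate (min is a common choice) into a solution-level score.
# `prm_score` stands in for the trained multilingual PRM.

def score_solution(question, steps, prm_score):
    step_scores = [
        prm_score(question, steps[: i + 1])  # score prefix ending at step i
        for i in range(len(steps))
    ]
    return min(step_scores)  # one bad step sinks the whole solution
```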
arXiv Detail & Related papers (2025-02-18T09:11:44Z) - The Multilingual Mind : A Survey of Multilingual Reasoning in Language Models [18.399229357408043]
Multilingual reasoning requires language models to handle logical reasoning across languages.
This survey provides the first in-depth review of multilingual reasoning in language models.
arXiv Detail & Related papers (2025-02-13T16:25:16Z) - AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought [40.16140566668239]
We introduce AdaMCoT, a framework that enhances multilingual factual reasoning.
AdaMCoT dynamically routes thought processes through intermediary "thinking languages" before generating target-language responses.
Our evaluation demonstrates substantial improvements in both factual reasoning quality and cross-lingual consistency.
arXiv Detail & Related papers (2025-01-27T15:48:57Z) - LOLA -- An Open-Source Massively Multilingual Large Language Model [1.5704590739448838]
LOLA is a massively multilingual large language model trained on more than 160 languages.
Our architectural and implementation choices address the challenge of harnessing linguistic diversity.
We show how the learned expert-routing mechanism exploits implicit phylogenetic patterns to potentially alleviate the curse of multilinguality.
arXiv Detail & Related papers (2024-09-17T15:23:08Z) - Could We Have Had Better Multilingual LLMs If English Was Not the Central Language? [4.655168524016426]
Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on.
Our study delves into Llama2's translation capabilities.
Our experiments show that the 7B Llama2 model yields above 10 BLEU when translating into all languages it has seen.
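A claim like "above 10 BLEU" can be checked with the standard sacrebleu package; the sketch below uses real sacrebleu calls, while the hypothesis and reference strings are made-up placeholders:

```python
# Corpus-level BLEU with the sacrebleu package (pip install sacrebleu).
import sacrebleu

hypotheses = ["Das ist ein Test.", "Hallo Welt."]      # model translations
references = [["Dies ist ein Test.", "Hallo Welt."]]   # one reference per hyp

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # paper reports >10 into seen languages
```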
arXiv Detail & Related papers (2024-02-21T16:32:38Z) - Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improving language model responses, in which multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be applied directly to existing black-box models and uses the same procedure and prompts for all tasks we investigate.
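The debate loop is simple enough to sketch end to end; `agents` below is a list of callables wrapping black-box LLM instances (placeholders), and majority voting stands in for whatever final aggregation the paper uses:

```python
# Multiagent debate sketch: agents answer independently, then revise after
# reading the other agents' answers for a fixed number of rounds.
from collections import Counter

def debate(question, agents, rounds=2):
    # Round 0: every agent answers independently.
    answers = [agent(question) for agent in agents]
    for _ in range(rounds):
        new_answers = []
        for i, agent in enumerate(agents):
            others = [a for j, a in enumerate(answers) if j != i]
            prompt = (f"{question}\nOther agents answered: {others}\n"
                      "Reconsider and give your final answer.")
            new_answers.append(agent(prompt))
        answers = new_answers
    # Aggregate the final round by majority vote.
    return Counter(answers).most_common(1)[0][0]
```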
arXiv Detail & Related papers (2023-05-23T17:55:11Z) - Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.