Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
- URL: http://arxiv.org/abs/2502.17956v1
- Date: Tue, 25 Feb 2025 08:27:28 GMT
- Title: Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
- Authors: Patomporn Payoungkhamdee, Pume Tuchinda, Jinheon Baek, Samuel Cahyawijaya, Can Udomcharoenchaikit, Potsawee Manakul, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong
- Abstract summary: Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting offers a promising alternative but shifts the challenge to generating programs from non-English questions; we propose a framework to evaluate it.
- Score: 38.191619790402655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge to generating programs from non-English questions. We propose a framework to evaluate PoT by separating multilingual reasoning from code execution to examine (i) the impact of fine-tuning on question-reasoning alignment and (ii) how reasoning quality affects answer correctness. Our findings demonstrate that PoT fine-tuning substantially enhances multilingual reasoning, outperforming CoT fine-tuned models. We further demonstrate a strong correlation between reasoning quality (measured through code quality) and answer accuracy, highlighting its potential as a test-time performance improvement heuristic.
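To ground the setup, here is a minimal sketch of the generate-then-execute loop that PoT describes, plus a naive best-of-n selection that treats clean execution as a crude stand-in for the code-quality signal mentioned above. This is not the authors' implementation: the `llm` helper, the prompt template, and the `answer`-variable convention are all assumptions.

```python
# Minimal Program-of-Thought (PoT) sketch: the model writes a Python
# program for a (possibly non-English) question, and a separate
# execution step recovers the answer, keeping reasoning (program
# generation) apart from execution.
# ASSUMPTION: `llm` is a placeholder for any chat-completion client.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

POT_TEMPLATE = (
    "Write a Python program that solves the question below.\n"
    "Respond with code only and store the final result in a "
    "variable named `answer`.\n\nQuestion: {question}\n"
)

def solve_pot(question: str) -> object | None:
    code = llm(POT_TEMPLATE.format(question=question))
    scope: dict = {}
    try:
        exec(code, scope)       # NOTE: sandbox this in real use
        return scope.get("answer")
    except Exception:
        return None             # broken program -> no answer

def solve_best_of_n(question: str, n: int = 5) -> object | None:
    # Crude test-time heuristic: sample up to n programs and keep the
    # first that executes cleanly, using executability as a minimal
    # proxy for the code quality the abstract correlates with accuracy.
    for _ in range(n):
        answer = solve_pot(question)
        if answer is not None:
            return answer
    return None
```

A real quality scorer would use richer signals (static checks, model-judged code quality) rather than bare executability, but the control flow stays the same.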
Related papers
- Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL)
It employs self-generated natural language critiques as feedback to guide the step-level search process.
This approach bypasses the need for task-specific verifiers and the associated training overhead.
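As a rough illustration only (a sketch, not the paper's PANEL implementation; `llm` and the prompts are assumptions), step-level search guided by self-generated critiques might look like:

```python
# Sketch: propose candidate next steps, critique each in natural
# language, and keep the step whose critique reads most favorably.
# ASSUMPTION: `llm` is any chat-completion client; prompts illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError

def next_step(problem: str, steps: list[str], k: int = 3) -> str:
    candidates = [
        llm(f"Problem: {problem}\nSteps so far: {steps}\n"
            "Propose the next reasoning step.")
        for _ in range(k)
    ]

    def critique_score(step: str) -> int:
        critique = llm(f"Briefly critique this reasoning step:\n{step}")
        # Naive scoring: count approving words; a real system would
        # interpret the critique far more carefully.
        return sum(w in critique.lower() for w in ("correct", "valid", "sound"))

    return max(candidates, key=critique_score)
```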
arXiv Detail & Related papers (2025-03-21T17:59:55Z)
- Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings. We train multilingual PRMs on a dataset spanning seven languages, which is translated from English. Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
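A hedged sketch of the step-level scoring interface such a PRM exposes (the function signature and aggregation are assumptions, not the paper's):

```python
# Sketch: a process reward model (PRM) scores each prefix of a
# chain-of-thought, in whatever language the chain is written.
# ASSUMPTION: `prm_score` stands in for a trained PRM, returning [0, 1].

def prm_score(question: str, steps_so_far: list[str]) -> float:
    raise NotImplementedError("plug in a trained PRM here")

def chain_score(question: str, steps: list[str]) -> float:
    if not steps:
        return 0.0
    # min() penalizes the weakest step, a common aggregation choice
    # since one bad step can derail the whole chain.
    return min(prm_score(question, steps[: i + 1]) for i in range(len(steps)))
```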
arXiv Detail & Related papers (2025-02-18T09:11:44Z)
- AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought [19.692743208974296]
We introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning. AdaCoT dynamically routes thought processes through intermediary "thinking languages" before generating target-language responses.
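A minimal sketch of the routing pattern (the router stub, prompt wording, and `llm` are assumptions):

```python
# Sketch: reason in an intermediary "thinking language", then produce
# the response in the target language.
# ASSUMPTION: `llm` is any chat-completion client; the router is a stub.

def llm(prompt: str) -> str:
    raise NotImplementedError

def pick_thinking_language(question: str) -> str:
    # Stub router: a real system would choose per question; English is
    # a common default for models with English-dominant pre-training.
    return "English"

def adaptive_cot(question: str, target_lang: str) -> str:
    think_lang = pick_thinking_language(question)
    reasoning = llm(f"Reason step by step in {think_lang}:\n{question}")
    return llm(f"Question: {question}\nReasoning: {reasoning}\n"
               f"State the final answer in {target_lang}.")
```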
arXiv Detail & Related papers (2025-01-27T15:48:57Z)
- Reasoning Elicitation in Language Models via Counterfactual Feedback [17.908819732623716]
We derive novel metrics that balance accuracy in factual and counterfactual questions.
We propose several fine-tuning approaches that aim to elicit better reasoning mechanisms.
We evaluate the performance of the fine-tuned language models in a variety of realistic scenarios.
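The summary does not define the metrics; as a purely illustrative assumption (not the paper's definition), a harmonic mean is one way to balance factual and counterfactual accuracy, since it punishes trading one for the other:

```python
# Illustrative balance over factual and counterfactual accuracy.
# ASSUMPTION: harmonic mean is our stand-in; the paper's metrics may differ.

def balanced_accuracy(factual_acc: float, counterfactual_acc: float) -> float:
    if factual_acc + counterfactual_acc == 0:
        return 0.0
    return 2 * factual_acc * counterfactual_acc / (factual_acc + counterfactual_acc)

print(balanced_accuracy(0.9, 0.5))  # ~0.64: strong factual accuracy alone is not enough
```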
arXiv Detail & Related papers (2024-10-02T15:33:30Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
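A hedged sketch of the data side of question translation training: fine-tuning pairs that map a non-English question to its English counterpart so questions align before reasoning (the record layout and `translate_to_english` helper are assumptions):

```python
# Sketch: build (non-English question -> English question) pairs for a
# question-alignment fine-tuning stage.
# ASSUMPTION: field names and the translation helper are illustrative.

def translate_to_english(question: str) -> str:
    raise NotImplementedError("plug in an MT system or parallel data")

def alignment_examples(questions: list[str]) -> list[dict[str, str]]:
    return [
        {"prompt": f"Translate the question into English:\n{q}",
         "completion": translate_to_english(q)}
        for q in questions
    ]
```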
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns [26.641713417293538]
Chain of Thought (CoT) prompting can encourage language models to engage in logical reasoning.
We propose leveraging reasoning patterns to enhance CoT prompting effectiveness.
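The summary leaves "reasoning patterns" abstract; one hedged reading (our assumption, not necessarily the paper's mechanism) is choosing few-shot demonstrations whose solution pattern matches the incoming question:

```python
# Sketch: select CoT demonstrations sharing a reasoning pattern with the
# query, rather than using one fixed demonstration set.
# ASSUMPTION: the pattern labels and classifier stub are illustrative.

def classify_pattern(question: str) -> str:
    raise NotImplementedError("e.g. 'arithmetic', 'comparison', 'multi-hop'")

def select_demos(question: str,
                 demo_bank: dict[str, list[str]], k: int = 4) -> list[str]:
    pattern = classify_pattern(question)
    return demo_bank.get(pattern, demo_bank["default"])[:k]
```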
arXiv Detail & Related papers (2024-04-23T07:50:00Z)
- MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization [65.31411639849516]
We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language.
Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages.
Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
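A sketch of how that consistency signal could yield preference pairs for optimization (the `translate` and `consistency` stand-ins and the pair format are assumptions):

```python
# Sketch: translate non-dominant-language candidates into the dominant
# language and prefer the one most consistent with dominant-language
# reasoning, producing (chosen, rejected) pairs for preference tuning.
# ASSUMPTION: `translate` wraps an off-the-shelf MT model; `consistency`
# is any similarity score in [0, 1].

def translate(text: str, target_lang: str = "en") -> str:
    raise NotImplementedError

def consistency(a: str, b: str) -> float:
    raise NotImplementedError  # e.g. answer match or embedding similarity

def preference_pair(dominant_reasoning: str,
                    candidates: list[str]) -> tuple[str, str]:
    ranked = sorted(candidates,
                    key=lambda c: consistency(translate(c), dominant_reasoning))
    return ranked[-1], ranked[0]  # (chosen, rejected)
```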
arXiv Detail & Related papers (2024-01-12T18:03:54Z)
- Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts [1.8175282137722093]
Chain-of-Thought (CoT) methods empower Large Language Models (LLMs) to solve complex tasks in a step-by-step manner.
The ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data.
We propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning Cross-lingual CoT reasoning across languages.
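A hedged single-prompt sketch of the alignment idea (prompt wording and `llm` are assumptions):

```python
# Sketch: request parallel step-by-step reasoning in several languages
# that must converge on one shared final answer.
# ASSUMPTION: `llm` is any chat-completion client.

def llm(prompt: str) -> str:
    raise NotImplementedError

def cross_lingual_cot(question: str, langs: list[str]) -> str:
    lang_list = ", ".join(langs)
    return llm(
        f"Solve the problem step by step in each of: {lang_list}.\n"
        "Keep the steps aligned across languages and end with a single "
        f"shared final answer.\n\nProblem: {question}"
    )
```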
arXiv Detail & Related papers (2023-11-14T11:49:43Z)
- Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages [46.496557448392494]
Chain-of-thought (CoT) is capable of eliciting models to explicitly generate reasoning paths.
Existing zero-shot prompting techniques are limited to a single language.
We introduce cross-lingual prompting (CLP), aiming to improve zero-shot CoT reasoning across languages.
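The two-stage flavor of the idea can be sketched as follows (prompt text and `llm` are assumptions, not the paper's exact prompts): first align the question into a pivot language, then solve from the restated question.

```python
# Sketch of cross-lingual prompting: (1) restate the question in a pivot
# language, (2) solve step by step from the restatement.
# ASSUMPTION: `llm` is any chat-completion client; prompts illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError

def clp(question: str, pivot: str = "English") -> str:
    restated = llm(f"Restate this question in {pivot}, "
                   f"keeping every detail:\n{question}")
    return llm(f"Solve step by step and give the final answer:\n{restated}")
```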
arXiv Detail & Related papers (2023-10-23T10:56:03Z)
- Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement [50.62461749446111]
Self-Polish (SP) is a novel method that facilitates the model's reasoning by guiding it to progressively refine the given problems to be more comprehensible and solvable.
SP is orthogonal to all other prompting methods on the answer/reasoning side, such as CoT, allowing for seamless integration with state-of-the-art techniques for further improvement.
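A minimal sketch of progressive problem refinement (the round count, stopping rule, and `llm` are assumptions):

```python
# Sketch: iteratively rewrite the problem to be clearer before solving,
# leaving the answer-side method (e.g. CoT) unchanged.
# ASSUMPTION: `llm` is any chat-completion client.

def llm(prompt: str) -> str:
    raise NotImplementedError

def self_polish(problem: str, rounds: int = 2) -> str:
    for _ in range(rounds):
        problem = llm("Rewrite this problem so it is clearer and easier "
                      f"to solve, without changing its meaning:\n{problem}")
    return llm(f"Solve step by step:\n{problem}")
```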
arXiv Detail & Related papers (2023-05-23T19:58:30Z)