Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
- URL: http://arxiv.org/abs/2502.17956v1
- Date: Tue, 25 Feb 2025 08:27:28 GMT
- Title: Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments
- Authors: Patomporn Payoungkhamdee, Pume Tuchinda, Jinheon Baek, Samuel Cahyawijaya, Can Udomcharoenchaikit, Potsawee Manakul, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong
- Abstract summary: Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting offers a promising alternative but shifts the challenge to generating programs from non-English questions; we propose a framework to evaluate it.
- Score: 38.191619790402655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-step reasoning is essential for large language models (LLMs), yet multilingual performance remains challenging. While Chain-of-Thought (CoT) prompting improves reasoning, it struggles with non-English languages due to the entanglement of reasoning and execution. Program-of-Thought (PoT) prompting separates reasoning from execution, offering a promising alternative but shifting the challenge to generating programs from non-English questions. We propose a framework to evaluate PoT by separating multilingual reasoning from code execution to examine (i) the impact of fine-tuning on question-reasoning alignment and (ii) how reasoning quality affects answer correctness. Our findings demonstrate that PoT fine-tuning substantially enhances multilingual reasoning, outperforming CoT fine-tuned models. We further demonstrate a strong correlation between reasoning quality (measured through code quality) and answer accuracy, highlighting its potential as a test-time performance improvement heuristic.
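To ground the setup, here is a minimal sketch of the generate-then-execute loop that PoT describes, plus a naive best-of-n selection that treats clean execution as a crude stand-in for the code-quality signal mentioned above. This is not the authors' implementation: the `llm` helper, the prompt template, and the `answer`-variable convention are all assumptions.

```python
# Minimal Program-of-Thought (PoT) sketch: the model writes a Python
# program for a (possibly non-English) question, and a separate
# execution step recovers the answer, keeping reasoning (program
# generation) apart from execution.
# ASSUMPTION: `llm` is a placeholder for any chat-completion client.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

POT_TEMPLATE = (
    "Write a Python program that solves the question below.\n"
    "Respond with code only and store the final result in a "
    "variable named `answer`.\n\nQuestion: {question}\n"
)

def solve_pot(question: str) -> object | None:
    code = llm(POT_TEMPLATE.format(question=question))
    scope: dict = {}
    try:
        exec(code, scope)       # NOTE: sandbox this in real use
        return scope.get("answer")
    except Exception:
        return None             # broken program -> no answer

def solve_best_of_n(question: str, n: int = 5) -> object | None:
    # Crude test-time heuristic: sample up to n programs and keep the
    # first that executes cleanly, using executability as a minimal
    # proxy for the code quality the abstract correlates with accuracy.
    for _ in range(n):
        answer = solve_pot(question)
        if answer is not None:
            return answer
    return None
```

A real quality scorer would use richer signals (static checks, model-judged code quality) rather than bare executability, but the control flow stays the same.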
Related papers
- Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL)
It employs self-generated natural language critiques as feedback to guide the step-level search process.
This approach bypasses the need for task-specific verifiers and the associated training overhead.
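As a rough illustration only (a sketch, not the paper's PANEL implementation; `llm` and the prompts are assumptions), step-level search guided by self-generated critiques might look like:

```python
# Sketch: propose candidate next steps, critique each in natural
# language, and keep the step whose critique reads most favorably.
# ASSUMPTION: `llm` is any chat-completion client; prompts illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError

def next_step(problem: str, steps: list[str], k: int = 3) -> str:
    candidates = [
        llm(f"Problem: {problem}\nSteps so far: {steps}\n"
            "Propose the next reasoning step.")
        for _ in range(k)
    ]

    def critique_score(step: str) -> int:
        critique = llm(f"Briefly critique this reasoning step:\n{step}")
        # Naive scoring: count approving words; a real system would
        # interpret the critique far more carefully.
        return sum(w in critique.lower() for w in ("correct", "valid", "sound"))

    return max(candidates, key=critique_score)
```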
arXiv Detail & Related papers (2025-03-21T17:59:55Z)
- Demystifying Multilingual Chain-of-Thought in Process Reward Modeling [71.12193680015622]
We tackle the challenge of extending process reward models (PRMs) to multilingual settings. We train multilingual PRMs on a dataset spanning seven languages, which is translated from English. Our results highlight the sensitivity of multilingual PRMs to both the number of training languages and the volume of English data.
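A hedged sketch of the step-level scoring interface such a PRM exposes (the function signature and aggregation are assumptions, not the paper's):

```python
# Sketch: a process reward model (PRM) scores each prefix of a
# chain-of-thought, in whatever language the chain is written.
# ASSUMPTION: `prm_score` stands in for a trained PRM, returning [0, 1].

def prm_score(question: str, steps_so_far: list[str]) -> float:
    raise NotImplementedError("plug in a trained PRM here")

def chain_score(question: str, steps: list[str]) -> float:
    if not steps:
        return 0.0
    # min() penalizes the weakest step, a common aggregation choice
    # since one bad step can derail the whole chain.
    return min(prm_score(question, steps[: i + 1]) for i in range(len(steps)))
```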
arXiv Detail & Related papers (2025-02-18T09:11:44Z)
- AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought [19.692743208974296]
We introduce AdaCoT (Adaptive Chain-of-Thought), a framework that enhances multilingual reasoning. AdaCoT dynamically routes thought processes through intermediary "thinking languages" before generating target-language responses.
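A minimal sketch of the routing pattern (the router stub, prompt wording, and `llm` are assumptions):

```python
# Sketch: reason in an intermediary "thinking language", then produce
# the response in the target language.
# ASSUMPTION: `llm` is any chat-completion client; the router is a stub.

def llm(prompt: str) -> str:
    raise NotImplementedError

def pick_thinking_language(question: str) -> str:
    # Stub router: a real system would choose per question; English is
    # a common default for models with English-dominant pre-training.
    return "English"

def adaptive_cot(question: str, target_lang: str) -> str:
    think_lang = pick_thinking_language(question)
    reasoning = llm(f"Reason step by step in {think_lang}:\n{question}")
    return llm(f"Question: {question}\nReasoning: {reasoning}\n"
               f"State the final answer in {target_lang}.")
```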
arXiv Detail & Related papers (2025-01-27T15:48:57Z)
- Reasoning Elicitation in Language Models via Counterfactual Feedback [17.908819732623716]
We derive novel metrics that balance accuracy in factual and counterfactual questions.
We propose several fine-tuning approaches that aim to elicit better reasoning mechanisms.
We evaluate the performance of the fine-tuned language models in a variety of realistic scenarios.
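The summary does not define the metrics; as a purely illustrative assumption (not the paper's definition), a harmonic mean is one way to balance factual and counterfactual accuracy, since it punishes trading one for the other:

```python
# Illustrative balance over factual and counterfactual accuracy.
# ASSUMPTION: harmonic mean is our stand-in; the paper's metrics may differ.

def balanced_accuracy(factual_acc: float, counterfactual_acc: float) -> float:
    if factual_acc + counterfactual_acc == 0:
        return 0.0
    return 2 * factual_acc * counterfactual_acc / (factual_acc + counterfactual_acc)

print(balanced_accuracy(0.9, 0.5))  # ~0.64: strong factual accuracy alone is not enough
```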
arXiv Detail & Related papers (2024-10-02T15:33:30Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
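A hedged sketch of the data side of question translation training: fine-tuning pairs that map a non-English question to its English counterpart so questions align before reasoning (the record layout and `translate_to_english` helper are assumptions):

```python
# Sketch: build (non-English question -> English question) pairs for a
# question-alignment fine-tuning stage.
# ASSUMPTION: field names and the translation helper are illustrative.

def translate_to_english(question: str) -> str:
    raise NotImplementedError("plug in an MT system or parallel data")

def alignment_examples(questions: list[str]) -> list[dict[str, str]]:
    return [
        {"prompt": f"Translate the question into English:\n{q}",
         "completion": translate_to_english(q)}
        for q in questions
    ]
```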
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns [26.641713417293538]
Chain of Thought (CoT) prompting can encourage language models to engage in logical reasoning.
We propose leveraging reasoning patterns to enhance CoT prompting effectiveness.
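The summary leaves "reasoning patterns" abstract; one hedged reading (our assumption, not necessarily the paper's mechanism) is choosing few-shot demonstrations whose solution pattern matches the incoming question:

```python
# Sketch: select CoT demonstrations sharing a reasoning pattern with the
# query, rather than using one fixed demonstration set.
# ASSUMPTION: the pattern labels and classifier stub are illustrative.

def classify_pattern(question: str) -> str:
    raise NotImplementedError("e.g. 'arithmetic', 'comparison', 'multi-hop'")

def select_demos(question: str,
                 demo_bank: dict[str, list[str]], k: int = 4) -> list[str]:
    pattern = classify_pattern(question)
    return demo_bank.get(pattern, demo_bank["default"])[:k]
```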
arXiv Detail & Related papers (2024-04-23T07:50:00Z)
- MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization [65.31411639849516]
We propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO) to align the reasoning processes in other languages with the dominant language.
Specifically, we harness an off-the-shelf translation model for the consistency between answers in non-dominant and dominant languages.
Experiments show that MAPO stably achieves significant improvements in the multilingual reasoning of various models.
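A sketch of how that consistency signal could yield preference pairs for optimization (the `translate` and `consistency` stand-ins and the pair format are assumptions):

```python
# Sketch: translate non-dominant-language candidates into the dominant
# language and prefer the one most consistent with dominant-language
# reasoning, producing (chosen, rejected) pairs for preference tuning.
# ASSUMPTION: `translate` wraps an off-the-shelf MT model; `consistency`
# is any similarity score in [0, 1].

def translate(text: str, target_lang: str = "en") -> str:
    raise NotImplementedError

def consistency(a: str, b: str) -> float:
    raise NotImplementedError  # e.g. answer match or embedding similarity

def preference_pair(dominant_reasoning: str,
                    candidates: list[str]) -> tuple[str, str]:
    ranked = sorted(candidates,
                    key=lambda c: consistency(translate(c), dominant_reasoning))
    return ranked[-1], ranked[0]  # (chosen, rejected)
```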
arXiv Detail & Related papers (2024-01-12T18:03:54Z)
- Empowering Multi-step Reasoning across Languages via Tree-of-Thoughts [1.8175282137722093]
Chain-of-Thought (CoT) methods empower Large Language Models (LLMs) to solve complex tasks in a step-by-step manner.
The ability to deliver multi-step reasoning remains limited to English because of the imbalance in the distribution of pre-training data.
We propose Cross-lingual Tree-of-Thoughts (Cross-ToT), a method for aligning Cross-lingual CoT reasoning across languages.
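A hedged single-prompt sketch of the alignment idea (prompt wording and `llm` are assumptions):

```python
# Sketch: request parallel step-by-step reasoning in several languages
# that must converge on one shared final answer.
# ASSUMPTION: `llm` is any chat-completion client.

def llm(prompt: str) -> str:
    raise NotImplementedError

def cross_lingual_cot(question: str, langs: list[str]) -> str:
    lang_list = ", ".join(langs)
    return llm(
        f"Solve the problem step by step in each of: {lang_list}.\n"
        "Keep the steps aligned across languages and end with a single "
        f"shared final answer.\n\nProblem: {question}"
    )
```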
arXiv Detail & Related papers (2023-11-14T11:49:43Z)
- Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages [46.496557448392494]
Chain-of-thought (CoT) is capable of eliciting models to explicitly generate reasoning paths.
Existing zero-shot prompting techniques are limited to a single language.
We introduce cross-lingual prompting (CLP), aiming to improve zero-shot CoT reasoning across languages.
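The two-stage flavor of the idea can be sketched as follows (prompt text and `llm` are assumptions, not the paper's exact prompts): first align the question into a pivot language, then solve from the restated question.

```python
# Sketch of cross-lingual prompting: (1) restate the question in a pivot
# language, (2) solve step by step from the restatement.
# ASSUMPTION: `llm` is any chat-completion client; prompts illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError

def clp(question: str, pivot: str = "English") -> str:
    restated = llm(f"Restate this question in {pivot}, "
                   f"keeping every detail:\n{question}")
    return llm(f"Solve step by step and give the final answer:\n{restated}")
```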
arXiv Detail & Related papers (2023-10-23T10:56:03Z)
- Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement [50.62461749446111]
Self-Polish (SP) is a novel method that facilitates the model's reasoning by guiding it to progressively refine the given problems to be more comprehensible and solvable.
SP is orthogonal to all other prompting methods on the answer/reasoning side, such as CoT, allowing for seamless integration with state-of-the-art techniques for further improvement.
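A minimal sketch of progressive problem refinement (the round count, stopping rule, and `llm` are assumptions):

```python
# Sketch: iteratively rewrite the problem to be clearer before solving,
# leaving the answer-side method (e.g. CoT) unchanged.
# ASSUMPTION: `llm` is any chat-completion client.

def llm(prompt: str) -> str:
    raise NotImplementedError

def self_polish(problem: str, rounds: int = 2) -> str:
    for _ in range(rounds):
        problem = llm("Rewrite this problem so it is clearer and easier "
                      f"to solve, without changing its meaning:\n{problem}")
    return llm(f"Solve step by step:\n{problem}")
```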
arXiv Detail & Related papers (2023-05-23T19:58:30Z)