Design of Chain-of-Thought in Math Problem Solving
- URL: http://arxiv.org/abs/2309.11054v2
- Date: Sat, 30 Sep 2023 15:51:38 GMT
- Title: Design of Chain-of-Thought in Math Problem Solving
- Authors: Zhanming Jie, Trung Quoc Luong, Xinbo Zhang, Xiaoran Jin, Hang Li
- Abstract summary: Chain-of-Thought (CoT) plays a crucial role in reasoning for math problem solving.
We compare conventional natural language CoT with various program CoTs, including the self-describing program, the comment-describing program, and the non-describing program.
We find that program CoTs often have superior effectiveness in math problem solving.
- Score: 8.582686316167973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Chain-of-Thought (CoT) plays a crucial role in reasoning for math problem
solving. We conduct a comprehensive examination of methods for designing CoT,
comparing conventional natural language CoT with various program CoTs,
including the self-describing program, the comment-describing program, and the
non-describing program. Furthermore, we investigate the impact of programming
language on program CoTs, comparing Python and Wolfram Language. Through
extensive experiments on GSM8K, MATHQA, and SVAMP, we find that program CoTs
often have superior effectiveness in math problem solving. Notably, the best
performing combination with 30B parameters beats GPT-3.5-turbo by a significant
margin. The results show that self-describing program offers greater diversity
and thus can generally achieve higher performance. We also find that Python is
a better choice of language than Wolfram for program CoTs. The experimental
results provide a valuable guideline for future CoT designs that take into
account both programming language and coding style for further advancements.
Our datasets and code are publicly available.
Related papers
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs)
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z) - Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
Not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z) - Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts [51.49688654641581]
We propose a task and model agnostic approach called MultiPoT, which harnesses strength and diversity from various languages.
Experimental results reveal that it significantly outperforms Python Self-Consistency.
In particular, MultiPoT achieves more than 4.6% improvement on average on ChatGPT (gpt-3.5-turbo-0701)
arXiv Detail & Related papers (2024-02-16T13:48:06Z) - Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning [84.12154024070024]
We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks.
Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge.
A Python interpreter then executes the generated code and prints the output.
arXiv Detail & Related papers (2023-09-19T17:54:21Z) - Understanding Programs by Exploiting (Fuzzing) Test Cases [26.8259045248779]
We propose to incorporate the relationship between inputs and possible outputs/behaviors into learning, for achieving a deeper semantic understanding of programs.
To obtain inputs that are representative enough to trigger the execution of most part of the code, we resort to fuzz testing and propose fuzz tuning.
The effectiveness of the proposed method is verified on two program understanding tasks including code clone detection and code classification, and it outperforms current state-of-the-arts by large margins.
arXiv Detail & Related papers (2023-05-23T01:51:46Z) - Program of Thoughts Prompting: Disentangling Computation from Reasoning
for Numerical Reasoning Tasks [108.4568236569645]
Chain-of-thoughts prompting (CoT) is by far the state-of-art method for these tasks.
We propose Program of Thoughts' (PoT), which uses language models to express the reasoning process as a program.
PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets.
arXiv Detail & Related papers (2022-11-22T21:06:00Z) - NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual
Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z) - Natural Language to Code Translation with Execution [82.52142893010563]
Execution result--minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z) - Learning Differentiable Programs with Admissible Neural Heuristics [43.54820901841979]
We study the problem of learning differentiable functions expressed as programs in a domain-specific language.
We frame this optimization problem as a search in a weighted graph whose paths encode top-down derivations of program syntax.
Our key innovation is to view various classes of neural networks as continuous relaxations over the space of programs.
arXiv Detail & Related papers (2020-07-23T16:07:39Z) - Quantifying the Impact on Software Complexity of Composable Inductive
Programming using Zoea [0.0]
Composable inductive programming as implemented in the Zoea programming language is a simple declarative approach to software development.
This paper presents the results of a quantitative comparison of the software complexity of equivalent code implemented in Zoea and also in a conventional programming language.
arXiv Detail & Related papers (2020-05-17T10:44:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.