Program of Thoughts Prompting: Disentangling Computation from Reasoning
for Numerical Reasoning Tasks
- URL: http://arxiv.org/abs/2211.12588v4
- Date: Mon, 23 Oct 2023 01:27:38 GMT
- Title: Program of Thoughts Prompting: Disentangling Computation from Reasoning
for Numerical Reasoning Tasks
- Authors: Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen
- Abstract summary: Chain-of-thoughts prompting (CoT) is by far the state-of-art method for these tasks.
We propose Program of Thoughts' (PoT), which uses language models to express the reasoning process as a program.
PoT can show an average performance gain over CoT by around 12% across all the evaluated datasets.
- Score: 108.4568236569645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, there has been significant progress in teaching language models to
perform step-by-step reasoning to solve complex numerical reasoning tasks.
Chain-of-thoughts prompting (CoT) is by far the state-of-art method for these
tasks. CoT uses language models to perform both reasoning and computation in
the multi-step `thought' process. To disentangle computation from reasoning, we
propose `Program of Thoughts' (PoT), which uses language models (mainly Codex)
to express the reasoning process as a program. The computation is relegated to
an external computer, which executes the generated programs to derive the
answer. We evaluate PoT on five math word problem datasets (GSM, AQuA, SVAMP,
TabMWP, MultiArith) and three financial-QA datasets (FinQA, ConvFinQA, TATQA)
for both few-shot and zero-shot setups. Under both few-shot and zero-shot
settings, PoT can show an average performance gain over CoT by around 12\%
across all the evaluated datasets. By combining PoT with self-consistency
decoding, we can achieve SoTA performance on all math problem datasets and
near-SoTA performance on financial datasets. All of our data and code are
released in Github https://github.com/wenhuchen/Program-of-Thoughts
Related papers
- Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
Not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z) - How Do Humans Write Code? Large Models Do It the Same Way Too [2.620171369243005]
Program-of-Thought (PoT) replaces natural language-based Chain-of-Thought (CoT) as the most popular method in Large Language Models.
Using PoT introduces more reasoning errors, such as incorrect formulas or flawed logic, compared to CoT.
We propose Human-Think Language (HTL), which leverages a suite of strategies that help integrate PoT and CoT.
arXiv Detail & Related papers (2024-02-24T05:40:01Z) - Design of Chain-of-Thought in Math Problem Solving [8.582686316167973]
Chain-of-Thought (CoT) plays a crucial role in reasoning for math problem solving.
We compare conventional natural language CoT with various program CoTs, including the self-describing program, the comment-describing program, and the non-describing program.
We find that program CoTs often have superior effectiveness in math problem solving.
arXiv Detail & Related papers (2023-09-20T04:17:28Z) - Evaluating and Improving Tool-Augmented Computation-Intensive Math
Reasoning [75.74103236299477]
Chain-of-thought prompting(CoT) and tool augmentation have been validated as effective practices for improving large language models.
We propose a new approach that can deliberate the reasoning steps with tool interfaces, namely textbfDELI.
Experimental results on CARP and six other datasets show that the proposed DELI mostly outperforms competitive baselines.
arXiv Detail & Related papers (2023-06-04T17:02:59Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs)
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z) - Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning
by Large Language Models [23.805926737723603]
A few manually crafted step-by-step reasoning demonstrations can be used to generate reasoning steps for large language models (LLMs)
Zero-shot-CoTs prompts the target problem statement with "Let's think step by step" as an input prompt to LLMs.
We show that our proposed zero-shot prompting consistently outperforms Zero-shot-CoT across all datasets by a large margin.
arXiv Detail & Related papers (2023-05-06T16:34:37Z) - NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual
Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv Detail & Related papers (2022-11-07T11:25:21Z) - ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational
Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z) - Dynamic Prompt Learning via Policy Gradient for Semi-structured
Mathematical Reasoning [150.17907456113537]
We present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 grade-level problems that require mathematical reasoning.
We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting.
We propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data.
arXiv Detail & Related papers (2022-09-29T08:01:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.