Least-to-Most Prompting Enables Complex Reasoning in Large Language
Models
- URL: http://arxiv.org/abs/2205.10625v3
- Date: Sun, 16 Apr 2023 22:08:08 GMT
- Title: Least-to-Most Prompting Enables Complex Reasoning in Large Language
Models
- Authors: Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales,
Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi
- Abstract summary: We propose a novel prompting strategy, least-to-most prompting, to overcome the challenge of easy-to-hard generalization.
We show that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts.
By contrast, neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set, which contains over 15,000 examples.
- Score: 52.59923418570378
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-thought prompting has demonstrated remarkable performance on various
natural language reasoning tasks. However, it tends to perform poorly on tasks
that require solving problems harder than the exemplars shown in the prompts.
To overcome this challenge of easy-to-hard generalization, we propose a novel
prompting strategy, least-to-most prompting. The key idea in this strategy is
to break down a complex problem into a series of simpler subproblems and then
solve them in sequence. Solving each subproblem is facilitated by the answers
to previously solved subproblems. Our experimental results on tasks related to
symbolic manipulation, compositional generalization, and math reasoning reveal
that least-to-most prompting is capable of generalizing to more difficult
problems than those seen in the prompts. A notable finding is that when the
GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve
the compositional generalization benchmark SCAN in any split (including length
split) with an accuracy of at least 99% using just 14 exemplars, compared to
only 16% accuracy with chain-of-thought prompting. This is particularly
noteworthy because neural-symbolic models in the literature that specialize in
solving SCAN are trained on the entire training set containing over 15,000
examples. We have included prompts for all the tasks in the Appendix.
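The two-stage procedure described above can be summarized in code. The sketch below is illustrative only: it assumes a generic `complete(prompt)` callable that wraps an LLM completion API (e.g., code-davinci-002), and the exemplar text, the `;`-separated decomposition format, and the helper names are placeholders rather than the paper's actual prompts (those are given in the Appendix).

```python
from typing import Callable, List

def decompose(problem: str, complete: Callable[[str], str]) -> List[str]:
    """Stage 1: ask the model to reduce the problem to simpler subproblems.

    The exemplar and the ';'-separated output format are illustrative
    placeholders, not the prompts used in the paper.
    """
    prompt = (
        "Q: <hard example problem>\n"
        "A: To solve this, we need to first answer: <subproblem 1>; <subproblem 2>.\n\n"
        f"Q: {problem}\n"
        "A: To solve this, we need to first answer:"
    )
    return [s.strip(" .") for s in complete(prompt).split(";") if s.strip(" .")]

def least_to_most(problem: str, complete: Callable[[str], str]) -> str:
    """Stage 2: solve the subproblems in sequence, simplest first.

    Each answer is appended to the growing context, so later subproblems
    (and finally the original problem) can build on earlier answers.
    """
    context = f"Problem: {problem}\n"
    answer = ""
    for sub in decompose(problem, complete) + [problem]:
        context += f"Q: {sub}\nA:"
        answer = complete(context).strip()
        context += f" {answer}\n"
    return answer
```

In practice, `complete` would wrap an API call, and few-shot solution exemplars would be prepended to `context` in the same question/answer format.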
Related papers
- Chain of Thoughtlessness? An Analysis of CoT in Planning [17.329365493094542]
Large language model (LLM) performance on reasoning problems typically does not generalize out of distribution.
This paper presents a case study of chain of thought on problems from Blocksworld, a classical planning domain.
We find meaningful performance improvements from chain of thought prompts when those prompts are exceedingly specific to their problem class.
arXiv Detail & Related papers (2024-05-08T02:48:28Z)
- An Examination on the Effectiveness of Divide-and-Conquer Prompting in Large Language Models [28.139780691709266]
We provide a theoretical analysis of the divide-and-conquer (DaC) prompting strategy, which helps identify the specific tasks where DaC prompting can bring a performance boost with a theoretical guarantee.
We present two cases (large integer arithmetic and fact verification) where experimental results align with our theoretical analysis.
arXiv Detail & Related papers (2024-02-08T02:37:30Z)
- A Hybrid System for Systematic Generalization in Simple Arithmetic Problems [70.91780996370326]
We propose a hybrid system capable of solving arithmetic problems that require compositional and systematic reasoning over sequences of symbols.
We show that the proposed system can accurately solve nested arithmetical expressions even when trained only on a subset including the simplest cases.
arXiv Detail & Related papers (2023-06-29T18:35:41Z)
- Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z)
- Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution (a minimal sketch of this loop appears after this list).
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
- PAL: Program-aided Language Models [112.94785609781503]
We present Program-Aided Language models (PAL), in which the LLM reads a natural language problem and generates a program as its intermediate reasoning.
PAL offloads the solution step to a programmatic runtime such as a Python interpreter (see the sketch after this list).
We set new state-of-the-art results in all 12 benchmarks.
arXiv Detail & Related papers (2022-11-18T18:56:13Z) - Complexity-Based Prompting for Multi-Step Reasoning [72.0057198610614]
We study the task of prompting large-scale language models to perform multi-step reasoning.
A central question is which reasoning examples make the most effective prompts.
We propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning (a minimal sketch of the selection heuristic follows this list).
arXiv Detail & Related papers (2022-10-03T05:33:27Z)
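For the Successive Prompting entry above, here is a minimal sketch of the interleaved loop its summary describes, under the same assumed `complete` callable: the model is alternately asked for the next simple sub-question and for its answer, and the loop stops at an assumed "final" marker. The prompt formats and stop convention are illustrative, not the paper's.

```python
from typing import Callable

def successive_prompting(question: str, complete: Callable[[str], str],
                         max_steps: int = 10) -> str:
    """Alternate question-decomposition and question-answering steps."""
    context = f"Complex question: {question}\n"
    answer = ""
    for _ in range(max_steps):
        # Decomposition step: ask for the next simple sub-question.
        sub_q = complete(context + "Next question:").strip()
        # Answering step: answer that sub-question given the context so far.
        answer = complete(context + f"Q: {sub_q}\nA:").strip()
        context += f"Q: {sub_q}\nA: {answer}\n"
        # Assumed termination convention: the model marks the final question.
        if sub_q.lower().startswith("final"):
            break
    return answer
```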
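For the PAL entry, a sketch of the program-aided idea: the model writes the reasoning as a short Python program and the interpreter computes the answer. The exemplar is an illustrative placeholder, `complete` is the same assumed LLM callable, and running model output with `exec` is for illustration only.

```python
from typing import Callable

# Illustrative exemplar: reasoning expressed as executable Python.
PAL_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "# solution in Python:\n"
    "tennis_balls = 5\n"
    "bought_balls = 2 * 3\n"
    "answer = tennis_balls + bought_balls\n\n"
)

def pal(question: str, complete: Callable[[str], str]):
    """The model generates a Python program; the interpreter runs it."""
    program = complete(PAL_EXEMPLAR + f"Q: {question}\n# solution in Python:\n")
    namespace: dict = {}
    exec(program, namespace)  # illustration only; sandbox untrusted code in practice
    return namespace.get("answer")
```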
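For the Complexity-Based Prompting entry, a sketch of the selection heuristic: among annotated exemplars, prefer those whose reasoning chains contain the most steps. Counting steps as newline-separated lines is a simplifying assumption about how complexity is measured; the function and parameter names are hypothetical.

```python
from typing import List, Tuple

def select_complex_exemplars(annotated: List[Tuple[str, str]],
                             k: int = 8) -> List[Tuple[str, str]]:
    """Keep the k (question, rationale) pairs with the most reasoning steps.

    Counting steps as newline-separated lines of the rationale is a
    simplifying assumption.
    """
    return sorted(annotated, key=lambda qa: len(qa[1].splitlines()),
                  reverse=True)[:k]

def build_prompt(exemplars: List[Tuple[str, str]], question: str) -> str:
    """Assemble a few-shot prompt from the selected complex exemplars."""
    shots = "\n\n".join(f"Q: {q}\nA: {r}" for q, r in exemplars)
    return f"{shots}\n\nQ: {question}\nA:"
```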
This list is automatically generated from the titles and abstracts of the papers on this site.