From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting
- URL: http://arxiv.org/abs/2401.05384v1
- Date: Mon, 18 Dec 2023 06:31:23 GMT
- Title: From Good to Great: Improving Math Reasoning with Tool-Augmented Interleaf Prompting
- Authors: Nuo Chen, Hongguang Li, Baoyuan Wang, Jia Li
- Abstract summary: We introduce IMP-TIP: Improving Math Reasoning with Tool-augmented Interleaf Prompting.
- Score: 45.77084082197953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the performance of Large Language Models (LLMs) and
Tool-augmented LLMs in tackling complex mathematical reasoning tasks. We
introduce IMP-TIP: Improving Math Reasoning with Tool-augmented Interleaf
Prompting, a framework that combines the strengths of both LLMs and
Tool-augmented LLMs. IMP-TIP follows the "From Good to Great" concept,
collecting multiple potential solutions from both LLMs and their Tool-Augmented
counterparts for the same math problem, and then selecting or re-generating the
most accurate answer after cross-checking these solutions via tool-augmented
interleaf prompting. The framework incorporates two key aspects: self-prompt
and tool-augmented interleaf prompting (TIP). The former allows LLMs to
autonomously refine and improve an initial prompt related to tool usage, while
the latter enables LLMs to derive the final answer by dynamically analyzing the
problem, cross-checking potential solutions, and revising previous reasoning
hints in an interleaved manner. Experimental analysis shows that IMP-TIP
achieves enhanced mathematical capabilities and outperforms traditional LLMs
and tool-augmented LLMs in accuracy and reasoning diversity on math reasoning
tasks. For instance, IMP-TIP can improve Tool-augmented ChatGPT on GSM8K-Hard
from 56.0% to 65.2%.
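The selection step described in the abstract — collecting candidate answers from plain LLM and tool-augmented runs, then keeping the answer they agree on or triggering re-generation — can be sketched as a simple plurality vote. This is a minimal illustration of the idea, not the paper's actual cross-checking procedure; the function name and threshold are hypothetical:

```python
from collections import Counter

def select_answer(candidates, agreement_threshold=0.5):
    """Return (answer, agreed): the plurality answer among candidate
    solutions, and whether its vote share clears the threshold.
    When `agreed` is False, the framework would re-generate a new
    solution by cross-checking instead of accepting this answer."""
    counts = Counter(candidates)
    answer, votes = counts.most_common(1)[0]
    agreed = votes / len(candidates) >= agreement_threshold
    return answer, agreed

# Hypothetical final answers from chain-of-thought and tool-augmented runs
llm_answers = ["65", "65", "64"]
tool_answers = ["65", "65"]
final, agreed = select_answer(llm_answers + tool_answers)
```

In the paper, disagreement is resolved by tool-augmented interleaf prompting rather than a fixed vote, so this stands in only for the "select" branch of select-or-re-generate.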
Related papers
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models [26.28459880766842]
We propose a decision-aware and generalizable tool-usage framework (DEER).
Specifically, we first construct the tool-usage samples with multiple decision branches via an automatic generation pipeline.
Our proposed DEER is effective and significantly outperforms baselines across various datasets.
arXiv Detail & Related papers (2024-02-26T16:11:03Z)
- Efficient Tool Use with Chain-of-Abstraction Reasoning [65.18096363216574]
Large language models (LLMs) need to ground their reasoning to real-world knowledge.
Challenges remain in fine-tuning LLM agents to invoke tools in multi-step reasoning problems.
We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z)
- Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs.
We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer.
This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
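The planner/caller/summarizer decomposition above can be illustrated with stubbed modules. This is a hypothetical sketch of the modular structure only — in the paper each role is played by an LLM, possibly a small one, whereas here the stubs are hard-coded for clarity:

```python
def planner(question: str) -> list[str]:
    # Role 1: decide which tools to use (an LLM in the paper; a stub here)
    return ["calculator"] if any(ch.isdigit() for ch in question) else ["search"]

def caller(tool: str, question: str) -> str:
    # Role 2: invoke the chosen tool and return its observation (stubbed)
    tools = {"calculator": lambda q: "4", "search": lambda q: "no result"}
    return tools[tool](question)

def summarizer(question: str, observations: list[str]) -> str:
    # Role 3: compose the final answer from tool observations (stubbed)
    return observations[-1]

def agent(question: str) -> str:
    # The three modules can be updated or swapped independently
    plan = planner(question)
    observations = [caller(t, question) for t in plan]
    return summarizer(question, observations)
```

Because each role sits behind its own interface, a smaller specialized LLM can replace any one module without retraining the others, which is the point of the decomposition.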
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
- CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets [75.64181719386497]
We present CRAFT, a tool creation and retrieval framework for large language models (LLMs)
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
Our method is designed to be flexible and offers a plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and modalities, without any finetuning.
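The retrieval component described above — matching a task against a curated toolset — can be sketched with a simple lexical-overlap ranking. This is an illustrative stand-in, not CRAFT's actual retriever (the function name, toolset, and scoring are all hypothetical):

```python
def retrieve_tools(task: str, toolset: dict[str, str], k: int = 1) -> list[str]:
    """Rank tools by word overlap between the task description and each
    tool's documentation string, returning the top-k tool names."""
    task_words = set(task.lower().split())
    scored = sorted(
        toolset,
        key=lambda name: len(task_words & set(toolset[name].lower().split())),
        reverse=True,
    )
    return scored[:k]

# Hypothetical toolset: tool name -> short description
toolset = {
    "unit_converter": "convert units of length mass temperature",
    "date_diff": "compute the difference between two dates",
}
best = retrieve_tools("convert 3 miles to km units of length", toolset)
```

A production retriever would use embeddings rather than word overlap, but the plug-and-play shape is the same: retrieval selects tools, and the off-the-shelf LLM consumes them without finetuning.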
arXiv Detail & Related papers (2023-09-29T17:40:26Z)
- Multimodal Multi-Hop Question Answering Through a Conversation Between Tools and Efficiently Finetuned Large Language Models [20.52053559484399]
We employ a tool-interacting divide-and-conquer strategy to answer complex multi-hop questions.
To increase the reasoning ability of LLMs, we prompt ChatGPT to generate a tool-interacting divide-and-conquer dataset.
To assess the effectiveness of this approach, we conduct an evaluation on two recently introduced complex question-answering datasets.
arXiv Detail & Related papers (2023-09-16T08:22:22Z)
- CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models [74.22729793816451]
Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability.
We propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization.
We evaluate CREATOR on the MATH and TabMWP benchmarks, which consist of challenging math competition problems and diverse tabular math word problems, respectively.
arXiv Detail & Related papers (2023-05-23T17:51:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.