ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
- URL: http://arxiv.org/abs/2305.14323v3
- Date: Mon, 6 Nov 2023 11:00:26 GMT
- Title: ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
- Authors: Zhipeng Chen, Kun Zhou, Beichen Zhang, Zheng Gong, Wayne Xin Zhao and Ji-Rong Wen
- Abstract summary: We propose ChatCoT, a tool-augmented chain-of-thought reasoning framework for chat-based LLMs.
In ChatCoT, we model chain-of-thought (CoT) reasoning as multi-turn conversations so that tools can be used in a more natural way through chatting.
Our approach effectively leverages the multi-turn conversation ability of chat-based LLMs and integrates thought-chain following and tool manipulation in a unified way.
- Score: 125.7209927536255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although large language models (LLMs) have achieved excellent performance on a variety of evaluation benchmarks, they still struggle with complex reasoning tasks that require specific knowledge and multi-hop reasoning. To improve their reasoning abilities, we propose ChatCoT, a tool-augmented chain-of-thought reasoning framework for chat-based LLMs (e.g., ChatGPT). In ChatCoT, we model chain-of-thought (CoT) reasoning as a multi-turn conversation so that tools can be used in a more natural way through chatting. At each turn, the LLM can either interact with tools or perform reasoning. Our approach effectively leverages the multi-turn conversation ability of chat-based LLMs and integrates thought-chain following and tool manipulation in a unified way. Specifically, we initialize the early turns of the conversation with knowledge about tools, tasks, and the reasoning format, and propose an iterative tool-augmented reasoning step to perform step-by-step tool-augmented reasoning. Experimental results on two complex reasoning datasets (MATH and HotpotQA) show the effectiveness of ChatCoT on complex reasoning tasks, achieving a 7.9% relative improvement over the state-of-the-art baseline. Our code and data are available at https://github.com/RUCAIBOX/ChatCoT.
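
The abstract describes a simple protocol: seed the conversation with knowledge of tools, the task, and the reasoning format, then let each turn be either a reasoning step or a tool interaction. The sketch below illustrates that loop under stated assumptions; the `chat` function, the toy tool set, and the CALL/ANSWER reply convention are stand-ins invented for illustration, not the authors' actual prompts or APIs (see the linked repository for those).

```python
# A minimal sketch of the iterative tool-augmented reasoning loop described
# in the abstract. Everything here is illustrative: `chat` stands in for any
# chat-based LLM API, and the CALL/ANSWER convention and toy tools are
# assumptions, not the authors' actual prompts or tool set.

_SCRIPT = iter([                      # scripted replies so the sketch runs
    "CALL calculator: (3 + 4) * 9",   # turn 1: the model uses a tool
    "ANSWER: 63",                     # turn 2: the model answers
])

def chat(messages):
    """Stand-in for a chat-completion call (e.g., the ChatGPT API)."""
    return next(_SCRIPT)

TOOLS = {
    "calculator": lambda expr: str(eval(expr)),          # toy calculator
    "retriever": lambda query: f"[passages about {query!r}]",
}

def chatcot(question, max_turns=10):
    # Early turns seed the model with knowledge of tools, the task,
    # and the expected step-by-step reasoning format.
    messages = [
        {"role": "system",
         "content": "Reason step by step. Reply 'CALL <tool>: <input>' to "
                    "use a tool, or 'ANSWER: <answer>' when done. Tools: "
                    + ", ".join(TOOLS)},
        {"role": "user", "content": question},
    ]
    for _ in range(max_turns):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        if reply.startswith("CALL"):
            # This turn is a tool interaction: run the tool and feed its
            # result back as the next conversational turn.
            head, _, arg = reply.partition(":")
            tool = TOOLS[head.removeprefix("CALL").strip()]
            messages.append({"role": "user",
                             "content": f"Tool result: {tool(arg.strip())}"})
        # Otherwise the turn was a plain reasoning step; just continue.
    return None  # turn budget exhausted without a final answer

print(chatcot("What is (3 + 4) * 9?"))  # -> 63
```

Because the tool result arrives as just another conversation turn, no special decoding or plugin interface is needed, which is the sense in which tool use here happens "naturally through chatting".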
Related papers
- Markov Chain of Thought for Efficient Mathematical Reasoning [10.678633785012691]
Multi-step Chain of Thought (CoT) benefits from the logical structure of the reasoning steps and task-specific actions.
We conceptualize the standard multi-step CoT as a novel Markov Chain of Thought (MCoT).
arXiv Detail & Related papers (2024-10-23T07:53:29Z)
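
Read literally, the Markov framing means each reasoning step is generated from the current state alone rather than from the ever-growing chain. The contrast below is a hedged sketch of that reading, with a hypothetical `next_step` standing in for one LLM call; the paper's actual state construction will differ.

```python
# Illustrative contrast between standard multi-step CoT, where each step
# conditions on the full history, and a Markov-style chain, where each step
# conditions only on the current state. `next_step` is a hypothetical
# stand-in for a single LLM reasoning step.

def next_step(context: str) -> str:
    """Stand-in for one LLM-generated reasoning step."""
    return f"<step derived from {len(context)} chars of context>"

def standard_cot(question: str, n_steps: int) -> list[str]:
    history, steps = question, []
    for _ in range(n_steps):
        step = next_step(history)   # prompt grows with every step
        steps.append(step)
        history += "\n" + step
    return steps

def markov_cot(question: str, n_steps: int) -> list[str]:
    state, steps = question, []
    for _ in range(n_steps):
        step = next_step(state)     # prompt is the current state only
        steps.append(step)
        state = step                # Markov property: older steps dropped
    return steps

print(standard_cot("Solve x + 2 = 5.", 3))  # context lengths grow
print(markov_cot("Solve x + 2 = 5.", 3))    # context lengths stay flat
```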
- ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning [15.468435593587808]
This paper introduces ChatLogic, a framework targeted specifically at multi-step reasoning tasks.
In ChatLogic, the language model plays a central role, acting as a controller and participating in every system operation stage.
We propose a novel method of converting logic problems into symbolic form for integration with an inference engine.
arXiv Detail & Related papers (2024-07-14T11:06:43Z)
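
The summary sketches a division of labor: the language model translates the problem into symbols, and a deterministic inference engine performs the multi-step deduction. Below is a toy illustration of that split, assuming a hand-scripted translation step and a naive forward-chaining engine; ChatLogic's actual pipeline is more involved.

```python
# Toy illustration of the split the summary describes: an LLM translates a
# natural-language problem into symbolic facts and rules, and a deterministic
# inference engine performs the multi-step deduction. The translation step is
# hand-scripted here; ChatLogic's actual pipeline differs.

def translate_with_llm(problem: str):
    """Hypothetical LLM translation; returns hand-written symbols for the demo."""
    facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}

    def grandparent_rule(known):
        # grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
        for rel1, x, y in known:
            for rel2, y2, z in known:
                if rel1 == rel2 == "parent" and y == y2:
                    yield ("grandparent", x, z)

    return facts, [grandparent_rule]

def forward_chain(facts, rules):
    """Naive fixpoint: apply every rule until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for fact in list(rule(known)):  # materialize before mutating
                if fact not in known:
                    known.add(fact)
                    changed = True
    return known

facts, rules = translate_with_llm(
    "Ann is Bob's parent and Bob is Cal's parent. Who is Cal's grandparent?")
print(("grandparent", "ann", "cal") in forward_chain(facts, rules))  # True
```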
- PerkwE_COQA: Enhanced Persian Conversational Question Answering by combining contextual keyword extraction with Large Language Models [0.8057006406834466]
This paper presents a novel method to elevate the performance of Persian conversational question-answering (CQA) systems.
It combines the strengths of Large Language Models (LLMs) with contextual keyword extraction.
The proposed method effectively handles implicit questions, delivers contextually relevant answers, and tackles complex questions that rely heavily on conversational context.
arXiv Detail & Related papers (2024-04-08T11:14:58Z)
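
The recipe the summary describes, extracting salient keywords from the conversation and handing them to the LLM together with the question, can be sketched as follows. The frequency-based extractor and the `llm` stub are illustrative placeholders, not PerkwE_COQA's actual components.

```python
# Sketch of the recipe in the summary: pull salient keywords out of the
# conversation history and pass them to the LLM alongside the (possibly
# implicit) question. Extractor and `llm` are placeholders for the demo.
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "was", "in", "and", "he", "it", "did", "his"}

def extract_keywords(history: list[str], k: int = 5) -> list[str]:
    words = (w.strip(".,?!").lower() for turn in history for w in turn.split())
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]

def llm(prompt: str) -> str:
    """Stand-in for any large language model call."""
    return f"<answer conditioned on: {prompt.splitlines()[0]}>"

def answer(history: list[str], question: str) -> str:
    # Keywords make entities from earlier turns explicit, which keeps
    # implicit questions ("Where did he write it?") grounded.
    keywords = extract_keywords(history)
    prompt = (f"Keywords: {', '.join(keywords)}\n"
              f"History: {' | '.join(history)}\n"
              f"Question: {question}\nAnswer:")
    return llm(prompt)

history = ["Hafez wrote his Divan in Shiraz.", "He lived in the 14th century."]
print(answer(history, "Where did he write it?"))
```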
- Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models [56.93074140619464]
We propose RiC (Reasoning in Conversation), a method that focuses on solving subjective tasks through dialogue simulation.
The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales.
We evaluate both API-based and open-source LLMs including GPT-4, ChatGPT, and OpenChat across twelve tasks.
arXiv Detail & Related papers (2024-02-27T05:37:10Z)
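
Taking the summary at face value, RiC-style prompting replaces an explicit rationale with a simulated dialogue used as context. A hedged two-call sketch, with `llm` as a stand-in for any of the evaluated models:

```python
# Two-call sketch of the RiC recipe as stated in the summary: simulate a
# dialogue about the subjective task to mine contextual information, then
# answer with that dialogue as context instead of a chain-of-thought
# rationale. `llm` is a stand-in for any of the evaluated models.

def llm(prompt: str) -> str:
    """Stand-in for an API-based or open-source LLM call."""
    return f"<model output for: {prompt[:48]}...>"

def reasoning_in_conversation(task: str) -> str:
    # Step 1: mine context by simulating a dialogue about the task.
    dialogue = llm(f"Write a short two-person dialogue discussing: {task}")
    # Step 2: answer conditioned on the simulated dialogue rather than
    # on an explicit step-by-step rationale.
    return llm(f"Dialogue:\n{dialogue}\n\nGiven this dialogue, {task}")

print(reasoning_in_conversation("judge whether the review's tone is sarcastic"))
```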
- Efficient Tool Use with Chain-of-Abstraction Reasoning [65.18096363216574]
Large language models (LLMs) need to ground their reasoning in real-world knowledge.
Challenges remain in fine-tuning LLM agents to invoke tools in multi-step reasoning problems.
We propose a new method for LLMs to better leverage tools in multi-step reasoning.
arXiv Detail & Related papers (2024-01-30T21:53:30Z)
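
The title's chain-of-abstraction idea can be sketched, under our reading, as a two-stage pipeline: the model plans a reasoning chain with abstract placeholders, and tools then fill the placeholders in. The CALC(...) syntax and the scripted planner below are illustrative assumptions, not the paper's format.

```python
# Hedged sketch of chain-of-abstraction-style tool use: the model first
# plans a reasoning chain containing abstract placeholders (y1, y2, ...),
# and tools then fill the placeholders in. The CALC(...) syntax and the
# scripted planner are illustrative assumptions, not the paper's format.
import re

def plan_with_llm(question: str) -> str:
    """Hypothetical planning call; returns a canned abstract chain."""
    return "y1 = CALC(15 * 4); y2 = CALC(y1 + 10); answer = y2"

def reify(chain: str) -> dict:
    """Fill placeholders by dispatching each CALC(...) span to a calculator."""
    env = {}
    for stmt in chain.split(";"):
        name, _, expr = stmt.strip().partition(" = ")
        call = re.fullmatch(r"CALC\((.+)\)", expr)
        if call:
            body = call.group(1)
            for var, val in env.items():     # substitute earlier results
                body = body.replace(var, str(val))
            env[name] = eval(body)           # toy calculator tool
        else:
            env[name] = env.get(expr, expr)  # plain variable reference
    return env

chain = plan_with_llm("What is 15 times 4, plus 10?")
print(reify(chain)["answer"])  # 70
```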
- Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue [73.69510478736483]
Large language models (LLMs) can generate fluent, coherent, and diverse responses.
However, they lack a crucial ability: communication skills.
This article aims to empower LLMs with communication skills through inner monologues.
Experimental results show that the proposed CSIM strategy improves the backbone models and outperforms the baselines.
arXiv Detail & Related papers (2023-11-13T16:19:42Z)
- ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT [72.83383437501577]
Large language models (LLMs) have recently demonstrated significant potential in mathematical abilities.
LLMs currently have difficulty in bridging perception, language understanding and reasoning capabilities.
This paper presents a novel method for integrating LLMs into the abductive learning framework.
arXiv Detail & Related papers (2023-04-21T16:23:47Z)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)