Self-prompted Chain-of-Thought on Large Language Models for Open-domain
Multi-hop Reasoning
- URL: http://arxiv.org/abs/2310.13552v2
- Date: Mon, 23 Oct 2023 05:42:42 GMT
- Title: Self-prompted Chain-of-Thought on Large Language Models for Open-domain
Multi-hop Reasoning
- Authors: Jinyuan Wang and Junlong Li and Hai Zhao
- Abstract summary: In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense.
Large language models (LLMs) have found significant utility in facilitating ODQA without an external corpus.
We propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high-quality CoTs.
- Score: 70.74928578278957
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In open-domain question-answering (ODQA), most existing questions require
single-hop reasoning on commonsense. To further extend this task, we officially
introduce open-domain multi-hop reasoning (ODMR) by answering multi-hop
questions with explicit reasoning steps in an open-domain setting. Recently, large
language models (LLMs) have found significant utility in facilitating ODQA
without an external corpus. Furthermore, chain-of-thought (CoT) prompting boosts
the reasoning capability of LLMs to a greater extent with manual or automated
paradigms. However, existing automated methods lack quality assurance, while
manual approaches suffer from limited scalability and poor diversity, hindering
the capabilities of LLMs. In this paper, we propose Self-prompted
Chain-of-Thought (SP-CoT), an automated framework to mass-produce high-quality
CoTs of LLMs, by LLMs and for LLMs. SP-CoT introduces an automated generation
pipeline of high-quality ODMR datasets, an adaptive sampler for in-context CoT
selection and self-prompted inference via in-context learning. Extensive
experiments on four multi-hop question-answering benchmarks show that our
proposed SP-CoT not only significantly surpasses the previous SOTA methods on
large-scale (175B) LLMs, but also nearly doubles the zero-shot performance of
small-scale (13B) LLMs. Further analysis reveals the remarkable capability of
SP-CoT to elicit direct and concise intermediate reasoning steps by recalling
~50% of intermediate answers on the MuSiQue-Ans dataset.
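To make the inference side of this concrete, here is a minimal Python sketch of self-prompted CoT inference via in-context learning, with a toy word-overlap stand-in for the adaptive demonstration sampler. It is an illustrative assumption, not the authors' released pipeline; `call_llm` is a hypothetical wrapper around whatever chat-completion API is available.

```python
from dataclasses import dataclass


@dataclass
class CoTDemo:
    """A self-generated demonstration: question, reasoning chain, final answer."""
    question: str
    chain_of_thought: str
    answer: str


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's completion API."""
    raise NotImplementedError


def select_demos(pool: list[CoTDemo], question: str, k: int = 4) -> list[CoTDemo]:
    """Toy adaptive sampler: rank self-generated demos by word overlap with the query.

    SP-CoT's sampler is more involved (it also encourages diversity); this keeps
    only the simplest relevance signal for illustration.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        pool,
        key=lambda d: len(q_words & set(d.question.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(demos: list[CoTDemo], question: str) -> str:
    """Concatenate CoT demonstrations, then append the target multi-hop question."""
    parts = [
        f"Question: {d.question}\n"
        f"Answer: Let's think step by step. {d.chain_of_thought} "
        f"So the answer is {d.answer}."
        for d in demos
    ]
    parts.append(f"Question: {question}\nAnswer: Let's think step by step.")
    return "\n\n".join(parts)


def answer(question: str, pool: list[CoTDemo]) -> str:
    """Self-prompted inference: in-context CoT demos followed by the new question."""
    return call_llm(build_prompt(select_demos(pool, question), question))
```

In this reading, the quality of the pool of `CoTDemo` objects is exactly what the automated ODMR generation pipeline is meant to guarantee.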
Related papers
- Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models [8.255272009912417]
We propose a novel prompting approach for automated reasoning, named LBS3, inspired by curriculum learning.
LBS3 steers LLMs to recall easy-to-hard proxy queries that are pertinent to the target query.
It then invokes a progressive strategy that uses exemplar prompts stemming from easy-proxy queries to guide LLMs in solving hard-proxy queries.
arXiv Detail & Related papers (2024-10-29T04:28:49Z) - SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine [27.274219226254026]
Multi-hop Question Answering (MHQA) remains challenging for many existing models.
We propose the Self-Guiding prompting Finite State Machine (SG-FSM) to strengthen multi-hop reasoning abilities.
arXiv Detail & Related papers (2024-10-22T13:47:38Z) - Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to empirically explain the performance gap.
arXiv Detail & Related papers (2024-09-27T05:06:43Z) - Leveraging LLMs for Dialogue Quality Measurement [27.046917937460798]
Large language models (LLMs) show robust zero-shot and few-shot capabilities across NLP tasks.
Manipulating factors such as model size, in-context examples, and selection techniques, we examine "chain-of-thought" (CoT) reasoning and label extraction procedures.
Our results indicate that LLMs that are suitably fine-tuned and have sufficient reasoning capabilities can be leveraged for automated dialogue evaluation.
arXiv Detail & Related papers (2024-06-25T06:19:47Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
However, LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the decoding process of LLMs with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization [19.200989737492595]
Large language models (LLMs) have shown great progress in responding to user questions.
The quality of LLM outputs heavily depends on the prompt design, where a good prompt might enable the LLM to answer a very challenging question correctly.
We propose a hierarchy of LLMs, first constructing a prompt with precise instructions and accurate wording in a hierarchical manner, and then using this prompt to generate the final answer to the user query.
arXiv Detail & Related papers (2024-05-30T17:05:45Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach for enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity (a minimal routing sketch in this spirit appears after this list).
We validate our model on a set of open-domain QA datasets covering multiple query complexities, and show that it enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z) - A Self-enhancement Approach for Domain-specific Chatbot Training via
Knowledge Mining and Digest [62.63606958140248]
Large Language Models (LLMs) often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains.
This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources.
We train a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents.
arXiv Detail & Related papers (2023-11-17T16:09:10Z) - FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models [79.62191017182518]
FollowBench is a multi-level, fine-grained constraints-following benchmark for large language models.
We introduce a multi-level mechanism that incrementally adds a single constraint to the initial instruction at each successive level.
By evaluating 13 popular LLMs on FollowBench, we highlight the weaknesses of LLMs in instruction following and point towards potential avenues for future work.
arXiv Detail & Related papers (2023-10-31T12:32:38Z)
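As referenced in the Adaptive-RAG entry above, the following is a minimal sketch of complexity-based routing between no-retrieval, single-step retrieval, and iterative multi-step retrieval. It is an assumption for illustration, not the Adaptive-RAG implementation: `classify_complexity` stands in for the paper's trained complexity classifier with a crude length heuristic, and `llm_answer`/`retrieve` are hypothetical callables supplied by the user.

```python
from typing import Callable, List


def classify_complexity(question: str) -> str:
    """Stand-in classifier returning 'simple', 'moderate', or 'complex'.

    Adaptive-RAG trains a small model for this decision; a length heuristic
    is used here purely for illustration.
    """
    n = len(question.split())
    return "simple" if n < 8 else "moderate" if n < 16 else "complex"


def answer_adaptively(
    question: str,
    llm_answer: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    max_hops: int = 3,
) -> str:
    """Route the query to the cheapest strategy its predicted complexity allows."""
    label = classify_complexity(question)

    if label == "simple":
        # Parametric knowledge only: no retrieval at all.
        return llm_answer(question)

    if label == "moderate":
        # One retrieval step, then a single grounded answer.
        context = "\n".join(retrieve(question))
        return llm_answer(f"Context:\n{context}\n\nQuestion: {question}")

    # Complex: iterative retrieval, folding each intermediate answer back
    # into the next retrieval query.
    query, result = question, ""
    for _ in range(max_hops):
        context = "\n".join(retrieve(query))
        result = llm_answer(f"Context:\n{context}\n\nQuestion: {question}")
        query = f"{question} {result}"
    return result
```

The design point being illustrated is simply that easy queries skip retrieval entirely, while only genuinely multi-hop queries pay for repeated retrieval and generation.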