Related papers: Zero-Shot Multi-Hop Question Answering via Monte-Carlo Tree Search with Large Language Models

Zero-Shot Multi-Hop Question Answering via Monte-Carlo Tree Search with Large Language Models

URL: http://arxiv.org/abs/2409.19382v2
Date: Tue, 1 Oct 2024 10:28:32 GMT
Title: Zero-Shot Multi-Hop Question Answering via Monte-Carlo Tree Search with Large Language Models
Authors: Seongmin Lee, Jaewook Shin, Youngjin Ahn, Seokin Seo, Ohjoon Kwon, Kee-Eung Kim,
Abstract summary: This paper introduces Monte-Carlo tree search for Zero-shot multi-hop Question Answering (MZQA), a framework based on Monte-Carlo tree search (MCTS) Unlike previous works, we propose a zero-shot prompting method, which relies solely on instructions without the support of hand-crafted few-shot examples that typically require domain expertise. We also introduce a behavioral cloning approach (MZQA-BC) trained on self-generated MCTS inference trajectories, achieving an over 10-fold increase in reasoning speed with bare compromise in performance.
Score: 19.214387260667348
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in large language models (LLMs) have significantly impacted the domain of multi-hop question answering (MHQA), where systems are required to aggregate information and infer answers from disparate pieces of text. However, the autoregressive nature of LLMs inherently poses a challenge as errors may accumulate if mistakes are made in the intermediate reasoning steps. This paper introduces Monte-Carlo tree search for Zero-shot multi-hop Question Answering (MZQA), a framework based on Monte-Carlo tree search (MCTS) to identify optimal reasoning paths in MHQA tasks, mitigating the error propagation from sequential reasoning processes. Unlike previous works, we propose a zero-shot prompting method, which relies solely on instructions without the support of hand-crafted few-shot examples that typically require domain expertise. We also introduce a behavioral cloning approach (MZQA-BC) trained on self-generated MCTS inference trajectories, achieving an over 10-fold increase in reasoning speed with bare compromise in performance. The efficacy of our method is validated on standard benchmarks such as HotpotQA, 2WikiMultihopQA, and MuSiQue, demonstrating that it outperforms existing frameworks.

Related papers

Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering [78.89231943329885]
One of the most widely used tasks to evaluate Large Language Models (LLMs) is Multiple-Choice Question Answering (MCQA) In this work, we shed light on the inconsistencies of MCQA evaluation strategies, which can lead to inaccurate and misleading model comparisons.
arXiv Detail & Related papers (2025-03-19T08:45:03Z)
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models [4.328173053224842]
This paper introduces SQuARE, a novel prompting technique designed to improve reasoning through a self-interrogation paradigm. Building upon CoT frameworks, SQuARE prompts models to generate and resolve multiple auxiliary questions before tackling the main query. Our evaluations, conducted with Llama 3 and GPT-4o models across multiple question-answering datasets, demonstrate that SQuARE significantly surpasses traditional CoT prompts and existing rephrase-and-respond methods.
arXiv Detail & Related papers (2025-02-13T15:07:20Z)
Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation [68.58373854950294]
We focus on causal reasoning and address the task of establishing causal relationships based on correlation information. We introduce a prompting strategy for this problem that breaks the original task into fixed subquestions. We evaluate our approach on an existing causal benchmark, Corr2Cause.
arXiv Detail & Related papers (2024-12-18T15:32:27Z)
SG-FSM: A Self-Guiding Zero-Shot Prompting Paradigm for Multi-Hop Question Answering Based on Finite State Machine [27.274219226254026]
Multi-hop Question Answering (MHQA) remains challenging for many existing models. We propose the Self-Guiding prompting Finite State Machine (SG-FSM) to strengthen multi-hop reasoning abilities.
arXiv Detail & Related papers (2024-10-22T13:47:38Z)
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks. We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model. Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering [8.525847131940031]
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question. Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts. We propose STOC-TOT, a tree-of-thought reasoning prompting method with constrained decoding for MHQA.
arXiv Detail & Related papers (2024-07-04T07:17:53Z)
FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering [26.398873686905063]
Large Language Models (LLMs) with chain-of-thought (COT) prompting have demonstrated impressive abilities on simple nature language inference tasks. We propose a prompting method, Finite State Machine (FSM) to enhance the reasoning capabilities of LLM for complex tasks.
arXiv Detail & Related papers (2024-07-03T10:01:01Z)
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency. This approach fails in scenarios where the correct answers are in the minority. We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z)
PokeMQA: Programmable knowledge editing for Multi-hop Question Answering [46.80110170981976]
Multi-hop question answering (MQA) is one of the challenging tasks to evaluate machine's comprehension and reasoning abilities. We propose a framework, Programmable knowledge editing for Multi-hop Question Answering (MQA) Specifically, we prompt LLMs to decompose knowledge-augmented multi-hop question, while interacting with a detached trainable scope detector to modulate LLMs behavior depending on external conflict signal.
arXiv Detail & Related papers (2023-12-23T08:32:13Z)
Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning [70.74928578278957]
In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense. Large language models (LLMs) have found significant utility in facilitating ODQA without external corpus. We propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high quality CoTs.
arXiv Detail & Related papers (2023-10-20T14:51:10Z)
Faithful Question Answering with Monte-Carlo Planning [78.02429369951363]
We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps. We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller. FAME achieves state-of-the-art performance on the standard benchmark.
arXiv Detail & Related papers (2023-05-04T05:21:36Z)
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph(KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question. We propose UniKGQA, a novel approach for multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering [87.18962441714976]
We introduce RoMQA, the first benchmark for robust, multi-evidence, multi-answer question answering (QA) We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, and find that RoMQA is challenging. Our results show that RoMQA is a challenging benchmark for large language models, and provides a quantifiable test to build more robust QA methods.
arXiv Detail & Related papers (2022-10-25T21:39:36Z)
KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering [28.18555591429343]
We propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP) Instead of adding pointer heads to PLMs, we transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem. Our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
arXiv Detail & Related papers (2022-05-06T08:31:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.