Faithful Question Answering with Monte-Carlo Planning
- URL: http://arxiv.org/abs/2305.02556v1
- Date: Thu, 4 May 2023 05:21:36 GMT
- Title: Faithful Question Answering with Monte-Carlo Planning
- Authors: Ruixin Hong, Hongming Zhang, Hong Zhao, Dong Yu, Changshui Zhang
- Abstract summary: We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller.
FAME achieves state-of-the-art performance on the standard benchmark.
- Score: 78.02429369951363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although large language models demonstrate remarkable question-answering
performances, revealing the intermediate reasoning steps that the models
faithfully follow remains challenging. In this paper, we propose FAME (FAithful
question answering with MontE-carlo planning) to answer questions based on
faithful reasoning steps. The reasoning steps are organized as a structured
entailment tree, which shows how premises are used to produce intermediate
conclusions that can prove the correctness of the answer. We formulate the task
as a discrete decision-making problem and solve it through the interaction of a
reasoning environment and a controller. The environment is modular and contains
several basic task-oriented modules, while the controller proposes actions to
assemble the modules. Since the search space could be large, we introduce a
Monte-Carlo planning algorithm to do a look-ahead search and select actions
that will eventually lead to high-quality steps. FAME achieves state-of-the-art
performance on the standard benchmark. Compared with large language models, it
produces valid and faithful reasoning steps with a much smaller model size.
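To make the planning procedure concrete, below is a minimal, self-contained sketch of a generic UCT-style Monte-Carlo planning loop over controller actions. The action names, environment transition, and reward are toy stand-ins invented for illustration; they are not FAME's actual modules or scoring.

```python
import math
import random

# Generic UCT-style Monte-Carlo planning over controller actions.
# The environment, action set, and reward are toy stand-ins, not FAME's.

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state      # partial reasoning trace (list of actions)
        self.parent = parent
        self.action = action
        self.children = []
        self.visits = 0
        self.value = 0.0

def propose_actions(state):
    # Stand-in for the controller proposing module calls; none after "end".
    if state and state[-1] == "end":
        return []
    return ["retrieve", "entail", "verify", "end"]

def step(state, action):
    # Stand-in for the modular reasoning environment's transition.
    return state + [action]

def rollout_value(state, depth=3):
    # Random rollout scored by a toy proxy for step quality.
    for _ in range(depth):
        actions = propose_actions(state)
        if not actions:
            break
        state = step(state, random.choice(actions))
    return 1.0 if state and state[-1] == "end" and "entail" in state else 0.0

def uct_select(node, c=1.4):
    # Pick the child balancing exploitation and exploration.
    return max(node.children,
               key=lambda ch: ch.value / (ch.visits + 1e-9)
               + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)))

def plan(root_state, n_simulations=200):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(propose_actions(node.state)):
            node = uct_select(node)
        # Expansion: try one untried action, if any remain.
        tried = {ch.action for ch in node.children}
        untried = [a for a in propose_actions(node.state) if a not in tried]
        if untried:
            action = random.choice(untried)
            child = Node(step(node.state, action), node, action)
            node.children.append(child)
            node = child
        # Simulation and backpropagation.
        reward = rollout_value(node.state)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action

print(plan([]))  # most promising first action after look-ahead
```

The look-ahead structure is the point: simulated action sequences are scored before any step is committed to, which is what lets a controller avoid locally plausible but ultimately dead-end reasoning steps.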
Related papers
- Zero-Shot Multi-Hop Question Answering via Monte-Carlo Tree Search with Large Language Models [19.214387260667348]
This paper introduces MZQA, a framework for zero-shot multi-hop question answering based on Monte-Carlo tree search (MCTS).
Unlike previous works, we propose a zero-shot prompting method, which relies solely on instructions without the support of hand-crafted few-shot examples that typically require domain expertise.
We also introduce a behavioral cloning approach (MZQA-BC) trained on self-generated MCTS inference trajectories, achieving over a 10-fold increase in reasoning speed with negligible compromise in performance.
arXiv Detail & Related papers (2024-09-28T15:13:04Z)
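The behavioral-cloning step above can be caricatured in a few lines: record the (state, action) decisions the slow MCTS search made, then act on new states by lookup instead of search. The states and trajectory format below are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy behavioral cloning from self-generated search trajectories:
# fit a fast policy to the decisions the slow search made.

trajectories = [  # (state, action) pairs recorded from search
    [("q: who wrote it?", "retrieve"), ("q+context", "answer")],
    [("q: who wrote it?", "retrieve"), ("q+context", "answer")],
    [("q: when was it?", "decompose"), ("sub-question", "retrieve")],
]

counts = defaultdict(Counter)
for trajectory in trajectories:
    for state, action in trajectory:
        counts[state][action] += 1

def cloned_policy(state):
    # Replay the action the search chose most often in this state;
    # fall back to a default for unseen states.
    return counts[state].most_common(1)[0][0] if state in counts else "retrieve"

print(cloned_policy("q: who wrote it?"))  # -> "retrieve", no search needed
```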
- STOC-TOT: Stochastic Tree-of-Thought with Constrained Decoding for Complex Reasoning in Multi-Hop Question Answering [8.525847131940031]
Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question.
Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts.
We propose STOC-TOT, a tree-of-thought reasoning prompting method with constrained decoding for MHQA.
arXiv Detail & Related papers (2024-07-04T07:17:53Z)
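Constrained decoding of the kind STOC-TOT uses can be sketched as follows: at each step the decoder may only emit characters that keep the output a prefix of some valid answer. The answer set and scoring function are toy stand-ins for a real constraint set and language model.

```python
# Toy prefix-constrained decoding: at every step, only characters that
# keep the output a prefix of some valid answer may be emitted.

ALLOWED = {"paris", "london", "berlin"}  # invented answer set

def constrained_decode(score_fn):
    out = ""
    while out not in ALLOWED:
        candidates = {a[len(out)] for a in ALLOWED
                      if a.startswith(out) and len(a) > len(out)}
        out += max(candidates, key=score_fn)  # greedy step over legal chars
    return out

# A deterministic toy "language model" that prefers later letters.
print(constrained_decode(ord))  # -> "paris"
```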
- MoReVQA: Exploring Modular Reasoning Models for Video Question Answering [101.25249395748794]
This paper addresses the task of video question answering (videoQA) via a decomposed multi-stage, modular reasoning framework.
Unlike traditional single-stage planning methods, we propose a multi-stage system consisting of an event parsing stage, a grounding stage, and a final reasoning stage, in conjunction with an external memory.
arXiv Detail & Related papers (2024-04-09T17:59:31Z)
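A decomposed multi-stage pipeline sharing an external memory, of the general shape MoReVQA describes, might look like the sketch below; the stage logic and memory keys are illustrative placeholders, not the paper's prompts or APIs.

```python
# Illustrative multi-stage pipeline sharing an external memory dict.
# Stage contents are invented placeholders, not MoReVQA's actual modules.

def event_parsing_stage(question, memory):
    # Parse the question into events (toy: "-ing" words).
    memory["events"] = [w for w in question.split() if w.endswith("ing")]

def grounding_stage(video_frames, memory):
    # Ground each parsed event to the frames that mention it (toy match).
    memory["grounded"] = {e: [f for f in video_frames if e in f]
                          for e in memory["events"]}

def reasoning_stage(memory):
    # Final answer from whatever evidence the earlier stages stored.
    hits = [f for frames in memory["grounded"].values() for f in frames]
    return hits[0] if hits else "no evidence found"

memory = {}
question = "who is cooking dinner"
frames = ["frame0: man cooking", "frame1: dog sleeping"]
event_parsing_stage(question, memory)
grounding_stage(frames, memory)
print(reasoning_stage(memory))  # -> "frame0: man cooking"
```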
- Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? [75.79305790453654]
Coaxing desired behaviors out of pretrained models, while avoiding undesirable ones, has redefined NLP.
We argue for a systematic effort to decompose language model behavior into categories that explain cross-task performance.
arXiv Detail & Related papers (2023-07-31T22:58:41Z)
- Question Decomposition Improves the Faithfulness of Model-Generated Reasoning [23.34325378824462]
It is difficult to verify the correctness and safety of the behavior of large language models (LLMs).
One approach is to prompt LLMs to externalize their reasoning, by having them generate step-by-step reasoning as they answer a question.
This approach relies on the stated reasoning faithfully reflecting the model's actual reasoning, which is not always the case.
Decomposition-based methods achieve strong performance on question-answering tasks, sometimes approaching that of chain-of-thought (CoT) prompting.
arXiv Detail & Related papers (2023-07-17T00:54:10Z)
- Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break a complex task down into a simpler one, solve it, and then repeat the process until we reach the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
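The decompose-solve-repeat loop that Successive Prompting describes reduces to roughly the following skeleton; `next_subquestion` and `answer_subquestion` are hypothetical stand-ins for the prompted LM calls.

```python
# Toy decompose-solve-repeat loop in the spirit of successive prompting.
# The two helper functions stand in for prompted LM calls.

def next_subquestion(question, solved):
    # Emit the next simple sub-question, or None when done (canned logic).
    subqs = ["Who wrote the book?", "When was that author born?"]
    return subqs[len(solved)] if len(solved) < len(subqs) else None

def answer_subquestion(subq, solved):
    canned = {"Who wrote the book?": "Orwell",
              "When was that author born?": "1903"}
    return canned[subq]

def successive_prompting(question):
    solved = []  # (sub-question, answer) pairs fed back as context
    while True:
        subq = next_subquestion(question, solved)
        if subq is None:
            return solved[-1][1]  # final answer is the last sub-answer
        solved.append((subq, answer_subquestion(subq, solved)))

print(successive_prompting("When was the author of the book born?"))  # 1903
```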
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning [14.663216851932646]
We show that language models tend to perform fairly well at single-step inference tasks, but struggle to chain together multiple reasoning steps to solve more complex problems.
We propose a Selection-Inference (SI) framework that exploits pre-trained LLMs as general processing modules.
We show that a 7B parameter LLM used within the SI framework in a 5-shot generalisation setting, with no fine-tuning, yields a performance improvement of over 100%.
arXiv Detail & Related papers (2022-05-19T17:25:28Z)
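Selection-Inference alternates a selection module (pick relevant facts) with an inference module (derive one new fact), looping until the goal is reached. The hard-coded rule below is a toy stand-in for the two LLM modules.

```python
# Toy Selection-Inference loop: alternately select facts and infer
# one new fact until the goal appears. Rule matching stands in for LLMs.

facts = {"socrates is a man", "all men are mortal"}
goal = "socrates is mortal"

def select(facts):
    # Selection module stand-in: propose fact pairs to reason over.
    return [(a, b) for a in facts for b in facts if a != b]

def infer(a, b):
    # Inference module stand-in: one hard-coded syllogism rule.
    if a.endswith("is a man") and b == "all men are mortal":
        return a.replace("is a man", "is mortal")
    return None

while goal not in facts:
    new = {c for a, b in select(facts) if (c := infer(a, b))}
    if not new - facts:
        break  # nothing new is derivable
    facts |= new

print(goal in facts)  # -> True
```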
- METGEN: A Module-Based Entailment Tree Generation Framework for Answer Explanation [59.33241627273023]
We propose METGEN, a Module-based Entailment tree GENeration framework with multiple modules and a reasoning controller.
Given a question, METGEN can iteratively generate the entailment tree by conducting single-step entailment with separate modules and selecting the reasoning flow with the controller.
Experiment results show that METGEN can outperform previous state-of-the-art models with only 9% of the parameters.
arXiv Detail & Related papers (2022-05-05T12:06:02Z)
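Stripped to its skeleton, a controller choosing among single-step entailment modules, in the spirit of METGEN, might look like this; the module implementations and the controller's word-overlap scoring are invented placeholders, not the paper's trained components.

```python
# Skeleton of a controller choosing among single-step entailment modules.
# Module logic and controller scoring are invented placeholders.

def if_then_module(p1, p2):
    # Toy modus ponens: "if X then Y" + "X" -> "Y".
    if p1.startswith("if ") and " then " in p1:
        cond, concl = p1[3:].split(" then ", 1)
        if p2 == cond:
            return concl
    return None

def conjunction_module(p1, p2):
    return f"{p1} and {p2}"

MODULES = [if_then_module, conjunction_module]

def controller(premises, hypothesis):
    # Toy controller: try each module on each premise pair and keep the
    # conclusion closest to the hypothesis (by shared-word overlap).
    def score(s):
        return len(set(s.split()) & set(hypothesis.split()))
    best = None
    for module in MODULES:
        for p1 in premises:
            for p2 in premises:
                conclusion = module(p1, p2)
                if conclusion and (best is None or score(conclusion) > score(best)):
                    best = conclusion
    return best

premises = ["if it rains then the ground is wet", "it rains"]
print(controller(premises, "the ground is wet"))  # -> "the ground is wet"
```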
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
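ModularQA's routing of generated sub-questions to existing models can be mimicked with a trivial dispatcher; the decomposition, the canned QA answers, and the arithmetic below are invented stand-ins for the neural QA model, the calculator, and the learned next-question generator.

```python
# Toy dispatcher routing sub-questions to a "QA model" or a calculator,
# mimicking ModularQA's decomposition idea with invented stand-ins.

def qa_model(subq):
    canned = {"How long is the Nile?": "6650",
              "How long is the Amazon?": "6400"}
    return canned[subq]

def calculator(expr):
    a, op, b = expr.split()
    return str(int(a) - int(b)) if op == "-" else str(int(a) + int(b))

def decompose(question):
    # Stand-in for a learned next-question generator; "{0}", "{1}" are
    # filled with earlier sub-answers.
    return [("qa", "How long is the Nile?"),
            ("qa", "How long is the Amazon?"),
            ("calc", "{0} - {1}")]

def modular_qa(question):
    answers = []
    for model, subq in decompose(question):
        subq = subq.format(*answers)  # substitute earlier answers
        answers.append(qa_model(subq) if model == "qa" else calculator(subq))
    return answers[-1]

print(modular_qa("How much longer is the Nile than the Amazon?"))  # -> 250
```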