FLARE: Faithful Logic-Aided Reasoning and Exploration
- URL: http://arxiv.org/abs/2410.11900v2
- Date: Sat, 19 Oct 2024 00:05:17 GMT
- Title: FLARE: Faithful Logic-Aided Reasoning and Exploration
- Authors: Erik Arakelyan, Pasquale Minervini, Pat Verga, Patrick Lewis, Isabelle Augenstein
- Abstract summary: We introduce a novel approach for traversing the problem space using task decompositions.
We use a Large Language Model to plan a solution, then soft-formalise the query into facts and predicates using logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
- Score: 50.9814063216852
- Abstract: Modern Question Answering (QA) and Reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques, such as Chain-of-Thought (CoT), assuming the resulting generation will have a more granular exploration and reasoning over the question space and scope. However, such methods struggle with generating outputs that are faithful to the intermediate chain of reasoning produced by the model. On the other end of the spectrum, neuro-symbolic methods such as Faithful CoT (F-CoT) propose to combine LLMs with external symbolic solvers. While such approaches boast a high degree of faithfulness, they usually require a model trained for code generation and struggle with tasks that are ambiguous or hard to formalise strictly. We introduce $\textbf{F}$aithful $\textbf{L}$ogic-$\textbf{A}$ided $\textbf{R}$easoning and $\textbf{E}$xploration ($\textbf{FLARE}$), a novel interpretable approach for traversing the problem space using task decompositions. We use the LLM to plan a solution, soft-formalise the query into facts and predicates using logic programming code, and simulate that code's execution using an exhaustive multi-hop search over the defined space. Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers. Our method achieves SOTA results on $\mathbf{7}$ out of $\mathbf{9}$ diverse reasoning benchmarks. We also show that model faithfulness positively correlates with overall performance and further demonstrate that $\textbf{FLARE}$ allows pinpointing the decisive factors sufficient for and leading to the correct answer with optimal reasoning during the multi-hop search.
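A minimal sketch of the simulate-and-search idea described in the abstract, under illustrative assumptions: the facts, rules, query, and faithfulness proxy below are invented for the example, and in FLARE itself the plan and the logic program would be produced by the LLM rather than hard-coded.

```python
from itertools import product

# Soft-formalised logic program for the toy query
# "Is ann an ancestor of carl?" (assumed example, not from the paper)
facts = {("parent", "ann", "bob"), ("parent", "bob", "carl")}
rules = [
    # ancestor(X, Y) :- parent(X, Y).
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    # ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]

def multi_hop_search(facts, rules, max_hops=5):
    """Simulated execution: exhaustive forward chaining that records
    which facts fire at each hop (the multi-hop search trace)."""
    derived, trace = set(facts), []
    consts = sorted({c for f in facts for c in f[1:]})
    for hop in range(max_hops):
        new = set()
        for head, body in rules:
            # Variables are the upper-case terms in the rule
            vars_ = sorted({t for a in body + [head] for t in a[1:] if t.isupper()})
            for binding in product(consts, repeat=len(vars_)):
                env = dict(zip(vars_, binding))
                ground = lambda atom: (atom[0],) + tuple(env.get(t, t) for t in atom[1:])
                if all(ground(a) in derived for a in body):
                    g = ground(head)
                    if g not in derived and g not in new:
                        new.add(g)
                        trace.append((hop, g, [ground(a) for a in body]))
        if not new:
            break
        derived |= new
    return derived, trace

derived, trace = multi_hop_search(facts, rules)
print(("ancestor", "ann", "carl") in derived)  # -> True

# Toy faithfulness proxy: share of program facts actually exercised by
# the search trace (an assumed shape of the metric, for illustration).
used = {f for _, _, body in trace for f in body if f in facts}
print(f"faithfulness ~ {len(used) / len(facts):.2f}")
```

Because every hop of the search is recorded in the trace, each derivation step stays inspectable, which is what permits faithfulness analysis without handing execution to an opaque external solver.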
Related papers
- Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus [13.276829763453433]
Large language models (LLMs) are capable of solving a wide range of tasks, yet they have struggled with reasoning.
We propose $\textbf{Additional Logic Training (ALT)}$, which aims to enhance LLMs' reasoning capabilities using program-generated logical reasoning samples.
arXiv Detail & Related papers (2024-11-19T13:31:53Z)
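To ground the ALT entry above: program-generated logical reasoning samples could look roughly like the sketch below, assuming a simple two-hop syllogism template with nonsense entity names. The format is an illustrative guess, not ALT's actual corpus.

```python
import random

# Nonsense entities so a model must rely on the stated rules rather
# than world knowledge (a common trick in synthetic logic corpora).
ENTITIES = ["wumpus", "yumpus", "zumpus", "dumpus", "rompus"]

def make_sample(rng):
    """Generate one two-hop syllogism by chaining two universal rules."""
    a, b, c = rng.sample(ENTITIES, 3)
    premises = [f"Every {a} is a {b}.", f"Every {b} is a {c}."]
    return {"premises": premises,
            "question": f"Is every {a} a {c}?",
            "answer": "yes"}

rng = random.Random(0)
print(make_sample(rng))
```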
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
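For reference, the prompting pattern evaluated in the entry above looks roughly like this generic CoT few-shot prompt (not the paper's exact exemplars):

```python
# One worked exemplar whose answer spells out intermediate steps;
# the trailing "A:" invites the model to produce a similar chain.
FEW_SHOT = (
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?\n"
    "A: Let the ball cost x. Then the bat costs x + 1.00, so "
    "2x + 1.00 = 1.10 and x = 0.05. The answer is $0.05.\n"
)

def cot_prompt(question: str) -> str:
    return f"{FEW_SHOT}\nQ: {question}\nA:"

print(cot_prompt("If 3 pencils cost 45 cents, how much do 7 pencils cost?"))
```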
- Great Memory, Shallow Reasoning: Limits of $k$NN-LMs [71.73611113995143]
$k$NN-LMs, which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling.
We ask whether this improved ability to recall information really translates into downstream abilities.
arXiv Detail & Related papers (2024-08-21T17:59:05Z)
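The $k$NN-LM mechanism probed in the entry above is easy to state in code: retrieve the $k$ nearest stored contexts and interpolate their next-word distribution with the parametric LM's. A simplified sketch, where the dimensions, distance weighting, and interpolation constant $\lambda$ are illustrative assumptions:

```python
import numpy as np

def knn_lm_next_word(p_lm, context_vec, keys, values, vocab_size, k=4, lam=0.25):
    """Interpolate the parametric LM distribution with a kNN distribution
    built from a (context embedding -> next word) datastore."""
    dists = np.linalg.norm(keys - context_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])        # closer neighbours weigh more
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        p_knn[values[idx]] += w              # aggregate mass by stored next word
    return lam * p_knn + (1 - lam) * p_lm

rng = np.random.default_rng(0)
vocab, dim, n = 10, 8, 32
keys = rng.normal(size=(n, dim))             # datastore context embeddings
values = rng.integers(0, vocab, size=n)      # stored next-word ids
p_lm = np.full(vocab, 1 / vocab)             # dummy LM distribution
print(knn_lm_next_word(p_lm, rng.normal(size=dim), keys, values, vocab))
```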
- Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning [15.46907000938726]
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL).
We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC.
We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., the $N$-chain), a video game, and a real-world problem in energy systems.
arXiv Detail & Related papers (2024-04-16T17:01:38Z)
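The randomized-exploration principle behind the TS-type algorithms above shows up most simply in a Bernoulli bandit: exploration arises purely from sampling the posterior, not from explicit bonuses. A toy sketch; the paper's CoopTS-PHE and CoopTS-LMC operate on parallel MDPs, which this does not model:

```python
import random

def thompson_sampling(true_means, steps=1000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_means)
    wins = [1] * n_arms        # Beta(1, 1) priors over each arm's mean
    losses = [1] * n_arms
    total = 0
    for _ in range(steps):
        # Sample a plausible mean per arm and play the argmax; exploration
        # comes entirely from the randomness of the posterior samples.
        samples = [rng.betavariate(wins[a], losses[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        total += reward
    return total

print(thompson_sampling([0.2, 0.5, 0.8]))  # total reward over 1000 pulls
```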
- Can Large Language Models Play Games? A Case Study of A Self-Play Approach [61.15761840203145]
Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.
Monte-Carlo Tree Search (MCTS) is a search algorithm that provides reliable decision-making solutions.
This work introduces an innovative approach that bolsters LLMs with MCTS self-play to efficiently resolve turn-based zero-sum games.
arXiv Detail & Related papers (2024-03-08T19:16:29Z)
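For the search component in the entry above, a compact UCT-style MCTS on a toy zero-sum game (a Nim-like pile where whoever takes the last stone wins) shows the select/expand/simulate/backpropagate loop; the paper's coupling of this search with an LLM policy is not modeled here.

```python
import math
import random

rng = random.Random(0)

def legal(stones):
    """Legal removals from the pile (take 1, 2, or 3 stones)."""
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None):
        self.stones, self.parent = stones, parent
        self.children = {}             # move -> child Node
        self.visits, self.value = 0, 0.0

def uct_child(node, c=1.4):
    """Child maximising the UCT upper confidence bound."""
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones):
    """Random playout; +1 if the player to move here ends up winning."""
    turn = 0
    while stones:
        stones -= rng.choice(legal(stones))
        turn ^= 1
    return 1 if turn else -1           # whoever took the last stone wins

def mcts(root_stones, iters=3000):
    root = Node(root_stones)
    for _ in range(iters):
        node = root
        # 1) Selection: descend through fully expanded nodes
        while node.stones and len(node.children) == len(legal(node.stones)):
            node = uct_child(node)
        # 2) Expansion: add one untried move
        if node.stones:
            move = rng.choice([m for m in legal(node.stones)
                               if m not in node.children])
            node.children[move] = Node(node.stones - move, node)
            node = node.children[move]
        # 3) Simulation from the new node
        result = rollout(node.stones)
        # 4) Backpropagation: store value from the in-coming mover's view
        while node:
            node.visits += 1
            node.value -= result       # the mover into node is node's opponent
            result = -result
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(10))  # optimal play takes 2, leaving a multiple of 4
```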
- Reasoning with Language Model is Planning with World Model [27.24144881796878]
Large language models (LLMs) have shown remarkable reasoning capabilities.
LLMs lack an internal $\textit{world model}$ to predict the world state.
We propose a new LLM reasoning framework, $\underline{R}$easoning vi$\underline{a}$ $\underline{P}$lanning ($\textbf{RAP}$).
arXiv Detail & Related papers (2023-05-24T10:28:28Z)
- Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by $6.34\%$, $9.56\%$, and $5.46\%$ on GSM8K, AQuA, and StrategyQA, respectively.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)
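The decoding scheme in the entry above combines generation confidence with a self-evaluation score when ranking beam candidates. A hypothetical sketch where both signals are stubbed out; in the paper they come from prompting the LLM itself:

```python
import math

def generation_logprob(step):
    """Stub for the generator's log-probability of a reasoning step."""
    return -0.05 * len(step)                    # longer steps -> less confident

def self_eval(step):
    """Stub for the model's own 'is this step correct?' probability."""
    return 0.9 if step.endswith("=ok") else 0.4

def propose_steps(chain):
    """Stub proposal of candidate next reasoning steps."""
    i = len(chain)
    return [f"step{i}=ok", f"step{i}-risky"]

def self_eval_beam_search(beam_width=2, depth=3, alpha=0.5):
    beams = [(0.0, [])]                         # (cumulative score, chain)
    for _ in range(depth):
        candidates = []
        for score, chain in beams:
            for step in propose_steps(chain):
                # Combine generation confidence with self-evaluation
                s = alpha * generation_logprob(step) \
                    + (1 - alpha) * math.log(self_eval(step))
                candidates.append((score + s, chain + [step]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams

for score, chain in self_eval_beam_search():
    print(round(score, 3), chain)
```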
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a significant challenge for modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
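The numerical reasoning chains in the ConvFinQA entry above are short programs over extracted figures. A toy evaluator for that style of chain; the operation names and `#k` back-references follow the FinQA program format, but this evaluator and the example figures are illustrative, not the dataset's official implementation:

```python
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def run_program(steps):
    """Execute a chain of (op, x, y) steps; '#k' refers back to step k."""
    results = []
    for op, x, y in steps:
        a = results[int(x[1:])] if isinstance(x, str) else x
        b = results[int(y[1:])] if isinstance(y, str) else y
        results.append(OPS[op](a, b))
    return results[-1]

# Example: growth rate = (607.0 - 542.0) / 542.0, with made-up figures
print(run_program([("subtract", 607.0, 542.0), ("divide", "#0", 542.0)]))
```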
- A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable.
We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries.
Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d,|\mathcal{A}|,1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.