FLARE: Faithful Logic-Aided Reasoning and Exploration
- URL: http://arxiv.org/abs/2410.11900v2
- Date: Sat, 19 Oct 2024 00:05:17 GMT
- Title: FLARE: Faithful Logic-Aided Reasoning and Exploration
- Authors: Erik Arakelyan, Pasquale Minervini, Pat Verga, Patrick Lewis, Isabelle Augenstein
- Abstract summary: We introduce a novel approach for traversing the problem space using task decompositions.
We use a Large Language Model to plan a solution, then soft-formalise the query into facts and predicates using logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
- Score: 50.9814063216852
- Abstract: Modern Question Answering (QA) and Reasoning approaches based on Large Language Models (LLMs) commonly use prompting techniques, such as Chain-of-Thought (CoT), assuming the resulting generation will have a more granular exploration and reasoning over the question space and scope. However, such methods struggle with generating outputs that are faithful to the intermediate chain of reasoning produced by the model. On the other end of the spectrum, neuro-symbolic methods such as Faithful CoT (F-CoT) propose to combine LLMs with external symbolic solvers. While such approaches boast a high degree of faithfulness, they usually require a model trained for code generation and struggle with tasks that are ambiguous or hard to formalise strictly. We introduce $\textbf{F}$aithful $\textbf{L}$ogic-$\textbf{A}$ided $\textbf{R}$easoning and $\textbf{E}$xploration ($\textbf{FLARE}$), a novel interpretable approach for traversing the problem space using task decompositions. We use the LLM to plan a solution, soft-formalise the query into facts and predicates using logic programming code, and simulate that code's execution using an exhaustive multi-hop search over the defined space. Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers. Our method achieves SOTA results on $\mathbf{7}$ out of $\mathbf{9}$ diverse reasoning benchmarks. We also show that model faithfulness positively correlates with overall performance and further demonstrate that $\textbf{FLARE}$ allows pinpointing the decisive factors sufficient for and leading to the correct answer with optimal reasoning during the multi-hop search.
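A minimal sketch of the simulate-and-search idea described in the abstract, under illustrative assumptions: the facts, rules, query, and faithfulness proxy below are invented for the example, and in FLARE itself the plan and the logic program would be produced by the LLM rather than hard-coded.

```python
from itertools import product

# Soft-formalised logic program for the toy query
# "Is ann an ancestor of carl?" (assumed example, not from the paper)
facts = {("parent", "ann", "bob"), ("parent", "bob", "carl")}
rules = [
    # ancestor(X, Y) :- parent(X, Y).
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    # ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    (("ancestor", "X", "Z"), [("parent", "X", "Y"), ("ancestor", "Y", "Z")]),
]

def multi_hop_search(facts, rules, max_hops=5):
    """Simulated execution: exhaustive forward chaining that records
    which facts fire at each hop (the multi-hop search trace)."""
    derived, trace = set(facts), []
    consts = sorted({c for f in facts for c in f[1:]})
    for hop in range(max_hops):
        new = set()
        for head, body in rules:
            # Variables are the upper-case terms in the rule
            vars_ = sorted({t for a in body + [head] for t in a[1:] if t.isupper()})
            for binding in product(consts, repeat=len(vars_)):
                env = dict(zip(vars_, binding))
                ground = lambda atom: (atom[0],) + tuple(env.get(t, t) for t in atom[1:])
                if all(ground(a) in derived for a in body):
                    g = ground(head)
                    if g not in derived and g not in new:
                        new.add(g)
                        trace.append((hop, g, [ground(a) for a in body]))
        if not new:
            break
        derived |= new
    return derived, trace

derived, trace = multi_hop_search(facts, rules)
print(("ancestor", "ann", "carl") in derived)  # -> True

# Toy faithfulness proxy: share of program facts actually exercised by
# the search trace (an assumed shape of the metric, for illustration).
used = {f for _, _, body in trace for f in body if f in facts}
print(f"faithfulness ~ {len(used) / len(facts):.2f}")
```

Because every hop of the search is recorded in the trace, each derivation step stays inspectable, which is what permits faithfulness analysis without handing execution to an opaque external solver.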
Related papers
- Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus [13.276829763453433]
Large language models (LLMs) are capable of solving a wide range of tasks, yet they have struggled with reasoning.
We propose $\textbf{Additional Logic Training (ALT)}$, which aims to enhance LLMs' reasoning capabilities using program-generated logical reasoning samples.
arXiv Detail & Related papers (2024-11-19T13:31:53Z)
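To ground the ALT entry above: program-generated logical reasoning samples could look roughly like the sketch below, assuming a simple two-hop syllogism template with nonsense entity names. The format is an illustrative guess, not ALT's actual corpus.

```python
import random

# Nonsense entities so a model must rely on the stated rules rather
# than world knowledge (a common trick in synthetic logic corpora).
ENTITIES = ["wumpus", "yumpus", "zumpus", "dumpus", "rompus"]

def make_sample(rng):
    """Generate one two-hop syllogism by chaining two universal rules."""
    a, b, c = rng.sample(ENTITIES, 3)
    premises = [f"Every {a} is a {b}.", f"Every {b} is a {c}."]
    return {"premises": premises,
            "question": f"Is every {a} a {c}?",
            "answer": "yes"}

rng = random.Random(0)
print(make_sample(rng))
```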
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [55.52872152909785]
Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).
We show that CoT gives strong performance benefits primarily on tasks involving math or logic, with much smaller gains on other types of tasks.
arXiv Detail & Related papers (2024-09-18T17:55:00Z)
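For reference, the prompting pattern evaluated in the entry above looks roughly like this generic CoT few-shot prompt (not the paper's exact exemplars):

```python
# One worked exemplar whose answer spells out intermediate steps;
# the trailing "A:" invites the model to produce a similar chain.
FEW_SHOT = (
    "Q: A bat and a ball cost $1.10 in total. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?\n"
    "A: Let the ball cost x. Then the bat costs x + 1.00, so "
    "2x + 1.00 = 1.10 and x = 0.05. The answer is $0.05.\n"
)

def cot_prompt(question: str) -> str:
    return f"{FEW_SHOT}\nQ: {question}\nA:"

print(cot_prompt("If 3 pencils cost 45 cents, how much do 7 pencils cost?"))
```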
- Great Memory, Shallow Reasoning: Limits of $k$NN-LMs [71.73611113995143]
$k$NN-LMs, which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling.
We ask whether this improved ability to recall information really translates into downstream abilities.
arXiv Detail & Related papers (2024-08-21T17:59:05Z)
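The $k$NN-LM mechanism probed in the entry above is easy to state in code: retrieve the $k$ nearest stored contexts and interpolate their next-word distribution with the parametric LM's. A simplified sketch, where the dimensions, distance weighting, and interpolation constant $\lambda$ are illustrative assumptions:

```python
import numpy as np

def knn_lm_next_word(p_lm, context_vec, keys, values, vocab_size, k=4, lam=0.25):
    """Interpolate the parametric LM distribution with a kNN distribution
    built from a (context embedding -> next word) datastore."""
    dists = np.linalg.norm(keys - context_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest])        # closer neighbours weigh more
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, idx in zip(weights, nearest):
        p_knn[values[idx]] += w              # aggregate mass by stored next word
    return lam * p_knn + (1 - lam) * p_lm

rng = np.random.default_rng(0)
vocab, dim, n = 10, 8, 32
keys = rng.normal(size=(n, dim))             # datastore context embeddings
values = rng.integers(0, vocab, size=n)      # stored next-word ids
p_lm = np.full(vocab, 1 / vocab)             # dummy LM distribution
print(knn_lm_next_word(p_lm, rng.normal(size=dim), keys, values, vocab))
```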
- Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning [15.46907000938726]
We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL).
We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC.
We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., the $N$-chain), a video game, and a real-world problem in energy systems.
arXiv Detail & Related papers (2024-04-16T17:01:38Z)
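The randomized-exploration principle behind the TS-type algorithms above shows up most simply in a Bernoulli bandit: exploration arises purely from sampling the posterior, not from explicit bonuses. A toy sketch; the paper's CoopTS-PHE and CoopTS-LMC operate on parallel MDPs, which this does not model:

```python
import random

def thompson_sampling(true_means, steps=1000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_means)
    wins = [1] * n_arms        # Beta(1, 1) priors over each arm's mean
    losses = [1] * n_arms
    total = 0
    for _ in range(steps):
        # Sample a plausible mean per arm and play the argmax; exploration
        # comes entirely from the randomness of the posterior samples.
        samples = [rng.betavariate(wins[a], losses[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        total += reward
    return total

print(thompson_sampling([0.2, 0.5, 0.8]))  # total reward over 1000 pulls
```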
- Can Large Language Models Play Games? A Case Study of A Self-Play Approach [61.15761840203145]
Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge.
Monte-Carlo Tree Search (MCTS) is a search algorithm that provides reliable decision-making solutions.
This work introduces an innovative approach that bolsters LLMs with MCTS self-play to efficiently resolve turn-based zero-sum games.
arXiv Detail & Related papers (2024-03-08T19:16:29Z)
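For the search component in the entry above, a compact UCT-style MCTS on a toy zero-sum game (a Nim-like pile where whoever takes the last stone wins) shows the select/expand/simulate/backpropagate loop; the paper's coupling of this search with an LLM policy is not modeled here.

```python
import math
import random

rng = random.Random(0)

def legal(stones):
    """Legal removals from the pile (take 1, 2, or 3 stones)."""
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None):
        self.stones, self.parent = stones, parent
        self.children = {}             # move -> child Node
        self.visits, self.value = 0, 0.0

def uct_child(node, c=1.4):
    """Child maximising the UCT upper confidence bound."""
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones):
    """Random playout; +1 if the player to move here ends up winning."""
    turn = 0
    while stones:
        stones -= rng.choice(legal(stones))
        turn ^= 1
    return 1 if turn else -1           # whoever took the last stone wins

def mcts(root_stones, iters=3000):
    root = Node(root_stones)
    for _ in range(iters):
        node = root
        # 1) Selection: descend through fully expanded nodes
        while node.stones and len(node.children) == len(legal(node.stones)):
            node = uct_child(node)
        # 2) Expansion: add one untried move
        if node.stones:
            move = rng.choice([m for m in legal(node.stones)
                               if m not in node.children])
            node.children[move] = Node(node.stones - move, node)
            node = node.children[move]
        # 3) Simulation from the new node
        result = rollout(node.stones)
        # 4) Backpropagation: store value from the in-coming mover's view
        while node:
            node.visits += 1
            node.value -= result       # the mover into node is node's opponent
            result = -result
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(10))  # optimal play takes 2, leaving a multiple of 4
```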
- Reasoning with Language Model is Planning with World Model [27.24144881796878]
Large language models (LLMs) have shown remarkable reasoning capabilities.
LLMs lack an internal $\textit{world model}$ to predict the world state.
We propose a new LLM reasoning framework, $\underline{R}$easoning vi$\underline{a}$ $\underline{P}$lanning ($\textbf{RAP}$).
arXiv Detail & Related papers (2023-05-24T10:28:28Z)
- Self-Evaluation Guided Beam Search for Reasoning [61.523627290397556]
We introduce a stepwise self-evaluation mechanism to guide and calibrate the reasoning process of Large Language Models (LLMs).
We propose a decoding algorithm integrating the self-evaluation guidance via beam search.
Our approach surpasses the corresponding Codex-backboned baselines in few-shot accuracy by $6.34\%$, $9.56\%$, and $5.46\%$ on GSM8K, AQuA, and StrategyQA, respectively.
arXiv Detail & Related papers (2023-05-01T02:37:59Z)
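The decoding scheme in the entry above combines generation confidence with a self-evaluation score when ranking beam candidates. A hypothetical sketch where both signals are stubbed out; in the paper they come from prompting the LLM itself:

```python
import math

def generation_logprob(step):
    """Stub for the generator's log-probability of a reasoning step."""
    return -0.05 * len(step)                    # longer steps -> less confident

def self_eval(step):
    """Stub for the model's own 'is this step correct?' probability."""
    return 0.9 if step.endswith("=ok") else 0.4

def propose_steps(chain):
    """Stub proposal of candidate next reasoning steps."""
    i = len(chain)
    return [f"step{i}=ok", f"step{i}-risky"]

def self_eval_beam_search(beam_width=2, depth=3, alpha=0.5):
    beams = [(0.0, [])]                         # (cumulative score, chain)
    for _ in range(depth):
        candidates = []
        for score, chain in beams:
            for step in propose_steps(chain):
                # Combine generation confidence with self-evaluation
                s = alpha * generation_logprob(step) \
                    + (1 - alpha) * math.log(self_eval(step))
                candidates.append((score + s, chain + [step]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams

for score, chain in self_eval_beam_search():
    print(round(score, 3), chain)
```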
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a significant challenge for modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
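The numerical reasoning chains in the ConvFinQA entry above are short programs over extracted figures. A toy evaluator for that style of chain; the operation names and `#k` back-references follow the FinQA program format, but this evaluator and the example figures are illustrative, not the dataset's official implementation:

```python
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def run_program(steps):
    """Execute a chain of (op, x, y) steps; '#k' refers back to step k."""
    results = []
    for op, x, y in steps:
        a = results[int(x[1:])] if isinstance(x, str) else x
        b = results[int(y[1:])] if isinstance(y, str) else y
        results.append(OPS[op](a, b))
    return results[-1]

# Example: growth rate = (607.0 - 542.0) / 542.0, with made-up figures
print(run_program([("subtract", 607.0, 542.0), ("divide", "#0", 542.0)]))
```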
- A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable.
We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries.
Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d,|\mathcal{A}|,1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.