Related papers: LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers

URL: http://arxiv.org/abs/2310.15164v2
Date: Wed, 14 Feb 2024 18:56:03 GMT
Title: LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers
Authors: Theo X. Olausson and Alex Gu and Benjamin Lipkin and Cedegao E. Zhang and Armando Solar-Lezama and Joshua B. Tenenbaum and Roger Levy
Abstract summary: Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society. In this work, we reformulating such tasks as modular neurosymbolic programming, which we call LINC. We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
Score: 60.009969929857704
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Logical reasoning, i.e., deductively inferring the truth value of a conclusion from a set of premises, is an important task for artificial intelligence with wide potential impacts on science, mathematics, and society. While many prompting-based strategies have been proposed to enable Large Language Models (LLMs) to do such reasoning more effectively, they still appear unsatisfactory, often failing in subtle and unpredictable ways. In this work, we investigate the validity of instead reformulating such tasks as modular neurosymbolic programming, which we call LINC: Logical Inference via Neurosymbolic Computation. In LINC, the LLM acts as a semantic parser, translating premises and conclusions from natural language to expressions in first-order logic. These expressions are then offloaded to an external theorem prover, which symbolically performs deductive inference. Leveraging this approach, we observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate. On ProofWriter, augmenting the comparatively small open-source StarCoder+ (15.5B parameters) with LINC even outperforms GPT-3.5 and GPT-4 with Chain-of-Thought (CoT) prompting by an absolute 38% and 10%, respectively. When used with GPT-4, LINC scores 26% higher than CoT on ProofWriter while performing comparatively on FOLIO. Further analysis reveals that although both methods on average succeed roughly equally often on this dataset, they exhibit distinct and complementary failure modes. We thus provide promising evidence for how logical reasoning over natural language can be tackled through jointly leveraging LLMs alongside symbolic provers. All corresponding code is publicly available at https://github.com/benlipkin/linc

Related papers

Causal Prompting for Implicit Sentiment Analysis with Large Language Models [21.39152516811571]
Implicit Sentiment Analysis (ISA) aims to infer sentiment that is implied rather than explicitly stated.<n>Recent prompting-based methods using Large Language Models (LLMs) have shown promise in ISA.<n>We propose CAPITAL, a causal prompting framework that incorporates front-door adjustment into CoT reasoning.
arXiv Detail & Related papers (2025-07-01T03:01:09Z)
Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models [2.172419551358714]
Large language models (LLMs) have shown strong performance across natural language reasoning tasks, yet their reasoning processes remain brittle and difficult to interpret.<n>We introduce Theorem-of-Thought (ToTh), a novel framework that models reasoning as collaboration among three parallel agents.<n> Experiments on symbolic (WebOfLies) and numerical (MultiArithm) reasoning benchmarks show that ToTh consistently outperforms CoT, Self-Consistency, and CoT-Decoding.
arXiv Detail & Related papers (2025-06-08T12:28:38Z)
Learning to Reason via Mixture-of-Thought for Logical Reasoning [56.24256916896427]
Mixture-of-Thought (MoT) is a framework that enables LLMs to reason across three complementary modalities: natural language, code, and truth-table.<n>MoT adopts a two-phase design: (1) self-evolving MoT training, which jointly learns from filtered, self-generated rationales across modalities; and (2) MoT inference, which fully leverages the synergy of three modalities to produce better predictions.
arXiv Detail & Related papers (2025-05-21T17:59:54Z)
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
We introduce Sketch-of-Thought (SoT), a novel prompting framework. It combines cognitive-inspired reasoning paradigms with linguistic constraints to minimize token usage. SoT achieves token reductions of 76% with negligible accuracy impact.
arXiv Detail & Related papers (2025-03-07T06:57:17Z)
P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains [97.25943550933829]
We present P-FOLIO, a human-annotated dataset consisting of diverse and complex reasoning chains. We use P-FOLIO to evaluate and improve large-language-model (LLM) reasoning capabilities.
arXiv Detail & Related papers (2024-10-11T19:22:57Z)
HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning [25.192089674713365]
We introduce HYBRIDMIND, an adaptive strategy that selects the optimal reasoning approach for each reasoning problem. We find that fine-tuning LLaMA-3.1-8B-Instruct as a meta-selector outperforms GPT-4o's natural language reasoning.
arXiv Detail & Related papers (2024-09-28T15:12:55Z)
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models [10.106408289179463]
We propose Logic-of-Thought (LoT) prompting which employs propositional logic to generate expanded logical information from input context. LoT boosts the performance of various prompting methods with a striking margin across five logical reasoning tasks.
arXiv Detail & Related papers (2024-09-26T04:59:45Z)
LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations [1.024113475677323]
This paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM.
arXiv Detail & Related papers (2024-06-22T12:50:41Z)
LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks. But, can they really "reason" over the natural language? This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z)
NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection [45.28949266878263]
We design a process to reliably detect logical fallacies by translating natural language to First-order Logic. We then utilize Satisfiability Modulo Theory (SMT) solvers to reason about the validity of the formula. Our approach is robust, interpretable and does not require training data or fine-tuning.
arXiv Detail & Related papers (2024-04-18T00:20:48Z)
LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs) Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models. We leverage these findings to construct targeted demonstration examples and fine-tune data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z)
Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction. Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv Detail & Related papers (2023-11-10T16:23:50Z)
BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs' Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation. Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score. We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.