LINC: A Neurosymbolic Approach for Logical Reasoning by Combining
Language Models with First-Order Logic Provers
- URL: http://arxiv.org/abs/2310.15164v2
- Date: Wed, 14 Feb 2024 18:56:03 GMT
- Title: LINC: A Neurosymbolic Approach for Logical Reasoning by Combining
Language Models with First-Order Logic Provers
- Authors: Theo X. Olausson and Alex Gu and Benjamin Lipkin and Cedegao E. Zhang
and Armando Solar-Lezama and Joshua B. Tenenbaum and Roger Levy
- Abstract summary: Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulating such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
- Score: 60.009969929857704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Logical reasoning, i.e., deductively inferring the truth value of a
conclusion from a set of premises, is an important task for artificial
intelligence with wide potential impacts on science, mathematics, and society.
While many prompting-based strategies have been proposed to enable Large
Language Models (LLMs) to do such reasoning more effectively, they still appear
unsatisfactory, often failing in subtle and unpredictable ways. In this work,
we investigate the validity of instead reformulating such tasks as modular
neurosymbolic programming, which we call LINC: Logical Inference via
Neurosymbolic Computation. In LINC, the LLM acts as a semantic parser,
translating premises and conclusions from natural language to expressions in
first-order logic. These expressions are then offloaded to an external theorem
prover, which symbolically performs deductive inference. Leveraging this
approach, we observe significant performance gains on FOLIO and a balanced
subset of ProofWriter for three different models in nearly all experimental
conditions we evaluate. On ProofWriter, augmenting the comparatively small
open-source StarCoder+ (15.5B parameters) with LINC even outperforms GPT-3.5
and GPT-4 with Chain-of-Thought (CoT) prompting by an absolute 38% and 10%,
respectively. When used with GPT-4, LINC scores 26% higher than CoT on
ProofWriter while performing comparatively on FOLIO. Further analysis reveals
that although both methods on average succeed roughly equally often on this
dataset, they exhibit distinct and complementary failure modes. We thus provide
promising evidence for how logical reasoning over natural language can be
tackled through jointly leveraging LLMs alongside symbolic provers. All
corresponding code is publicly available at https://github.com/benlipkin/linc
Related papers
- LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations [1.024113475677323]
This paper proposes Logic-LM++, an improvement on Logic-LM.
It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM.
arXiv Detail & Related papers (2024-06-22T12:50:41Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - NL2FOL: Translating Natural Language to First-Order Logic for Logical Fallacy Detection [45.28949266878263]
We design a process to reliably detect logical fallacies by translating natural language to First-order Logic.
We then utilize Satisfiability Modulo Theory (SMT) solvers to reason about the validity of the formula.
Our approach is robust, interpretable and does not require training data or fine-tuning.
arXiv Detail & Related papers (2024-04-18T00:20:48Z) - A & B == B & A: Triggering Logical Reasoning Failures in Large Language
Models [65.86149763739141]
We introduce LogicAsker, an automatic approach that comprehensively evaluates and improves the logical reasoning abilities of LLMs.
We evaluate LogicAsker on six widely deployed LLMs, including GPT-3, ChatGPT, GPT-4, Bard, Vicuna, and Guanaco.
The results show that test cases from LogicAsker can find logical reasoning failures in different LLMs with a rate of 25% - 94%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs [102.00359477458029]
We present a neuro-symbolic integration method, in which a neural LLM is used to represent the knowledge of the problem.
An LLM-free symbolic solver is adopted to do deliberative reasoning using the knowledge.
arXiv Detail & Related papers (2023-11-16T11:26:21Z) - Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof
Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction.
Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z) - Language Models can be Logical Solvers [99.40649402395725]
We introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers.
LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers.
arXiv Detail & Related papers (2023-11-10T16:23:50Z) - BOOST: Harnessing Black-Box Control to Boost Commonsense in LMs'
Generation [60.77990074569754]
We present a computation-efficient framework that steers a frozen Pre-Trained Language Model towards more commonsensical generation.
Specifically, we first construct a reference-free evaluator that assigns a sentence with a commonsensical score.
We then use the scorer as the oracle for commonsense knowledge, and extend the controllable generation method called NADO to train an auxiliary head.
arXiv Detail & Related papers (2023-10-25T23:32:12Z) - Are Large Language Models Really Good Logical Reasoners? A Comprehensive
Evaluation and Beyond [32.797832207443896]
Large Language Models (LLMs) have emerged as a noteworthy innovation in natural language processing (NLP)
We aim to bridge this gap and provide comprehensive evaluations in this paper.
arXiv Detail & Related papers (2023-06-16T13:39:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.