Deductive Additivity for Planning of Natural Language Proofs
- URL: http://arxiv.org/abs/2307.02472v2
- Date: Thu, 6 Jul 2023 02:16:33 GMT
- Title: Deductive Additivity for Planning of Natural Language Proofs
- Authors: Zayne Sprague, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett
- Abstract summary: We investigate whether an efficient planning heuristic is possible via embedding spaces compatible with deductive reasoning.
Our findings suggest that while standard embedding methods frequently embed conclusions near the sums of their premises, they fall short of being effective heuristics and lack the ability to model certain categories of reasoning.
- Score: 43.93269297653265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current natural language systems designed for multi-step claim validation
typically operate in two phases: retrieve a set of relevant premise statements
using heuristics (planning), then generate novel conclusions from those
statements using a large language model (deduction). The planning step often
requires expensive Transformer operations and does not scale to arbitrary
numbers of premise statements. In this paper, we investigate whether an
efficient planning heuristic is possible via embedding spaces compatible with
deductive reasoning. Specifically, we evaluate whether embedding spaces exhibit
a property we call deductive additivity: the sum of premise statement
embeddings should be close to embeddings of conclusions based on those
premises. We explore multiple sources of off-the-shelf dense embeddings in
addition to fine-tuned embeddings from GPT3 and sparse embeddings from BM25. We
study embedding models both intrinsically, evaluating whether the property of
deductive additivity holds, and extrinsically, using them to assist planning in
natural language proof generation. Lastly, we create a dataset, Single-Step
Reasoning Contrast (SSRC), to further probe performance on various reasoning
types. Our findings suggest that while standard embedding methods frequently
embed conclusions near the sums of their premises, they fall short of being
effective heuristics and lack the ability to model certain categories of
reasoning.
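The additivity property is easy to state operationally: embed each premise, sum the vectors, and measure cosine similarity to the conclusion embedding. Below is a minimal sketch of that check; sentence-transformers and the example sentences are illustrative stand-ins of our own choosing, not the embedding sources (GPT3, BM25, etc.) evaluated in the paper.

```python
# Minimal check of deductive additivity: is the sum of premise embeddings
# close to the conclusion embedding? Model and sentences are illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

p1, p2 = model.encode(["All birds can fly.", "A sparrow is a bird."])
c = model.encode("A sparrow can fly.")

# Under deductive additivity, this similarity should be high relative to
# similarities with unrelated conclusions.
print(cosine(p1 + p2, c))
```

In a planner, a score of this kind would rank candidate premise combinations against a goal with cheap vector arithmetic rather than a Transformer forward pass per combination.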
Related papers
- TabVer: Tabular Fact Verification with Natural Logic [11.002475880349452]
We propose a set-theoretic interpretation of numerals and arithmetic functions in the context of natural logic.
We leverage large language models to generate arithmetic expressions by producing questions about salient parts of a claim, which are answered by executing functions on tables.
In a few-shot setting on FEVEROUS, we achieve an accuracy of 71.4, outperforming both fully neural and symbolic reasoning models by 3.4 points.
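As a rough illustration of the execute-functions-on-tables idea (a toy example of our own; the table, claim, and function choice are invented, and TabVer's actual pipeline is more involved):

```python
# Toy illustration: verify a numeric claim by executing an arithmetic
# function over a table column, standing in for an LLM-generated expression.
table = {"city": ["Austin", "Dallas"], "population_millions": [0.96, 1.30]}

# Claim: "the two cities have over 2 million residents combined."
# An LLM would generate a question/expression like SUM(population_millions);
# here we execute the sum directly.
combined = sum(table["population_millions"])
print("SUPPORTED" if combined > 2.0 else "REFUTED")
```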
arXiv Detail & Related papers (2024-11-02T00:36:34Z)
- QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
We present QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships.
We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types.
Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning.
arXiv Detail & Related papers (2024-10-14T12:44:59Z)
- Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models [50.15455336684986]
We evaluate the effectiveness of LogProbs and basic prompting to measure semantic plausibility.
We find that LogProbs offers a more reliable measure of semantic plausibility than direct zero-shot prompting.
We conclude that, even in the era of prompt-based evaluations, LogProbs constitute a useful metric of semantic plausibility.
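A minimal version of a LogProbs measure can be computed from a causal LM's token-level negative log-likelihood; the sketch below uses GPT-2 and example sentences of our own choosing, not the paper's setup.

```python
# Score semantic plausibility by total sentence log-probability under a
# causal language model. GPT-2 is an illustrative model choice.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def sentence_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token;
    # negate and rescale to a total log-probability.
    return -out.loss.item() * (ids.size(1) - 1)

# A plausible event should score higher than an implausible one.
print(sentence_logprob("The cat chased the mouse."))
print(sentence_logprob("The mouse chased the cat."))
```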
arXiv Detail & Related papers (2024-03-21T22:08:44Z)
- CASA: Causality-driven Argument Sufficiency Assessment [79.13496878681309]
We propose CASA, a zero-shot causality-driven argument sufficiency assessment framework.
Its core quantity, PS (probability of sufficiency), measures how likely introducing the premise event would lead to the conclusion when both the premise and conclusion events are absent.
Experiments on two logical fallacy detection datasets demonstrate that CASA accurately identifies insufficient arguments.
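The PS quantity described above matches Pearl's probability of sufficiency; in counterfactual notation with X the premise event and Y the conclusion event, it can be written as follows (our notation, which may differ from the paper's):

```latex
\mathrm{PS} = P\left(Y_{X=1} = 1 \mid X = 0,\, Y = 0\right)
```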
arXiv Detail & Related papers (2024-01-10T16:21:18Z)
- Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [92.61557711360652]
Language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks.
We conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement.
We reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
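The refinement loop has a simple schematic shape; the sketch below is our own rendering with placeholder propose/test/refine hooks (e.g., LM calls), not the paper's implementation.

```python
# Schematic iterative hypothesis refinement: propose a rule from examples,
# test it, and revise on the failures. The hooks are placeholders.
def refine_hypothesis(examples, propose, test, refine, max_rounds=3):
    hypothesis = propose(examples)
    for _ in range(max_rounds):
        failures = [ex for ex in examples if not test(hypothesis, ex)]
        if not failures:  # hypothesis covers all observations
            break
        hypothesis = refine(hypothesis, failures)
    return hypothesis
```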
arXiv Detail & Related papers (2023-10-12T17:51:10Z)
- A Semantic Approach to Decidability in Epistemic Planning (Extended Version) [72.77805489645604]
We use a novel semantic approach to achieve decidability.
Specifically, we augment the logic of knowledge S5$_n$ with an interaction axiom called (knowledge) commutativity.
We prove that our framework admits a finitary non-fixpoint characterization of common knowledge, which is of independent interest.
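For intuition, an interaction axiom of this kind lets knowledge operators of different agents commute; one illustrative form such an axiom can take (our guess at the shape, not necessarily the paper's exact axiom) is:

```latex
K_a K_b \varphi \rightarrow K_b K_a \varphi
```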
arXiv Detail & Related papers (2023-07-28T11:26:26Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Natural Language Deduction with Incomplete Information [43.93269297653265]
We propose a new system that can handle the underspecified setting where not all premises are stated at the outset.
By using a natural language generation model to abductively infer a premise given another premise and a conclusion, we can impute missing pieces of evidence needed for the conclusion to be true.
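As a rough sketch of abductive premise generation (the model choice and prompt format here are our own, not the paper's):

```python
# Prompt a seq2seq model with one premise and the goal, asking it to
# abduce the missing premise. Illustrative model and prompt.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-base")

premise = "A sparrow is a bird."
goal = "A sparrow can fly."
prompt = (f"Premise: {premise}\nConclusion: {goal}\n"
          "What missing premise makes the conclusion follow?")
print(generator(prompt, max_new_tokens=32)[0]["generated_text"])
```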
arXiv Detail & Related papers (2022-11-01T17:27:55Z)
- Natural Language Deduction through Search over Statement Compositions [43.93269297653265]
We propose a system for natural language deduction that decomposes the task into separate steps coordinated by best-first search.
Our experiments demonstrate that the proposed system can better distinguish verifiable hypotheses from unverifiable ones.
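Schematically, such a search keeps a priority queue of statements scored against the goal and composes the best candidates first; the sketch below is our own rendering with placeholder deduce/score hooks, not the system's code.

```python
# Best-first search over statement compositions (schematic). deduce() would
# call a deduction model on two statements; score() ranks against the goal.
import heapq

def best_first_deduction(premises, goal, deduce, score, budget=100):
    frontier = [(-score(p, goal), p) for p in premises]
    heapq.heapify(frontier)
    seen = set(premises)
    while frontier and budget > 0:
        _, statement = heapq.heappop(frontier)
        for other in list(seen):
            new = deduce(statement, other)  # compose two statements
            if new is None or new in seen:
                continue
            if new == goal:                 # goal hypothesis verified
                return new
            seen.add(new)
            heapq.heappush(frontier, (-score(new, goal), new))
        budget -= 1
    return None
```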
arXiv Detail & Related papers (2022-01-16T12:05:48Z)