Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2405.06707v1
- Date: Thu, 9 May 2024 08:46:17 GMT
- Title: Hypothesis Testing Prompting Improves Deductive Reasoning in Large Language Models
- Authors: Yitian Li, Jidong Tian, Hao He, Yaohui Jin
- Abstract summary: Hypothesis Testing Prompting adds conclusion assumptions, backward reasoning, and fact verification during intermediate reasoning steps.
Experiments show that Hypothesis Testing Prompting not only significantly improves accuracy, but also generates a more reasonable and standardized reasoning process.
- Score: 19.879616265315637
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Combining different forms of prompts with pre-trained large language models has yielded remarkable results on reasoning tasks (e.g., Chain-of-Thought prompting). However, as these methods are tested on more complex reasoning, they also expose problems such as invalid reasoning and fictional reasoning paths. In this paper, we develop Hypothesis Testing Prompting, which adds conclusion assumptions, backward reasoning, and fact verification during intermediate reasoning steps. Hypothesis Testing Prompting involves multiple assumptions and reverse validation of conclusions, leading to a unique correct answer. Experiments on two challenging deductive reasoning datasets, ProofWriter and RuleTaker, show that Hypothesis Testing Prompting not only significantly improves accuracy, but also generates a more reasonable and standardized reasoning process.
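The abstract does not include the authors' actual prompt, but the described procedure can be sketched as a prompt template: assume each candidate conclusion, reason backward from the assumption, and verify the required facts against the given theory. The template wording and the `build_prompt` helper below are illustrative assumptions, not the paper's released prompt.

```python
# A minimal sketch of a Hypothesis Testing Prompting template, based only on
# the abstract's description (assume each candidate conclusion, reason
# backward from it, verify against the given facts). The exact wording is an
# illustrative assumption, not the authors' prompt.

HYPOTHESIS_TESTING_TEMPLATE = """\
Facts and rules:
{theory}

Question: Is the statement "{statement}" True or False?

Hypothesis 1: Assume the statement is True. Reason backward: which facts and
rules would have to hold for this assumption? Verify each of them against the
facts and rules above.

Hypothesis 2: Assume the statement is False. Reason backward and verify in
the same way.

State which hypothesis is fully verified, then give the final answer.
Answer:"""


def build_prompt(theory: str, statement: str) -> str:
    """Fill the template for one deductive instance, e.g. a ProofWriter or
    RuleTaker example consisting of a fact/rule theory and a query statement."""
    return HYPOTHESIS_TESTING_TEMPLATE.format(theory=theory, statement=statement)


if __name__ == "__main__":
    theory = "Anne is kind. If someone is kind then they are nice."
    print(build_prompt(theory, "Anne is nice."))
```

The only integration point is feeding the filled template to an LLM in place of a plain Chain-of-Thought prompt; the assumption-and-verification structure lives entirely in the prompt text.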
Related papers
- QUITE: Quantifying Uncertainty in Natural Language Text in Bayesian Reasoning Scenarios [15.193544498311603]
We present QUITE, a dataset of real-world Bayesian reasoning scenarios with categorical random variables and complex relationships.
We conduct an extensive set of experiments, finding that logic-based models outperform out-of-the-box large language models on all reasoning types.
Our results provide evidence that neuro-symbolic models are a promising direction for improving complex reasoning.
arXiv Detail & Related papers (2024-10-14T12:44:59Z)
- Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [92.61557711360652]
Language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks.
We conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement.
We reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potential and limitations of using LMs in inductive reasoning tasks; a minimal sketch of such a refinement loop follows this entry.
arXiv Detail & Related papers (2023-10-12T17:51:10Z)
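As a rough illustration of the iterative hypothesis refinement described above (propose candidate rules, test them on observed examples, refine on failures), a minimal skeleton might look as follows; the `propose`, `apply_rule`, and `refine` callables stand in for LLM calls and are assumptions, not the paper's implementation.

```python
# Hypothetical skeleton of iterative hypothesis refinement for inductive
# reasoning: propose candidate rules, test them on observed examples, and ask
# the model to refine the best one using its failures. The propose(),
# apply_rule(), and refine() helpers stand in for LLM calls.
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (input, expected output)


def refine_hypotheses(
    examples: List[Example],
    propose: Callable[[List[Example]], List[str]],
    apply_rule: Callable[[str, str], str],
    refine: Callable[[str, List[Example]], str],
    rounds: int = 3,
) -> str:
    """Return the candidate rule (in natural language) that best fits the examples."""
    candidates = propose(examples)  # assumed to return at least one candidate
    best = candidates[0]
    for _ in range(rounds):
        # Score each candidate by how many observed examples it fails on.
        scored = []
        for rule in candidates:
            failures = [(x, y) for x, y in examples if apply_rule(rule, x) != y]
            scored.append((len(failures), rule, failures))
        scored.sort(key=lambda t: t[0])
        n_fail, best, failures = scored[0]
        if n_fail == 0:  # the rule explains every observation
            break
        # Ask the model to revise the best rule using its failure cases.
        candidates = [refine(best, failures)]
    return best
```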
- Deductive Verification of Chain-of-Thought Reasoning [22.79166959432764]
Large Language Models (LLMs) benefit from Chain-of-Thought prompting in performing various reasoning tasks.
While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors.
We propose Natural Program, a natural language-based deductive reasoning format.
arXiv Detail & Related papers (2023-06-06T17:18:56Z)
- Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations [118.0818807474809]
Abductive reasoning aims to find plausible explanations for an event.
Existing approaches for abductive reasoning in natural language processing often rely on manually generated annotations for supervision.
This work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context.
arXiv Detail & Related papers (2023-05-24T01:35:10Z)
- Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
- MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure [129.8481568648651]
We propose a benchmark to investigate models' logical reasoning capabilities in complex real-life scenarios.
The explanation form is based on multi-hop chains of reasoning and includes three main components.
We evaluate the current best models' performance on this new explanation form.
arXiv Detail & Related papers (2022-10-22T16:01:13Z)
- Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates if the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z)
- Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations [71.2950434944196]
We develop Maieutic Prompting, which infers a correct answer to a question even from the noisy and inconsistent generations of language models.
Maieutic Prompting achieves up to 20% better accuracy than state-of-the-art prompting methods.
arXiv Detail & Related papers (2022-05-24T06:36:42Z)
- Natural Language Deduction through Search over Statement Compositions [43.93269297653265]
We propose a system for natural language deduction that decomposes the task into separate steps coordinated by best-first search.
Our experiments demonstrate that the proposed system can better distinguish verifiable hypotheses from unverifiable ones; a minimal search skeleton is sketched after this entry.
arXiv Detail & Related papers (2022-01-16T12:05:48Z)
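A minimal, hypothetical skeleton of best-first search over statement compositions is sketched below; `combine` (a step model that deduces one statement from two) and `score` (a heuristic estimating closeness to the goal hypothesis) are assumed components, not the paper's implementation.

```python
# Hypothetical skeleton of best-first search over statement compositions:
# repeatedly pop the most promising statement, compose it with known
# statements into new deductions, and stop when the goal is derived.
# combine() and score() are assumptions standing in for learned models.
import heapq
from typing import Callable, List, Optional, Tuple


def best_first_deduction(
    premises: List[str],
    goal: str,
    combine: Callable[[str, str], Optional[str]],
    score: Callable[[str, str], float],
    max_steps: int = 1000,
) -> Optional[List[str]]:
    """Search for a chain of compositions deriving `goal` from `premises`.
    Returns the list of statements along the derivation, or None if not found."""
    # Each frontier item: (negated heuristic score, statement, derivation so far).
    frontier: List[Tuple[float, str, List[str]]] = [
        (-score(p, goal), p, []) for p in premises
    ]
    heapq.heapify(frontier)
    known: List[str] = list(premises)
    for _ in range(max_steps):
        if not frontier:
            break
        _, stmt, path = heapq.heappop(frontier)
        if stmt == goal:
            return path + [stmt]
        for other in known:
            new = combine(stmt, other)  # one deduction step, e.g. an LLM call
            if new is not None and new not in known:
                known.append(new)
                heapq.heappush(frontier, (-score(new, goal), new, path + [stmt]))
    return None  # goal not verifiable within the step budget
```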
- ReasonBERT: Pre-trained to Reason with Distant Supervision [17.962648165675684]
We present ReasonBERT, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts.
Different types of reasoning are simulated, including intersecting multiple pieces of evidence, bridging from one piece of evidence to another, and detecting unanswerable cases.
arXiv Detail & Related papers (2021-09-10T14:49:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.