Can Pretrained Language Models (Yet) Reason Deductively?
- URL: http://arxiv.org/abs/2210.06442v1
- Date: Wed, 12 Oct 2022 17:44:15 GMT
- Title: Can Pretrained Language Models (Yet) Reason Deductively?
- Authors: Zhangdie Yuan, Songbo Hu, Ivan Vulić, Anna Korhonen and Zaiqiao Meng
- Abstract summary: We conduct a comprehensive evaluation of the learnable deductive (also known as explicit) reasoning capability of PLMs.
Our main results suggest that PLMs cannot yet perform reliable deductive reasoning.
We reach beyond (misleading) task performance, revealing that PLMs are still far from human-level reasoning capabilities.
- Score: 72.9103833294272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acquiring factual knowledge with Pretrained Language Models (PLMs) has
attracted increasing attention, showing promising performance in many
knowledge-intensive tasks. Their good performance has led the community to
believe that the models do possess a modicum of reasoning competence rather
than merely memorising the knowledge. In this paper, we conduct a comprehensive
evaluation of the learnable deductive (also known as explicit) reasoning
capability of PLMs. Through a series of controlled experiments, we posit two
main findings. (i) PLMs inadequately generalise learned logic rules and perform
inconsistently against simple adversarial surface form edits. (ii) While the
deductive reasoning fine-tuning of PLMs does improve their performance on
reasoning over unseen knowledge facts, it results in catastrophic forgetting of
the previously learnt knowledge. Our main results suggest that PLMs
cannot yet perform reliable deductive reasoning, demonstrating the importance
of controlled examinations and probing of PLMs' reasoning abilities; we reach
beyond (misleading) task performance, revealing that PLMs are still far from
human-level reasoning capabilities, even for simple deductive tasks.
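As an illustration only (not the authors' code or data), a minimal probe in the spirit of the controlled experiments described in the abstract could query a masked PLM with a short two-premise deduction and then with a trivially reworded version, comparing the predicted conclusions; the model name and prompt wording below are illustrative assumptions.

```python
# Minimal sketch of a deductive-reasoning probe with a surface-form edit.
# Assumes the Hugging Face `transformers` library; model and prompts are
# illustrative, not taken from the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Deductive query: a rule plus a fact, with the conclusion masked out.
original = (
    "Every capital city is located in its country. "
    "Paris is the capital of France. "
    "Therefore, Paris is located in [MASK]."
)

# Simple adversarial surface-form edit: same logic, reworded premises.
edited = (
    "All capital cities lie within their countries. "
    "The capital of France is Paris. "
    "Hence, Paris is located in [MASK]."
)

for name, prompt in [("original", original), ("edited", edited)]:
    top = fill(prompt, top_k=3)
    print(name, [(p["token_str"], round(p["score"], 3)) for p in top])

# A reliable deductive reasoner would rank "france" highly in both cases;
# divergent top predictions across the two prompts illustrate the kind of
# sensitivity to surface-form edits that the paper reports.
```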
Related papers
- MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models [19.81485079689837]
We evaluate large language models' capabilities in inductive and deductive stages.
We find that the models tend to consistently conduct correct deduction without correct inductive rules.
In the inductive reasoning process, the model tends to focus on observed facts that are close to the current test example in feature space.
arXiv Detail & Related papers (2024-10-12T14:12:36Z) - Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs [99.76347807139615]
Reasoning encompasses two typical types: deductive reasoning and inductive reasoning.
Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning.
This raises an essential question: In LLM reasoning, which poses a greater challenge - deductive or inductive reasoning?
arXiv Detail & Related papers (2024-07-31T18:47:11Z) - From Heuristic to Analytic: Cognitively Motivated Strategies for Coherent Physical Commonsense Reasoning [66.98861219674039]
Heuristic-Analytic Reasoning (HAR) strategies drastically improve the coherence of rationalizations for model decisions.
Our findings suggest that human-like reasoning strategies can effectively improve the coherence and reliability of PLM reasoning.
arXiv Detail & Related papers (2023-10-24T19:46:04Z) - Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [92.61557711360652]
Language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks.
We conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement.
We reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
arXiv Detail & Related papers (2023-10-12T17:51:10Z) - Concise and Organized Perception Facilitates Reasoning in Large Language Models [32.71672086718057]
We show that large language models (LLMs) exhibit failure patterns akin to human-like cognitive biases when dealing with disordered and irrelevant content in reasoning tasks.
We propose a novel reasoning approach named Concise and Organized Perception (COP)
COP carefully analyzes the given statements to identify the most pertinent information while eliminating redundancy efficiently.
arXiv Detail & Related papers (2023-10-05T04:47:49Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z) - Evaluating the Deductive Competence of Large Language Models [0.2218292673050528]
We investigate whether several large language models (LLMs) can solve a classic type of deductive reasoning problem.
We do find performance differences between conditions; however, these differences do not translate into improved overall performance.
We find that performance interacts with presentation format and content in unexpected ways that differ from human performance.
arXiv Detail & Related papers (2023-09-11T13:47:07Z) - Knowledge Rumination for Pre-trained Language Models [77.55888291165462]
We propose a new paradigm dubbed Knowledge Rumination to help the pre-trained language model utilize related latent knowledge without retrieving it from the external corpus.
We apply the proposed knowledge rumination to various language models, including RoBERTa, DeBERTa, and GPT-3.
arXiv Detail & Related papers (2023-05-15T15:47:09Z)