Related papers: MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models

MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models

URL: http://arxiv.org/abs/2410.09542v1
Date: Sat, 12 Oct 2024 14:12:36 GMT
Title: MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Authors: Jiachun Li, Pengfei Cao, Zhuoran Jin, Yubo Chen, Kang Liu, Jun Zhao,
Abstract summary: We evaluate large language models' capabilities in inductive and deductive stages. We find that the models tend to consistently conduct correct deduction without correct inductive rules. In the inductive reasoning process, the model tends to focus on observed facts that are close to the current test example in feature space.
Score: 19.81485079689837
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Inductive reasoning is an essential capability for large language models (LLMs) to achieve higher intelligence, which requires the model to generalize rules from observed facts and then apply them to unseen examples. We present {\scshape Mirage}, a synthetic dataset that addresses the limitations of previous work, specifically the lack of comprehensive evaluation and flexible test data. In it, we evaluate LLMs' capabilities in both the inductive and deductive stages, allowing for flexible variation in input distribution, task scenario, and task difficulty to analyze the factors influencing LLMs' inductive reasoning. Based on these multi-faceted evaluations, we demonstrate that the LLM is a poor rule-based reasoner. In many cases, when conducting inductive reasoning, they do not rely on a correct rule to answer the unseen case. From the perspectives of different prompting methods, observation numbers, and task forms, models tend to consistently conduct correct deduction without correct inductive rules. Besides, we find that LLMs are good neighbor-based reasoners. In the inductive reasoning process, the model tends to focus on observed facts that are close to the current test example in feature space. By leveraging these similar examples, the model maintains strong inductive capabilities within a localized region, significantly improving its deductive performance.

Related papers

A Survey of Inductive Reasoning for Large Language Models [55.23215679173251]
The inductive mode is crucial for knowledge generalization and aligns better with human cognition.<n>Despite the importance of inductive reasoning, there is no systematic summary of it.<n>This paper presents the first comprehensive survey of inductive reasoning for large language models.
arXiv Detail & Related papers (2025-10-11T11:45:38Z)
On LLM-Based Scientific Inductive Reasoning Beyond Equations [51.61971971921903]
We propose the task of LLM-Based Scientific Inductive Reasoning Beyond Equations.<n>We introduce a new benchmark, SIRBench-V1, to evaluate the inductive reasoning abilities of LLMs in scientific settings.
arXiv Detail & Related papers (2025-09-12T10:11:52Z)
Patterns Over Principles: The Fragility of Inductive Reasoning in LLMs under Noisy Observations [43.491353243991284]
We introduce Robust Rule Induction, a task that evaluates large language models' capability in inferring rules from data fused with noisy examples. We also propose Sample-steered Rule Refinement (SRR), a method enhancing reasoning stability via observation diversification and execution-guided feedback. Our findings challenge LLMs' reasoning, revealing susceptibility to hypothesis drift and pattern overfitting, while providing empirical evidence critical for developing human-like inductive systems.
arXiv Detail & Related papers (2025-02-22T10:03:19Z)
InductionBench: LLMs Fail in the Simplest Complexity Class [53.70978746199222]
Large language models (LLMs) have shown remarkable improvements in reasoning. Inductive reasoning, where one infers the underlying rules from observed data, remains less explored. We introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs.
arXiv Detail & Related papers (2025-02-20T03:48:00Z)
JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models [51.99046112135311]
We introduce JustLogic, a synthetically generated deductive reasoning benchmark for rigorous evaluation of Large Language Models. JustLogic is highly complex, capable of generating a diverse range of linguistic patterns, vocabulary, and argument structures. Our experimental results reveal that most state-of-the-art (SOTA) LLMs perform significantly worse than the human average.
arXiv Detail & Related papers (2025-01-24T15:49:10Z)
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs [99.76347807139615]
Reasoning encompasses two typical types: deductive reasoning and inductive reasoning. Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning. This raises an essential question: In LLM reasoning, which poses a greater challenge - deductive or inductive reasoning?
arXiv Detail & Related papers (2024-07-31T18:47:11Z)
Case2Code: Learning Inductive Reasoning with Synthetic Data [105.89741089673575]
We propose a textbfCase2Code task by exploiting the expressiveness and correctness of programs. We first evaluate representative LLMs on the synthesized Case2Code task and demonstrate that the Case-to-code induction is challenging for LLMs. Experimental results show that such induction training benefits not only in distribution Case2Code performance but also enhances various coding abilities of trained LLMs.
arXiv Detail & Related papers (2024-07-17T11:35:00Z)
Evaluating Consistency and Reasoning Capabilities of Large Language Models [0.0]
Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance. Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate. This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs.
arXiv Detail & Related papers (2024-04-25T10:03:14Z)
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning [25.732397636695882]
We show that large language models (LLMs) display reasoning patterns akin to those observed in humans. Our research demonstrates that the architecture and scale of the model significantly affect its preferred method of reasoning.
arXiv Detail & Related papers (2024-02-20T12:58:14Z)
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [92.61557711360652]
Language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks. We conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement. We reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
arXiv Detail & Related papers (2023-10-12T17:51:10Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
Can Pretrained Language Models (Yet) Reason Deductively? [72.9103833294272]
We conduct a comprehensive evaluation of the learnable deductive (also known as explicit) reasoning capability of PLMs. Our main results suggest that PLMs cannot yet perform reliable deductive reasoning. We reach beyond (misleading) task performance, revealing that PLMs are still far from human-level reasoning capabilities.
arXiv Detail & Related papers (2022-10-12T17:44:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.