Analysis of Error Sources in LLM-based Hypothesis Search for Few-Shot Rule Induction
- URL: http://arxiv.org/abs/2509.01016v1
- Date: Sun, 31 Aug 2025 22:42:58 GMT
- Title: Analysis of Error Sources in LLM-based Hypothesis Search for Few-Shot Rule Induction
- Authors: Aishni Parab, Hongjing Lu, Ying Nian Wu, Sumit Gulwani
- Abstract summary: We compare an LLM-based hypothesis search framework with direct program generation approaches on few-shot rule induction tasks. Our findings show that hypothesis search achieves performance comparable to humans, while direct program generation falls notably behind.
- Score: 39.93231455166502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inductive reasoning enables humans to infer abstract rules from limited examples and apply them to novel situations. In this work, we compare an LLM-based hypothesis search framework with direct program generation approaches on few-shot rule induction tasks. Our findings show that hypothesis search achieves performance comparable to humans, while direct program generation falls notably behind. An error analysis reveals key bottlenecks in hypothesis generation and suggests directions for advancing program induction methods. Overall, this paper underscores the potential of LLM-based hypothesis search for modeling inductive reasoning and the challenges in building more efficient systems.
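The hypothesis-search approach described in the abstract can be illustrated with a minimal sketch: sample candidate rules, test each against the few-shot examples, and keep one that is consistent with all of them. The candidate pool and helper names below are hypothetical stand-ins; in the actual framework the hypotheses would be sampled from an LLM.

```python
import random  # used only by the mock sampler below

# Hypothetical stand-in for LLM hypothesis generation: a small pool of
# candidate rules, each a (name, function) pair.
CANDIDATE_RULES = [
    ("add 2", lambda x: x + 2),
    ("double", lambda x: x * 2),
    ("square", lambda x: x * x),
]

def propose_hypotheses(n=3):
    """Mock LLM sampling: return n candidate (name, rule) pairs."""
    return random.sample(CANDIDATE_RULES, k=n)

def hypothesis_search(examples, n_candidates=3):
    """Return the name of the first sampled rule consistent with
    every few-shot (input, output) example, or None if none fits."""
    for name, rule in propose_hypotheses(n_candidates):
        if all(rule(x) == y for x, y in examples):
            return name
    return None  # no consistent hypothesis: a generation bottleneck

# Few-shot examples generated by the hidden rule "double".
examples = [(1, 2), (3, 6), (5, 10)]
print(hypothesis_search(examples))  # -> "double"
```

Note how failure surfaces as `None`: if the sampler never proposes the true rule, search cannot recover it, which mirrors the paper's finding that hypothesis generation is a key bottleneck.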
Related papers
- On LLM-Based Scientific Inductive Reasoning Beyond Equations [51.61971971921903]
We propose the task of LLM-Based Scientific Inductive Reasoning Beyond Equations. We introduce a new benchmark, SIRBench-V1, to evaluate the inductive reasoning abilities of LLMs in scientific settings.
arXiv Detail & Related papers (2025-09-12T10:11:52Z) - MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search [102.11776494401705]
Large language models (LLMs) have shown promise in automating scientific hypothesis generation. Existing approaches primarily yield coarse-grained hypotheses lacking critical methodological and experimental details. We introduce and formally define the new task of fine-grained scientific hypothesis discovery.
arXiv Detail & Related papers (2025-05-25T16:13:46Z) - InductionBench: LLMs Fail in the Simplest Complexity Class [53.70978746199222]
Large language models (LLMs) have shown remarkable improvements in reasoning. Inductive reasoning, where one infers the underlying rules from observed data, remains less explored. We introduce InductionBench, a new benchmark designed to evaluate the inductive reasoning ability of LLMs.
arXiv Detail & Related papers (2025-02-20T03:48:00Z) - Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models [76.6028674686018]
We introduce thought-tracing, an inference-time reasoning algorithm to trace the mental states of agents. Our algorithm is modeled after the Bayesian theory-of-mind framework. We evaluate thought-tracing on diverse theory-of-mind benchmarks, demonstrating significant performance improvements.
arXiv Detail & Related papers (2025-02-17T15:08:50Z) - On the Role of Model Prior in Real-World Inductive Reasoning [7.962140902232628]
In real-world applications, Large Language Models' hypothesis generation is shaped by task-specific model priors. Removing demonstrations results in minimal loss of hypothesis quality and downstream usage. These insights advance our understanding of the dynamics of hypothesis generation in LLMs.
arXiv Detail & Related papers (2024-12-18T09:22:08Z) - Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [92.61557711360652]
Language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks.
We conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement.
We reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.
arXiv Detail & Related papers (2023-10-12T17:51:10Z) - Hypothesis Search: Inductive Reasoning with Language Models [39.03846394586811]
Recent work evaluates large language models on inductive reasoning tasks by directly prompting them, yielding "in-context learning" performance.
This works well for straightforward inductive tasks but performs poorly on complex tasks such as the Abstraction and Reasoning Corpus (ARC).
In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction.
arXiv Detail & Related papers (2023-09-11T17:56:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.