Inductive reasoning in humans and large language models
- URL: http://arxiv.org/abs/2306.06548v3
- Date: Fri, 4 Aug 2023 01:36:56 GMT
- Title: Inductive reasoning in humans and large language models
- Authors: Simon J. Han, Keith Ransom, Andrew Perfors, Charles Kemp
- Abstract summary: We apply GPT-3.5 and GPT-4 to a classic problem in human inductive reasoning known as property induction.
Although GPT-3.5 struggles to capture many aspects of human behaviour, GPT-4 is much more successful.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The impressive recent performance of large language models has led many to
wonder to what extent they can serve as models of general intelligence or are
similar to human cognition. We address this issue by applying GPT-3.5 and GPT-4
to a classic problem in human inductive reasoning known as property induction.
Over two experiments, we elicit human judgments on a range of property
induction tasks spanning multiple domains. Although GPT-3.5 struggles to
capture many aspects of human behaviour, GPT-4 is much more successful: for the
most part, its performance qualitatively matches that of humans, and the only
notable exception is its failure to capture the phenomenon of premise
non-monotonicity. Our work demonstrates that property induction allows for
interesting comparisons between human and machine intelligence and provides two
large datasets that can serve as benchmarks for future work in this vein.
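The property induction task described in the abstract follows a classic premise/conclusion format from cognitive science: participants (or models) are told that some categories possess a blank property and are asked how likely it is that another category shares it. As a rough illustration only, not the authors' actual experimental materials, a minimal sketch of how such items could be formatted as prompts (category names, property label, and response scale are all assumptions):

```python
# Illustrative sketch of a property induction item (premises asserting a
# blank property of some categories, plus a conclusion query). Categories
# and wording are hypothetical, not taken from the paper's stimuli.

def build_induction_prompt(premise_categories, conclusion_category,
                           prop="property P"):
    """Format one premise/conclusion property-induction item as text."""
    premises = "\n".join(
        f"{cat.capitalize()}s have {prop}." for cat in premise_categories
    )
    question = (f"How likely is it that {conclusion_category}s "
                f"also have {prop}? Answer from 0 to 100.")
    return f"{premises}\n{question}"

# Example item in the style of the task:
prompt = build_induction_prompt(["horse", "cow"], "mammal")
print(prompt)
```

A prompt like this could then be sent to a model API and the numeric response compared against human judgments; premise non-monotonicity, the phenomenon GPT-4 reportedly fails to capture, concerns cases where adding premises lowers rather than raises human confidence in the conclusion.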
Related papers
- Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse [9.542503507653494]
Chain-of-thought (CoT) has become a widely used strategy for working with large language and multimodal models.
We identify characteristics of tasks where CoT reduces performance by drawing inspiration from cognitive psychology.
We find that a diverse collection of state-of-the-art models exhibit significant drop-offs in performance when using inference-time reasoning.
arXiv Detail & Related papers (2024-10-27T18:30:41Z) - Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence [0.0]
Large language models (LLMs) have shown impressive alignment with human cognitive processes.
This study investigates whether ChatGPT possesses metacognitive monitoring abilities akin to those of humans.
arXiv Detail & Related papers (2024-10-17T09:42:30Z) - Human vs. LMMs: Exploring the Discrepancy in Emoji Interpretation and Usage in Digital Communication [68.40865217231695]
This study examines the behavior of GPT-4V in replicating human-like use of emojis.
The findings reveal a discernible discrepancy between human and GPT-4V behaviors, likely due to the subjective nature of human interpretation.
arXiv Detail & Related papers (2024-01-16T08:56:52Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - The Generative AI Paradox: "What It Can Create, It May Not Understand" [81.89252713236746]
The recent wave of generative AI has sparked excitement and concern over potentially superhuman levels of artificial intelligence.
At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans.
This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make?
arXiv Detail & Related papers (2023-10-31T18:07:07Z) - LLM Cognitive Judgements Differ From Human [0.03626013617212666]
I examine the capabilities of GPT-3 and ChatGPT on a limited-data inductive reasoning task from the cognitive science literature.
The results suggest that these models' cognitive judgements are not human-like.
arXiv Detail & Related papers (2023-07-20T16:22:36Z) - Does Conceptual Representation Require Embodiment? Insights From Large Language Models [9.390117546307042]
We compare representations of 4,442 lexical concepts between humans and ChatGPT models (GPT-3.5 and GPT-4).
We identify two main findings: 1) Both models strongly align with human representations in non-sensorimotor domains but lag in sensory and motor areas, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions like haptics and imageability.
arXiv Detail & Related papers (2023-05-30T15:06:28Z) - Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z) - Emergent Analogical Reasoning in Large Language Models [1.5469452301122177]
We show that GPT-3 has a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings.
Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
arXiv Detail & Related papers (2022-12-19T00:04:56Z) - JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human Interactive Fiction (IF) gameplay walkthroughs.
Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge.
Experiments show that the introduced dataset is challenging for both prior machine reading models and newer large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z) - Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article addresses aspects of modeling commonsense reasoning, focusing on the domain of interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.