Emergent Analogical Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2212.09196v3
- Date: Thu, 3 Aug 2023 03:44:47 GMT
- Title: Emergent Analogical Reasoning in Large Language Models
- Authors: Taylor Webb, Keith J. Holyoak, Hongjing Lu
- Abstract summary: We show that GPT-3 has a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings.
Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
- Score: 1.5469452301122177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advent of large language models has reinvigorated debate over
whether human cognitive capacities might emerge in such generic models given
sufficient training data. Of particular interest is the ability of these models
to reason about novel problems zero-shot, without any direct training. In human
cognition, this capacity is closely tied to an ability to reason by analogy.
Here, we performed a direct comparison between human reasoners and a large
language model (the text-davinci-003 variant of GPT-3) on a range of analogical
tasks, including a non-visual matrix reasoning task based on the rule structure
of Raven's Standard Progressive Matrices. We found that GPT-3 displayed a
surprisingly strong capacity for abstract pattern induction, matching or even
surpassing human capabilities in most settings; preliminary tests of GPT-4
indicated even better performance. Our results indicate that large language
models such as GPT-3 have acquired an emergent ability to find zero-shot
solutions to a broad range of analogy problems.
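For illustration, the sketch below builds a text-based digit-matrix problem of the general kind the abstract describes, with rows following a simple progression rule and the final cell left blank; the exact formatting and rule are assumptions for illustration, not the prompts used in the paper.
```python
# Illustrative sketch (assumed format): a text-based 3x3 digit matrix
# with the final cell blank, in the spirit of the non-visual matrix
# reasoning problems described above. Not the paper's exact prompts.

def build_matrix_prompt():
    # Each row counts up by 1 from left to right; the last cell is missing.
    matrix = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, None],  # the model should infer the missing entry (9)
    ]
    lines = []
    for row in matrix:
        cells = ["?" if cell is None else str(cell) for cell in row]
        lines.append("[ " + " ".join(cells) + " ]")
    return "\n".join(lines) + "\nAnswer:"

if __name__ == "__main__":
    print(build_matrix_prompt())
```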
Related papers
- Evaluating the Robustness of Analogical Reasoning in Large Language Models [6.5855735579366685]
We investigate the robustness of analogy-making abilities previously claimed for LLMs.
We test humans and GPT models on robustness to variants of the original analogy problems.
Unlike humans, GPT models are susceptible to answer-order effects.
arXiv Detail & Related papers (2024-11-21T15:25:08Z)
- Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models [7.779982757267302]
We investigate the generality of analogy-making abilities previously claimed for large language models (LLMs).
We show that while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set.
arXiv Detail & Related papers (2024-02-14T05:52:23Z)
- Probing the Creativity of Large Language Models: Can models produce divergent semantic association? [9.992602859777689]
The present study aims to investigate the creative thinking of large language models from a cognitive perspective.
We utilize the divergent association task (DAT), an objective measure of creativity that asks models to generate unrelated words and computes the semantic distance between them.
Our results imply that advanced large language models have divergent semantic associations, which is a fundamental process underlying creativity.
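As a rough sketch of how a DAT-style score can be computed, the snippet below takes a list of generated words and returns the average pairwise cosine distance between their embeddings; the toy vectors are made-up placeholders (the published DAT relies on pretrained word embeddings), so this illustrates the idea rather than the authors' implementation.
```python
# Rough DAT-style scoring sketch: average pairwise cosine distance
# between word embeddings. The tiny vectors below are made-up
# placeholders; a real implementation would load pretrained embeddings.
from itertools import combinations
import numpy as np

toy_embeddings = {
    "cat":    np.array([0.9, 0.1, 0.0]),
    "dog":    np.array([0.8, 0.2, 0.1]),
    "galaxy": np.array([0.1, 0.9, 0.3]),
    "spoon":  np.array([0.2, 0.1, 0.9]),
}

def cosine_distance(u, v):
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def dat_score(words, embeddings):
    # Mean pairwise distance: higher means the words are less related.
    pairs = list(combinations(words, 2))
    return sum(cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs) / len(pairs)

print(round(dat_score(["cat", "dog", "galaxy", "spoon"], toy_embeddings), 3))
```
Higher scores indicate that the generated words are more semantically distant from one another.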
arXiv Detail & Related papers (2023-10-17T11:23:32Z)
- Response: Emergent analogical reasoning in large language models [0.034530027457862]
GPT-3 fails to solve the simplest variations of the original tasks, whereas human performance remains consistently high across all modified versions.
To strengthen claims of humanlike reasoning such as zero-shot reasoning, it is important to develop approaches that rule out data memorization.
arXiv Detail & Related papers (2023-08-30T16:17:26Z)
- Inductive reasoning in humans and large language models [0.0]
We apply GPT-3.5 and GPT-4 to a classic problem in human inductive reasoning known as property induction.
Although GPT-3.5 struggles to capture many aspects of human behaviour, GPT-4 is much more successful.
arXiv Detail & Related papers (2023-06-11T00:23:25Z)
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [71.42197262495056]
GPT series models have gained considerable attention due to their exceptional natural language processing capabilities.
We select six representative models, comprising two GPT-3 series models and four GPT-3.5 series models.
We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets.
Our experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve.
arXiv Detail & Related papers (2023-03-18T14:02:04Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Elaboration-Generating Commonsense Question Answering at Scale [77.96137534751445]
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge.
We finetune smaller language models to generate useful intermediate context, referred to here as elaborations.
Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
arXiv Detail & Related papers (2022-09-02T18:32:09Z)
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
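A minimal sketch of the prompting pattern this entry describes: a few-shot exemplar that spells out intermediate reasoning steps before the final answer, followed by a new question. The exemplar is loosely modeled on the paper's arithmetic examples; the exact wording here is an approximation, not quoted from the paper.
```python
# Illustrative chain-of-thought prompt: the exemplar includes intermediate
# reasoning steps, encouraging the model to produce its own steps before
# answering the new question. Wording is approximate, not quoted from the paper.
exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)
new_question = (
    "Q: A library had 120 books and received 3 boxes of 15 books each. "
    "How many books does it have now?\nA:"
)
prompt = exemplar + new_question
print(prompt)
```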
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model).
GLaM uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z)
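The GLaM entry above describes a sparsely activated mixture-of-experts architecture; the toy sketch below shows the general top-2 routing idea (a gate picks two experts per token, and only those experts run) under simplified assumptions, not GLaM's actual implementation.
```python
# Toy sketch of sparsely activated top-2 mixture-of-experts routing,
# in the spirit of the GLaM entry above (not its actual implementation):
# each token is sent to only 2 of the experts, weighted by a softmax gate.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

gate_w = rng.normal(size=(d_model, n_experts))             # router weights
expert_w = rng.normal(size=(n_experts, d_model, d_model))  # one toy linear expert each

def moe_layer(x):
    # x: (d_model,) representation of a single token
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]          # indices of the top-2 experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over the selected experts only
    # Only the selected experts run, so compute scales with top_k, not n_experts.
    return sum(p * (x @ expert_w[i]) for p, i in zip(probs, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (8,)
```
Because only top_k of n_experts run per token, parameter count can grow without a proportional increase in per-token compute, which is the efficiency argument made in the entry above.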