Emergent Analogical Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2212.09196v3
- Date: Thu, 3 Aug 2023 03:44:47 GMT
- Title: Emergent Analogical Reasoning in Large Language Models
- Authors: Taylor Webb, Keith J. Holyoak, Hongjing Lu
- Abstract summary: We show that GPT-3 has a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings.
Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
- Score: 1.5469452301122177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent advent of large language models has reinvigorated debate over
whether human cognitive capacities might emerge in such generic models given
sufficient training data. Of particular interest is the ability of these models
to reason about novel problems zero-shot, without any direct training. In human
cognition, this capacity is closely tied to an ability to reason by analogy.
Here, we performed a direct comparison between human reasoners and a large
language model (the text-davinci-003 variant of GPT-3) on a range of analogical
tasks, including a non-visual matrix reasoning task based on the rule structure
of Raven's Standard Progressive Matrices. We found that GPT-3 displayed a
surprisingly strong capacity for abstract pattern induction, matching or even
surpassing human capabilities in most settings; preliminary tests of GPT-4
indicated even better performance. Our results indicate that large language
models such as GPT-3 have acquired an emergent ability to find zero-shot
solutions to a broad range of analogy problems.
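For illustration, the sketch below builds a text-based digit-matrix problem of the general kind the abstract describes, with rows following a simple progression rule and the final cell left blank; the exact formatting and rule are assumptions for illustration, not the prompts used in the paper.
```python
# Illustrative sketch (assumed format): a text-based 3x3 digit matrix
# with the final cell blank, in the spirit of the non-visual matrix
# reasoning problems described above. Not the paper's exact prompts.

def build_matrix_prompt():
    # Each row counts up by 1 from left to right; the last cell is missing.
    matrix = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, None],  # the model should infer the missing entry (9)
    ]
    lines = []
    for row in matrix:
        cells = ["?" if cell is None else str(cell) for cell in row]
        lines.append("[ " + " ".join(cells) + " ]")
    return "\n".join(lines) + "\nAnswer:"

if __name__ == "__main__":
    print(build_matrix_prompt())
```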
Related papers
- Evaluating the Robustness of Analogical Reasoning in Large Language Models [6.5855735579366685]
We investigate the robustness of analogy-making abilities previously claimed for LLMs.
We test humans and GPT models on robustness to variants of the original analogy problems.
Unlike humans, GPT models are susceptible to answer-order effects.
arXiv Detail & Related papers (2024-11-21T15:25:08Z)
- Using Counterfactual Tasks to Evaluate the Generality of Analogical Reasoning in Large Language Models [7.779982757267302]
We investigate the generality of analogy-making abilities previously claimed for large language models (LLMs).
We show that while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set.
arXiv Detail & Related papers (2024-02-14T05:52:23Z)
- Probing the Creativity of Large Language Models: Can models produce divergent semantic association? [9.992602859777689]
The present study aims to investigate the creative thinking of large language models from a cognitive perspective.
We utilize the divergent association task (DAT), an objective measure of creativity that asks models to generate unrelated words and computes the semantic distance between them.
Our results imply that advanced large language models have divergent semantic associations, which is a fundamental process underlying creativity.
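As a rough sketch of how a DAT-style score can be computed, the snippet below takes a list of generated words and returns the average pairwise cosine distance between their embeddings; the toy vectors are made-up placeholders (the published DAT relies on pretrained word embeddings), so this illustrates the idea rather than the authors' implementation.
```python
# Rough DAT-style scoring sketch: average pairwise cosine distance
# between word embeddings. The tiny vectors below are made-up
# placeholders; a real implementation would load pretrained embeddings.
from itertools import combinations
import numpy as np

toy_embeddings = {
    "cat":    np.array([0.9, 0.1, 0.0]),
    "dog":    np.array([0.8, 0.2, 0.1]),
    "galaxy": np.array([0.1, 0.9, 0.3]),
    "spoon":  np.array([0.2, 0.1, 0.9]),
}

def cosine_distance(u, v):
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def dat_score(words, embeddings):
    # Mean pairwise distance: higher means the words are less related.
    pairs = list(combinations(words, 2))
    return sum(cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs) / len(pairs)

print(round(dat_score(["cat", "dog", "galaxy", "spoon"], toy_embeddings), 3))
```
Higher scores indicate that the generated words are more semantically distant from one another.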
arXiv Detail & Related papers (2023-10-17T11:23:32Z)
- Response: Emergent analogical reasoning in large language models [0.034530027457862]
GPT-3 fails to solve the simplest variations of the original tasks, whereas human performance remains consistently high across all modified versions.
To strengthen claims of humanlike reasoning such as zero-shot reasoning, it is important to develop approaches that rule out data memorization.
arXiv Detail & Related papers (2023-08-30T16:17:26Z)
- Inductive reasoning in humans and large language models [0.0]
We apply GPT-3.5 and GPT-4 to a classic problem in human inductive reasoning known as property induction.
Although GPT-3.5 struggles to capture many aspects of human behaviour, GPT-4 is much more successful.
arXiv Detail & Related papers (2023-06-11T00:23:25Z)
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models [71.42197262495056]
GPT series models have gained considerable attention due to their exceptional natural language processing capabilities.
We select six representative models, comprising two GPT-3 series models and four GPT-3.5 series models.
We evaluate their performance on nine natural language understanding (NLU) tasks using 21 datasets.
Our experiments reveal that the overall ability of GPT series models on NLU tasks does not increase gradually as the models evolve.
arXiv Detail & Related papers (2023-03-18T14:02:04Z)
- Training Trajectories of Language Models Across Scales [99.38721327771208]
Scaling up language models has led to unprecedented performance gains.
How do language models of different sizes learn during pre-training?
Why do larger language models demonstrate more desirable behaviors?
arXiv Detail & Related papers (2022-12-19T19:16:29Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Elaboration-Generating Commonsense Question Answering at Scale [77.96137534751445]
In question answering requiring common sense, language models (e.g., GPT-3) have been used to generate text expressing background knowledge.
We finetune smaller language models to generate useful intermediate context, referred to here as elaborations.
Our framework alternates between updating two language models -- an elaboration generator and an answer predictor -- allowing each to influence the other.
arXiv Detail & Related papers (2022-09-02T18:32:09Z)
- Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
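A minimal sketch of the prompting pattern this entry describes: a few-shot exemplar that spells out intermediate reasoning steps before the final answer, followed by a new question. The exemplar is loosely modeled on the paper's arithmetic examples; the exact wording here is an approximation, not quoted from the paper.
```python
# Illustrative chain-of-thought prompt: the exemplar includes intermediate
# reasoning steps, encouraging the model to produce its own steps before
# answering the new question. Wording is approximate, not quoted from the paper.
exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)
new_question = (
    "Q: A library had 120 books and received 3 boxes of 15 books each. "
    "How many books does it have now?\nA:"
)
prompt = exemplar + new_question
print(prompt)
```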
arXiv Detail & Related papers (2022-01-28T02:33:07Z)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model).
GLaM uses a sparsely activated mixture-of-experts architecture to scale the model capacity while also incurring substantially less training cost compared to dense variants.
It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z)
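The GLaM entry above describes a sparsely activated mixture-of-experts architecture; the toy sketch below shows the general top-2 routing idea (a gate picks two experts per token, and only those experts run) under simplified assumptions, not GLaM's actual implementation.
```python
# Toy sketch of sparsely activated top-2 mixture-of-experts routing,
# in the spirit of the GLaM entry above (not its actual implementation):
# each token is sent to only 2 of the experts, weighted by a softmax gate.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

gate_w = rng.normal(size=(d_model, n_experts))             # router weights
expert_w = rng.normal(size=(n_experts, d_model, d_model))  # one toy linear expert each

def moe_layer(x):
    # x: (d_model,) representation of a single token
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]          # indices of the top-2 experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                       # softmax over the selected experts only
    # Only the selected experts run, so compute scales with top_k, not n_experts.
    return sum(p * (x @ expert_w[i]) for p, i in zip(probs, top))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (8,)
```
Because only top_k of n_experts run per token, parameter count can grow without a proportional increase in per-token compute, which is the efficiency argument made in the entry above.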