Response: Emergent analogical reasoning in large language models
- URL: http://arxiv.org/abs/2308.16118v2
- Date: Wed, 1 May 2024 00:24:17 GMT
- Title: Response: Emergent analogical reasoning in large language models
- Authors: Damian Hodel, Jevin West,
- Abstract summary: GPT-3 fails to solve simplest variations of the original tasks, whereas human performance remains consistently high across all modified versions.
To strengthen claims of humanlike reasoning such as zero-shot reasoning, it is important to develop approaches that rule out data memorization.
- Score: 0.034530027457862
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In their recent Nature Human Behaviour paper, "Emergent analogical reasoning in large language models," (Webb, Holyoak, and Lu, 2023) the authors argue that "large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems." In this response, we provide counterexamples of the letter string analogies. In our tests, GPT-3 fails to solve simplest variations of the original tasks, whereas human performance remains consistently high across all modified versions. Zero-shot reasoning is an extraordinary claim that requires extraordinary evidence. We do not see that evidence in our experiments. To strengthen claims of humanlike reasoning such as zero-shot reasoning, it is important that the field develop approaches that rule out data memorization.
Related papers
- Evaluating the Robustness of Analogical Reasoning in Large Language Models [6.5855735579366685]
We investigate the robustness of analogy-making abilities previously claimed for LLMs.
We test humans and GPT models on robustness to variants of the original analogy problems.
Unlike humans, the performance of GPT models are susceptible to answer-order effects.
arXiv Detail & Related papers (2024-11-21T15:25:08Z) - Abstraction-of-Thought Makes Language Models Better Reasoners [79.72672444664376]
We introduce a novel structured reasoning format called Abstraction-of-Thought (AoT)
The uniqueness of AoT lies in its explicit requirement for varying levels of abstraction within the reasoning process.
We present AoT Collection, a generic finetuning dataset consisting of 348k high-quality samples with AoT reasoning processes.
arXiv Detail & Related papers (2024-06-18T09:46:44Z) - Using Counterfactual Tasks to Evaluate the Generality of Analogical
Reasoning in Large Language Models [7.779982757267302]
We investigate the generality of analogy-making abilities previously claimed for large language models (LLMs)
We show that while the performance of humans remains high for all the problems, the GPT models' performance declines sharply on the counterfactual set.
arXiv Detail & Related papers (2024-02-14T05:52:23Z) - UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations [62.71847873326847]
We investigate the ability to model unusual, unexpected, and unlikely situations.
Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation.
We release a new English language corpus called UNcommonsense.
arXiv Detail & Related papers (2023-11-14T19:00:55Z) - Emergent Analogical Reasoning in Large Language Models [1.5469452301122177]
We show that GPT-3 has a surprisingly strong capacity for abstract pattern induction, matching or even surpassing human capabilities in most settings.
Our results indicate that large language models such as GPT-3 have acquired an emergent ability to find zero-shot solutions to a broad range of analogy problems.
arXiv Detail & Related papers (2022-12-19T00:04:56Z) - MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure [129.8481568648651]
We propose a benchmark to investigate models' logical reasoning capabilities in complex real-life scenarios.
Based on the multi-hop chain of reasoning, the explanation form includes three main components.
We evaluate the current best models' performance on this new explanation form.
arXiv Detail & Related papers (2022-10-22T16:01:13Z) - A Causal Framework to Quantify the Robustness of Mathematical Reasoning
with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z) - Making Large Language Models Better Reasoners with Step-Aware Verifier [49.16750018427259]
DIVERSE (Diverse Verifier on Reasoning Step) is a novel approach that further enhances the reasoning capability of language models.
We evaluate DIVERSE on the latest language model code-davinci and show that it achieves new state-of-the-art results on six of eight reasoning benchmarks.
arXiv Detail & Related papers (2022-06-06T03:38:36Z) - Chain of Thought Prompting Elicits Reasoning in Large Language Models [56.811278668446825]
This paper explores the ability of language models to generate a coherent chain of thought.
Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks.
arXiv Detail & Related papers (2022-01-28T02:33:07Z) - Critical Thinking for Language Models [6.963299759354333]
This paper takes a first step towards a critical thinking curriculum for neural auto-regressive language models.
We generate artificial argumentative texts to train and evaluate GPT-2.
We obtain consistent and promising results for NLU benchmarks.
arXiv Detail & Related papers (2020-09-15T15:49:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.