Large Language Models Are Not Strong Abstract Reasoners
- URL: http://arxiv.org/abs/2305.19555v3
- Date: Tue, 2 Jan 2024 22:30:00 GMT
- Title: Large Language Models Are Not Strong Abstract Reasoners
- Authors: Gaël Gendron, Qiming Bao, Michael Witbrock, Gillian Dobbie
- Abstract summary: Large Language Models have shown tremendous performance on a variety of natural language processing tasks.
It is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed.
We introduce a new benchmark for evaluating language models beyond memorization on abstract reasoning tasks.
- Score: 12.354660792999269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models have shown tremendous performance on a large variety of
natural language processing tasks, ranging from text comprehension to common
sense reasoning. However, the mechanisms responsible for this success remain
opaque, and it is unclear whether LLMs can achieve human-like cognitive
capabilities or whether these models are still fundamentally circumscribed.
Abstract reasoning is a fundamental task for cognition, consisting of finding
and applying a general pattern from only a few examples. Evaluating deep neural
architectures on this task could give insight into their potential limitations
regarding reasoning and their broad generalisation abilities, yet this is
currently an under-explored area. In this paper, we introduce a new benchmark
for evaluating language models beyond memorization on abstract reasoning tasks.
We perform extensive evaluations of state-of-the-art LLMs, showing that they
currently achieve very limited performance in contrast with other natural
language tasks, even when applying techniques that have been shown to improve
performance on other NLP tasks. We argue that guiding LLM generation to follow
causal paths could help improve the generalisation and reasoning abilities of
LLMs.
Related papers
- Meaningful Learning: Enhancing Abstract Reasoning in Large Language Models via Generic Fact Guidance [38.49506722997423]
Large language models (LLMs) have achieved impressive performance and strong explainability across various reasoning scenarios.
However, LLMs often struggle to abstract generic facts and apply them to give consistent and precise answers.
This has sparked a vigorous debate about whether LLMs are genuinely reasoning or merely memorizing.
arXiv Detail & Related papers (2024-03-14T04:06:13Z)
- FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition [56.76951887823882]
Large language models (LLMs) are primarily evaluated by overall performance on various text understanding and generation tasks.
We present FAC$^2$E, a framework for Fine-grAined and Cognition-grounded LLMs' Capability Evaluation.
arXiv Detail & Related papers (2024-02-29T21:05:37Z)
- Rethinking Interpretability in the Era of Large Language Models [76.1947554386879]
Large language models (LLMs) have demonstrated remarkable capabilities across a wide array of tasks.
The capability to explain in natural language allows LLMs to expand the scale and complexity of patterns that can be explained to a human.
These new capabilities raise new challenges, such as hallucinated explanations and immense computational costs.
arXiv Detail & Related papers (2024-01-30T17:38:54Z)
- Learning Shortcuts: On the Misleading Promise of NLU in Language Models [4.8951183832371]
Large language models (LLMs) have enabled significant performance gains in the field of natural language processing.
Recent studies have found that LLMs often resort to shortcuts when performing tasks, creating an illusion of enhanced performance while lacking generalizability in their decision rules.
arXiv Detail & Related papers (2024-01-17T21:55:15Z)
- LLMs for Relational Reasoning: How Far are We? [8.840750655261251]
Large language models (LLMs) have revolutionized many areas by achieving state-of-the-art performance on downstream tasks.
Recent efforts have demonstrated that LLMs are poor at solving sequential decision-making problems.
arXiv Detail & Related papers (2024-01-17T08:22:52Z)
- Are Large Language Models Good Fact Checkers: A Preliminary Study [26.023148371263012]
Large Language Models (LLMs) have drawn significant attention due to their outstanding reasoning capabilities and extensive knowledge repository.
This study aims to comprehensively evaluate various LLMs in tackling specific fact-checking subtasks.
arXiv Detail & Related papers (2023-11-29T05:04:52Z)
- Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and a "chain-of-thought" knowledge distillation fine-tuning technique to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations more fully reveal how well language models understand the questions they are asked.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners [75.85554779782048]
Large Language Models (LLMs) have excited the natural language and machine learning community over recent years.
Despite numerous successful applications, the underlying mechanism of such in-context capabilities remains unclear.
In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process.
arXiv Detail & Related papers (2023-05-24T07:33:34Z)
- Shortcut Learning of Large Language Models in Natural Language Understanding [119.45683008451698]
Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks.
They might rely on dataset bias and artifacts as shortcuts for prediction.
This has significantly affected their generalizability and adversarial robustness.
arXiv Detail & Related papers (2022-08-25T03:51:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.