LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and
the Importance of Object-based Representations
- URL: http://arxiv.org/abs/2305.18354v2
- Date: Wed, 14 Feb 2024 21:15:31 GMT
- Authors: Yudong Xu, Wenhao Li, Pashootan Vaezipoor, Scott Sanner, Elias B.
Khalil
- Abstract summary: We show that GPT-4 is unable to "reason" perfectly within non-language domains such as the 1D-ARC or a simple ARC subset.
We propose an object-based representation that is obtained through an external tool, resulting in nearly doubling the performance on solved ARC tasks and near-perfect scores on the easier 1D-ARC.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Can a Large Language Model (LLM) solve simple abstract reasoning problems? We
explore this broad question through a systematic analysis of GPT on the
Abstraction and Reasoning Corpus (ARC), a representative benchmark of abstract
reasoning ability from limited examples in which solutions require some "core
knowledge" of concepts such as objects, goal states, counting, and basic
geometry. GPT-4 solves only 13/50 of the most straightforward ARC tasks when
using textual encodings for their two-dimensional input-output grids. Our
failure analysis reveals that GPT-4's capacity to identify objects and reason
about them is significantly influenced by the sequential nature of the text
that represents an object within a text encoding of a task. To test this
hypothesis, we design a new benchmark, the 1D-ARC, which consists of
one-dimensional (array-like) tasks that are more conducive to GPT-based
reasoning, and where it indeed performs better than on the (2D) ARC. To
alleviate this issue, we propose an object-based representation that is
obtained through an external tool, resulting in nearly doubling the performance
on solved ARC tasks and near-perfect scores on the easier 1D-ARC. Although the
state-of-the-art GPT-4 is unable to "reason" perfectly within non-language
domains such as the 1D-ARC or a simple ARC subset, our study reveals that the
use of object-based representations can significantly improve its reasoning
ability. Visualizations, GPT logs, and data are available at
https://khalil-research.github.io/LLM4ARC.
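The representational contrast at the heart of the abstract, a row-by-row textual encoding of a 2D grid versus an object list obtained from an external tool, can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the function names, the text format, and the use of simple 4-connected same-color components as the "external tool" are all assumptions (the paper's tool is more sophisticated).

```python
def grid_to_text(grid):
    """Serialize a 2D ARC grid row by row: the 'textual encoding' baseline."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)

def grid_to_objects(grid):
    """Group 4-connected, same-color, non-background cells into objects."""
    h, w = len(grid), len(grid[0])
    seen, objects = set(), []
    for r in range(h):
        for c in range(w):
            if grid[r][c] == 0 or (r, c) in seen:
                continue  # skip background (0) and already-grouped cells
            color, stack, cells = grid[r][c], [(r, c)], []
            seen.add((r, c))
            while stack:  # flood fill over the connected component
                y, x = stack.pop()
                cells.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and (ny, nx) not in seen
                            and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append({"color": color, "cells": sorted(cells)})
    return objects

grid = [
    [0, 3, 3, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 7],
]
print(grid_to_text(grid))    # row-by-row text the LLM would receive
print(grid_to_objects(grid)) # object list: one green shape, one isolated cell
```

In the textual encoding, the cells of a single shape are scattered across several rows of the serialized string, which is the sequential-representation problem the failure analysis identifies; the object list makes each shape a single contiguous unit.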
Related papers
- Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects [31.926206783846144]
We show that a Vision Transformer (ViT) fails dramatically on most ARC tasks even when trained on one million examples per task.
We propose ViTARC, a ViT-style architecture that unlocks some of the visual reasoning capabilities required by the ARC.
Our task-specific ViTARC models achieve a test solve rate close to 100% on more than half of the 400 public ARC tasks.
arXiv Detail & Related papers (2024-10-08T22:25:34Z) - Intelligence Analysis of Language Models [0.0]
We test the effectiveness of Large Language Models (LLMs) on the Abstraction and Reasoning Corpus (ARC) dataset.
This dataset serves as a representative benchmark for testing abstract reasoning abilities.
We investigate the application of the Chain-of-Thought (CoT) technique, aiming to determine its role in improving model performance.
arXiv Detail & Related papers (2024-07-20T13:48:16Z) - Generalized Planning for the Abstraction and Reasoning Corpus [10.377424252002795]
We introduce an ARC solver, Generalized Planning for Abstract Reasoning (GPAR).
It casts an ARC problem as a generalized planning (GP) problem, where a solution is formalized as a planning program with pointers.
We show how to scale up GP solvers via ARC-specific domain knowledge, in the form of restrictions over the action model, predicates, arguments, and the valid structure of planning programs.
arXiv Detail & Related papers (2024-01-15T02:25:00Z) - LogicAsker: Evaluating and Improving the Logical Reasoning Ability of Large Language Models [63.14196038655506]
We introduce LogicAsker, a novel approach for evaluating and enhancing the logical reasoning capabilities of large language models (LLMs).
Our methodology reveals significant gaps in LLMs' learning of logical rules, with identified reasoning failures ranging from 29% to 90% across different models.
We leverage these findings to construct targeted demonstration examples and fine-tuning data, notably enhancing logical reasoning in models like GPT-4o by up to 5%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - LISA: Reasoning Segmentation via Large Language Model [68.24075852136761]
We propose a new segmentation task -- reasoning segmentation.
The task is designed to output a segmentation mask given a complex and implicit query text.
We present LISA: Large Language Instructed Assistant, which inherits the language generation capabilities of multimodal Large Language Models.
arXiv Detail & Related papers (2023-08-01T17:50:17Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - An Approach to Solving the Abstraction and Reasoning Corpus (ARC)
Challenge [0.0]
A GPT-4 prompt is designed, via prompt engineering, to perform an arbitrary task.
We give the model some human priors via text, along with some typical procedures for solving the ARC tasks.
We posit that, when scaled to a multi-agent system with past memory and equipped with an image-interpretation tool via Visual Question Answering, this approach may be able to solve the majority of the ARC challenge.
arXiv Detail & Related papers (2023-06-06T10:08:12Z) - Graphs, Constraints, and Search for the Abstraction and Reasoning Corpus [19.27379168184259]
The Abstraction and Reasoning Corpus (ARC) aims at benchmarking the performance of general artificial intelligence algorithms.
The ARC's focus on broad generalization and few-shot learning has made it extremely difficult to solve using pure machine learning.
We propose Abstract Reasoning with Graph Abstractions (ARGA), a new object-centric framework that first represents images using graphs and then performs a search for a correct program.
arXiv Detail & Related papers (2022-10-18T14:13:43Z) - Probing Linguistic Features of Sentence-Level Representations in Neural
Relation Extraction [80.38130122127882]
We introduce 14 probing tasks targeting linguistic properties relevant to neural relation extraction (RE).
We use them to study representations learned by more than 40 different encoder architecture and linguistic feature combinations trained on two datasets.
We find that the bias induced by the architecture and the inclusion of linguistic features are clearly expressed in the probing task performance.
arXiv Detail & Related papers (2020-04-17T09:17:40Z) - Instance-aware, Context-focused, and Memory-efficient Weakly Supervised
Object Detection [184.563345153682]
We develop an instance-aware and context-focused unified framework for weakly supervised learning.
It employs an instance-aware self-training algorithm and a learnable Concrete DropBlock while devising a memory-efficient sequential batch back-propagation.
Our proposed method achieves state-of-the-art results on COCO (12.1% AP, 24.8% AP50), VOC 2007 (54.9% AP), and VOC 2012 (52.1% AP).
arXiv Detail & Related papers (2020-04-09T17:57:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.