Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
- URL: http://arxiv.org/abs/2311.09247v3
- Date: Mon, 11 Dec 2023 23:57:17 GMT
- Authors: Melanie Mitchell, Alessandro B. Palmarini, Arseny Moskvichev
- Abstract summary: We evaluate the reasoning abilities of text-only and multimodal versions of GPT-4.
Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the abstract reasoning abilities of text-only and multimodal
versions of GPT-4, using the ConceptARC benchmark [10], which is designed to
evaluate robust understanding and reasoning with core-knowledge concepts. We
extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed,
one-shot prompting (rather than simple, zero-shot prompts) with text versions
of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4,
on zero- and one-shot prompts using image versions of the simplest tasks. Our
experimental results support the conclusion that neither version of GPT-4 has
developed robust abstraction abilities at humanlike levels.
Related papers
- Notes on Applicability of GPT-4 to Document Understanding (arXiv, 2024-05-28)
  We evaluate all publicly available GPT-4 family models on document understanding tasks. Benchmark results indicate that although it is hard to achieve satisfactory results with text-only models, GPT-4 Vision Turbo performs well when given both text recognized by an external OCR engine and document images as input.
- Exploiting GPT-4 Vision for Zero-shot Point Cloud Understanding (arXiv, 2024-01-15)
  We tackle the challenge of classifying object categories in point clouds. We employ GPT-4 Vision (GPT-4V) to overcome these challenges and set a new benchmark in zero-shot point cloud classification.
- GPT-4V with Emotion: A Zero-shot Benchmark for Generalized Emotion Recognition (arXiv, 2023-12-07)
  GPT-4 with Vision (GPT-4V) has demonstrated remarkable visual capabilities across various tasks, but its performance in emotion recognition has not been fully evaluated. We present quantitative evaluation results for GPT-4V on 21 benchmark datasets covering 6 tasks.
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? (arXiv, 2023-11-27)
  This paper centers on evaluating GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks. We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds. Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
- GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks (arXiv, 2023-11-02)
  We validate GPT-4V's capabilities for evaluation purposes, addressing tasks ranging from foundational image-to-text and text-to-image synthesis to high-level image-to-image translation and multi-image-to-text alignment. Notably, GPT-4V shows promising agreement with humans across various tasks and evaluation methods, demonstrating the potential of multimodal LLMs as evaluators.
- An Early Evaluation of GPT-4V(ision) (arXiv, 2023-10-25)
  We evaluate different abilities of GPT-4V, including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio. To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate its results.
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) (arXiv, 2023-09-29)
  We analyze the latest model, GPT-4V, to deepen the understanding of LMMs. GPT-4V's unprecedented ability to process arbitrarily interleaved multimodal inputs makes it a powerful multimodal generalist system, and its unique capability to understand visual markers drawn on input images can give rise to new human-computer interaction methods.
- GPT-4 Technical Report (arXiv, 2023-03-15)
  GPT-4 is a large-scale multimodal model that accepts image and text inputs and produces text outputs. It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.