Text Understanding in GPT-4 vs Humans
- URL: http://arxiv.org/abs/2403.17196v4
- Date: Fri, 17 Jan 2025 20:51:07 GMT
- Title: Text Understanding in GPT-4 vs Humans
- Authors: Thomas R. Shultz, Jamie M. Wise, Ardavan Salehi Nobandegani,
- Abstract summary: We examine whether a leading AI system, GPT-4, understands text as well as humans do.
We first use a well-established standardized test of discourse comprehension.
Next, we use more difficult passages to determine whether they reveal larger differences between GPT-4 and humans.
- Score: 2.024925013349319
- License:
- Abstract: We examine whether a leading AI system, GPT-4, understands text as well as humans do, first using a well-established standardized test of discourse comprehension. On this test, GPT-4 performs slightly, but not statistically significantly, better than humans, given the very high level of human performance. Both GPT-4 and humans make correct inferences about information that is not explicitly stated in the text, a critical test of understanding. Next, we use more difficult passages to determine whether they reveal larger differences between GPT-4 and humans. GPT-4 does considerably better on this more difficult text than do the high school and university students for whom these text passages were designed as admission tests of reading comprehension. Deeper exploration of GPT-4's performance on material from one of these admission tests reveals generally accepted signatures of genuine understanding, namely generalization and inference.
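The abstract's central claim is that GPT-4 scores slightly, but not statistically significantly, higher than humans on the discourse comprehension test. As a minimal sketch of how such a comparison could be run, the snippet below applies Fisher's exact test to correct/incorrect counts for the two groups; the counts are hypothetical placeholders, not the paper's data.

```python
# Illustrative sketch only: compare two groups' correct/incorrect counts
# on a comprehension test. The numbers are hypothetical, not from the paper.
from scipy.stats import fisher_exact

# Hypothetical counts of correct and incorrect answers per group.
gpt4_correct, gpt4_incorrect = 117, 3
human_correct, human_incorrect = 113, 7

table = [[gpt4_correct, gpt4_incorrect],
         [human_correct, human_incorrect]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
# A p-value above the chosen alpha (e.g., 0.05) would be consistent with
# "slightly, but not statistically significantly, better."
```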
Related papers
- ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Incorrect decisions when detecting texts generated by Large Language Models (LLMs) can cause grave mistakes.
We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process.
We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z) - Generative AI Takes a Statistics Exam: A Comparison of Performance between ChatGPT3.5, ChatGPT4, and ChatGPT4o-mini [0.0]
We investigate the performance of GPT versions 3.5, 4.0, and 4o-mini on the same 16-question statistics exam given to a class of first-year graduate students.
Results indicate that GPT3.5 and GPT4o-mini are more similar to each other than either is to GPT4.
arXiv Detail & Related papers (2025-01-15T21:46:01Z) - Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z) - GPT-4o System Card [211.87336862081963]
GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video.
It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network.
It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages.
arXiv Detail & Related papers (2024-10-25T17:43:01Z) - Evaluating the capability of large language models to personalize science texts for diverse middle-school-age learners [0.0]
GPT-4 was used to profile student learning preferences based on choices made during a training session.
For the experimental group, GPT-4 was used to rewrite science texts to align with the student's predicted profile while, for students in the control group, texts were rewritten to contradict their learning preferences.
arXiv Detail & Related papers (2024-08-09T17:53:35Z) - "ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses [0.0]
Large Language Models (LLMs) like GPT and Bard are capable of producing code based on textual descriptions.
LLMs will have profound implications for computing education, raising concerns about cheating, excessive dependence, and a decline in computational thinking skills.
arXiv Detail & Related papers (2024-04-26T14:29:16Z) - GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z) - An Early Evaluation of GPT-4V(ision) [40.866323649060696]
We evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio.
To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate the results of GPT-4V.
arXiv Detail & Related papers (2023-10-25T10:33:17Z) - Is GPT-4 a Good Data Analyst? [67.35956981748699]
We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance of several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
arXiv Detail & Related papers (2023-05-24T11:26:59Z) - Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.