Related papers: Understanding Why ChatGPT Outperforms Humans in Visualization Design Advice

Understanding Why ChatGPT Outperforms Humans in Visualization Design Advice

URL: http://arxiv.org/abs/2508.01547v1
Date: Sun, 03 Aug 2025 02:14:00 GMT
Title: Understanding Why ChatGPT Outperforms Humans in Visualization Design Advice
Authors: Yongsu Ahn, Nam Wook Kim,
Abstract summary: We find that differences exist between two ChatGPT models and human outputs over rhetorical structure, knowledge breadth, and perceptual quality.<n>The two models were generally favored over human responses, while their strengths in coverage and breadth, and emphasis on technical and task-oriented visualization feedback collectively shaped higher overall quality.
Score: 9.847086877903948
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper investigates why recent generative AI models outperform humans in data visualization knowledge tasks. Through systematic comparative analysis of responses to visualization questions, we find that differences exist between two ChatGPT models and human outputs over rhetorical structure, knowledge breadth, and perceptual quality. Our findings reveal that ChatGPT-4, as a more advanced model, displays a hybrid of characteristics from both humans and ChatGPT-3.5. The two models were generally favored over human responses, while their strengths in coverage and breadth, and emphasis on technical and task-oriented visualization feedback collectively shaped higher overall quality. Based on our findings, we draw implications for advancing user experiences based on the potential of LLMs and human perception over their capabilities, with relevance to broader applications of AI.

Related papers

When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration [79.69935257008467]
We introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for Human-AI knowledge transfer capabilities.<n>We conduct the first large-scale human study (N=118) explicitly designed to measure it.<n>In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, isolating model explanations' influence on human understanding.
arXiv Detail & Related papers (2025-06-05T20:48:16Z)
Testing the Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models [51.58859621164201]
We introduce visual stimuli and human judgments on visual cognition tasks to evaluate performance across cognitive domains.<n>We fine-tune models on ground truth data for intuitive physics and causal reasoning.<n>We find that task-specific fine-tuning does not contribute to robust human-like generalization to data with other visual characteristics.
arXiv Detail & Related papers (2025-02-21T18:58:30Z)
Enhancing Human-Like Responses in Large Language Models [0.0]
We focus on techniques that enhance natural language understanding, conversational coherence, and emotional intelligence in AI systems.<n>The study evaluates various approaches, including fine-tuning with diverse datasets, incorporating psychological principles, and designing models that better mimic human reasoning patterns.
arXiv Detail & Related papers (2025-01-09T07:44:06Z)
LATTE: Learning to Think with Vision Specialists [103.5952731807559]
We propose LATTE, a family of vision-language models that offload perception to state-of-the-art vision models.<n>By offloading perception to state-of-the-art vision models, our approach enables vision-language models to focus solely on reasoning over high-quality perceptual information.
arXiv Detail & Related papers (2024-12-07T00:42:04Z)
The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks [17.5336703613751]
This study benchmarks leading large language models and vision language models against human performance on the Wechsler Adult Intelligence Scale (WAIS-IV) Most models demonstrated exceptional capabilities in the storage, retrieval, and manipulation of tokens such as arbitrary sequences of letters and numbers. Despite these broad strengths, we observed consistently poor performance on the Perceptual Reasoning Index (PRI) from multimodal models.
arXiv Detail & Related papers (2024-10-09T19:22:26Z)
Assessing the Aesthetic Evaluation Capabilities of GPT-4 with Vision: Insights from Group and Individual Assessments [2.539875353011627]
This study investigates the performance of GPT-4 with Vision on the task of aesthetic evaluation of images. We employ two tasks, prediction of the average evaluation values of a group and an individual's evaluation values. Experimental results reveal GPT-4 with Vision's superior performance in predicting aesthetic evaluations and the nature of different responses to beauty and ugliness.
arXiv Detail & Related papers (2024-03-06T10:27:09Z)
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases [98.35348038111508]
This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision) The core of our analysis delves into the distinct visual comprehension abilities of each model. Our findings illuminate the unique strengths and niches of both models.
arXiv Detail & Related papers (2023-12-22T18:59:58Z)
UniAR: A Unified model for predicting human Attention and Responses on visual content [12.281060227170792]
We propose UniAR -- a unified model of human attention and preference behavior across diverse visual content. We train UniAR on diverse public datasets spanning natural images, webpages, and graphic designs, and achieve SOTA performance on multiple benchmarks. Potential applications include providing instant feedback on the effectiveness of UIs/visual content, and enabling designers and content-creation models to optimize their creation for human-centric improvements.
arXiv Detail & Related papers (2023-12-15T19:57:07Z)
Inductive reasoning in humans and large language models [0.0]
We apply GPT-3.5 and GPT-4 to a classic problem in human inductive reasoning known as property induction. Although GPT-3.5 struggles to capture many aspects of human behaviour, GPT-4 is much more successful.
arXiv Detail & Related papers (2023-06-11T00:23:25Z)
Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects [75.15563723169234]
We investigate how vision models adaptively use context for out-of-distribution generalization. We show that models that excel in one setting tend to struggle in the other. To replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations.
arXiv Detail & Related papers (2023-06-09T15:29:54Z)
JECC: Commonsense Reasoning Tasks Derived from Interactive Fictions [75.42526766746515]
We propose a new commonsense reasoning dataset based on human's Interactive Fiction (IF) gameplay walkthroughs. Our dataset focuses on the assessment of functional commonsense knowledge rules rather than factual knowledge. Experiments show that the introduced dataset is challenging to previous machine reading models as well as the new large language models.
arXiv Detail & Related papers (2022-10-18T19:20:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.