Is GPT-4 a Good Data Analyst?
- URL: http://arxiv.org/abs/2305.15038v2
- Date: Mon, 23 Oct 2023 02:10:58 GMT
- Title: Is GPT-4 a Good Data Analyst?
- Authors: Liying Cheng, Xingxuan Li, Lidong Bing
- Abstract summary: We consider GPT-4 as a data analyst to perform end-to-end data analysis with databases from a wide range of domains.
We design several task-specific evaluation metrics to systematically compare the performance between several professional human data analysts and GPT-4.
Experimental results show that GPT-4 can achieve comparable performance to humans.
- Score: 67.35956981748699
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models (LLMs) have demonstrated powerful capabilities
in many domains and tasks, including context understanding, code
generation, language generation, and data storytelling, many data analysts
may worry that their jobs will be replaced by artificial intelligence
(AI). This controversial topic has drawn great public attention. However,
opinions remain divergent, and no definitive conclusion has been reached.
Motivated by this, we raise the research question "Is GPT-4 a good data
analyst?" in this work and aim to answer it by conducting head-to-head
comparative studies. In detail, we regard GPT-4 as a data analyst to perform
end-to-end data analysis with databases from a wide range of domains. We
propose a framework to tackle the problems by carefully designing the prompts
for GPT-4 to conduct experiments. We also design several task-specific
evaluation metrics to systematically compare the performance between several
professional human data analysts and GPT-4. Experimental results show that
GPT-4 can achieve comparable performance to humans. We also provide in-depth
discussions about our results to shed light on further studies before reaching
the conclusion that GPT-4 can replace data analysts.
Related papers
- Evaluating GPT-4 at Grading Handwritten Solutions in Math Exams [48.99818550820575]
We leverage state-of-the-art multi-modal AI models, in particular GPT-4o, to automatically grade handwritten responses to college-level math exams.
Using real student responses to questions in a probability theory exam, we evaluate GPT-4o's alignment with ground-truth scores from human graders using various prompting techniques.
arXiv Detail & Related papers (2024-11-07T22:51:47Z)
- Decoding AI: The inside story of data analysis in ChatGPT [0.0]
This review critically examines the Data Analysis capabilities of ChatGPT assessing its performance across a wide range of tasks.
While DA provides researchers and practitioners with unprecedented analytical capabilities, it is far from being perfect, and it is important to recognize and address its limitations.
arXiv Detail & Related papers (2024-04-12T13:57:30Z)
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? [82.40761196684524]
This paper centers on the evaluation of GPT-4's linguistic and visual capabilities in zero-shot visual recognition tasks.
We conduct extensive experiments to evaluate GPT-4's performance across images, videos, and point clouds.
Our findings show that GPT-4, enhanced with rich linguistic descriptions, significantly improves zero-shot recognition.
arXiv Detail & Related papers (2023-11-27T11:29:10Z)
- Is GPT4 a Good Trader? [12.057320450155835]
Large language models (LLMs) have demonstrated significant capabilities in various planning and reasoning tasks.
This study aims to examine the fidelity of GPT-4's comprehension of classic trading theories and its proficiency in applying its code interpreter abilities to real-world trading data analysis.
arXiv Detail & Related papers (2023-09-20T00:47:52Z)
- Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts [21.150221839202878]
Large Language Models (LLMs) have achieved significant success across various general tasks.
In this work, we examine the proficiency of LLMs in generating succinct survey articles specific to the niche field of NLP in computer science.
We compare both human and GPT-based evaluation scores and provide in-depth analysis.
arXiv Detail & Related papers (2023-08-21T01:32:45Z)
- Can GPT-4 Support Analysis of Textual Data in Tasks Requiring Highly Specialized Domain Expertise? [0.8924669503280334]
GPT-4, prompted with annotation guidelines, performs on par with well-trained law student annotators.
We demonstrated how to analyze GPT-4's predictions to identify and mitigate deficiencies in annotation guidelines.
arXiv Detail & Related papers (2023-06-24T08:48:24Z)
- Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task [49.50140712943701]
We evaluate the performance of ChatGPT/GPT-4 on a radiology NLI task and compare it to other models fine-tuned specifically on task-related data samples.
We also conduct a comprehensive investigation on ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty.
arXiv Detail & Related papers (2023-04-18T17:21:48Z)
- Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z)
- GPT-4 Technical Report [116.90398195245983]
GPT-4 is a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
It exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers.
arXiv Detail & Related papers (2023-03-15T17:15:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.