Experimenting with ChatGPT for Spreadsheet Formula Generation: Evidence
of Risk in AI Generated Spreadsheets
- URL: http://arxiv.org/abs/2309.00095v1
- Date: Thu, 31 Aug 2023 19:31:32 GMT
- Title: Experimenting with ChatGPT for Spreadsheet Formula Generation: Evidence
of Risk in AI Generated Spreadsheets
- Authors: Simon Thorne
- Abstract summary: Large Language Models (LLMs) have become sophisticated enough that complex computer programs can be created through interpretation of plain English sentences.
This paper presents a series of experiments with ChatGPT to explore the tool's ability to produce valid spreadsheet formulae.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have become sophisticated enough that complex
computer programs can be created through interpretation of plain English
sentences and implemented in a variety of modern languages such as Python,
JavaScript, C++ and Spreadsheets. These tools are powerful and relatively accurate
and therefore provide broad access to computer programming regardless of the
background or knowledge of the individual using them. This paper presents a
series of experiments with ChatGPT to explore the tool's ability to produce
valid spreadsheet formulae and related computational outputs in situations
where ChatGPT has to deduce, infer and problem solve the answer. The results
show that in certain circumstances, ChatGPT can produce correct spreadsheet
formulae with correct reasoning, deduction and inference. However, when
information is limited, uncertain or the problem is too complex, the accuracy
of ChatGPT breaks down as does its ability to reason, infer and deduce. This
can also result in false statements and "hallucinations" that all subvert the
process of creating spreadsheet formulae.
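To make the formula-generation setup concrete, the following is a minimal sketch of how such a request could be made programmatically through the OpenAI Python client; the model name, prompts, and example task are illustrative assumptions, not the prompts used in the paper's experiments.

```python
# Minimal sketch: request an Excel formula from a chat model, then flag it for
# manual verification. Assumes the `openai` package and an OPENAI_API_KEY
# environment variable; the model name and prompts are placeholders, not the
# paper's actual experimental prompts.
from openai import OpenAI

client = OpenAI()

task = (
    "A sheet holds order values in B2:B100 and order dates in C2:C100. "
    "Write one Excel formula that sums the orders placed in January 2023."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": "Reply with a single Excel formula and nothing else."},
        {"role": "user", "content": task},
    ],
)

formula = response.choices[0].message.content.strip()
print(formula)
# Per the paper's findings, the returned formula may be wrong or hallucinated,
# so it should be checked against hand-worked examples before being used.
```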
Related papers
- NL2Formula: Generating Spreadsheet Formulas from Natural Language
Queries [29.33149993368329]
This paper introduces a novel benchmark task called NL2Formula.
The aim is to generate executable formulas that are grounded on a spreadsheet table, given a Natural Language (NL) query as input.
We construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions.
arXiv Detail & Related papers (2024-02-20T05:58:05Z)
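As a rough illustration of the NL2Formula task format, a natural-language query is grounded on a table and paired with an executable formula; the example below is constructed for illustration and is not an entry from the dataset.

```python
# Constructed example of an NL2Formula-style (table, query, formula) triple;
# the column layout and formula are illustrative, not drawn from the dataset.
example = {
    "table_headers": ["Region", "Sales"],  # assume headers in A1:B1, data in rows 2-13
    "nl_query": "What are the total sales for the East region?",
    "formula": '=SUMIF(A2:A13, "East", B2:B13)',
}

for field, value in example.items():
    print(f"{field}: {value}")
```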
- Can ChatGPT support software verification? [0.9668407688201361]
We ask ChatGPT to annotate 106 C programs with loop invariants.
We check validity and usefulness of the generated invariants by passing them to two verifiers, Frama-C and CPAchecker.
Our evaluation shows that ChatGPT is able to produce valid and useful invariants allowing Frama-C to verify tasks that it could not solve before.
arXiv Detail & Related papers (2023-11-04T15:25:18Z)
- ChatGPT and Excel -- trust, but verify [0.0]
This paper adopts a critical approach to ChatGPT, showing how its huge reach makes it a useful tool for people with simple requirements, but a poor and even misleading guide for those with more complex problems, which are rarer in the training data and even more rarely have straightforward solutions.
It concludes with a practical guide for adding an Excel Script button, with system and user prompts, that connects the ChatGPT API to the Excel desktop environment, supported by a blog post giving the technical details for those interested.
arXiv Detail & Related papers (2023-08-31T20:21:02Z)
- Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures [0.6990493129893112]
We evaluate ChatGPT's ability to generate correct solutions to the problems fed to it, its code quality, and the nature of the run-time errors thrown by its code.
We look into patterns in the test cases passed in order to gain insight into how incorrect ChatGPT's code is in these situations.
arXiv Detail & Related papers (2023-07-10T08:20:34Z)
- Fact-Checking Complex Claims with Program-Guided Reasoning [99.7212240712869]
Program-Guided Fact-Checking (ProgramFC) is a novel fact-checking model that decomposes complex claims into simpler sub-tasks.
We first leverage the in-context learning ability of large language models to generate reasoning programs.
We execute the program by delegating each sub-task to the corresponding sub-task handler.
arXiv Detail & Related papers (2023-05-22T06:11:15Z)
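The sub-task delegation described in this entry can be pictured with a small dispatcher; this is a simplified sketch, and the handler names, program format, and toy claim are stand-ins rather than the paper's actual interface.

```python
# Simplified sketch of program-guided fact-checking: a generated "program" is a
# sequence of sub-task calls, each delegated to a handler. Handler names, the
# program format, and the toy evidence are illustrative stand-ins only.

def answer_question(argument: str, evidence: str) -> str:
    """Stub QA handler: a real system would run a reader model over the evidence."""
    return "1998" if "1998" in evidence else "unknown"

def verify_claim(argument: str, evidence: str) -> bool:
    """Stub verification handler: a real system would run an entailment model."""
    return "founded in 1998" in evidence

HANDLERS = {"QUESTION": answer_question, "VERIFY": verify_claim}

def run_program(program, evidence):
    """Execute each sub-task in order and collect its result."""
    return [HANDLERS[op](arg, evidence) for op, arg in program]

evidence = "The company was founded in 1998 in California."
program = [
    ("QUESTION", "When was the company founded?"),
    ("VERIFY", "The company was founded in 1998."),
]
print(run_program(program, evidence))  # -> ['1998', True]
```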
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as some of the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results demonstrate that ChatGPT performs worse than previous models across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- When do you need Chain-of-Thought Prompting for ChatGPT? [87.45382888430643]
Chain-of-Thought (CoT) prompting can effectively elicit complex multi-step reasoning from Large Language Models (LLMs).
It is not clear whether CoT is still effective on more recent instruction finetuned (IFT) LLMs such as ChatGPT.
arXiv Detail & Related papers (2023-04-06T17:47:29Z)
- How Generative AI models such as ChatGPT can be (Mis)Used in SPC Practice, Education, and Research? An Exploratory Study [2.0841728192954663]
Generative Artificial Intelligence (AI) models have the potential to revolutionize Statistical Process Control (SPC) practice, learning, and research.
These tools are in the early stages of development and can be easily misused or misunderstood.
We explore ChatGPT's ability to provide code, explain basic concepts, and create knowledge related to SPC practice, learning, and research.
arXiv Detail & Related papers (2023-02-17T15:48:37Z)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
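The core loop of interpretable autoprompting can be pictured as a propose-and-score search over candidate explanation strings; the candidate pool and scorer below are toy stubs standing in for language-model calls, not the released iPrompt implementation.

```python
# Toy sketch of an iPrompt-style propose-and-score loop: candidate
# natural-language explanations of a dataset are scored by how well they let a
# model reproduce the data when used as a prompt, and the best one is kept.
# The candidate pool and scorer are stubs, not the released implementation.

data = [(2, 4), (3, 6), (5, 10)]  # toy input-output pairs

# Stub candidate pool: in iPrompt a language model proposes these strings.
CANDIDATES = {
    "Add two to the input.": lambda x: x + 2,
    "Double the input.": lambda x: x * 2,
    "Square the input.": lambda x: x ** 2,
}

def score(explanation: str) -> float:
    """Stub scorer: in iPrompt this measures how well a frozen model predicts
    the outputs when the explanation is prepended as a prompt."""
    rule = CANDIDATES[explanation]
    return sum(rule(x) == y for x, y in data) / len(data)

best_explanation = max(CANDIDATES, key=score)
print(best_explanation)  # -> Double the input.
```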