Related papers: InstructExcel: A Benchmark for Natural Language Instruction in Excel

URL: http://arxiv.org/abs/2310.14495v1
Date: Mon, 23 Oct 2023 02:00:55 GMT
Title: InstructExcel: A Benchmark for Natural Language Instruction in Excel
Authors: Justin Payan, Swaroop Mishra, Mukul Singh, Carina Negreanu, Christian Poelitz, Chitta Baral, Subhro Roy, Rasika Chakravarthy, Benjamin Van Durme, and Elnaz Nouri
Abstract summary: This work investigates whether Large Language Models can generate code that solves Excel-specific tasks provided via natural language user instructions. Our benchmark includes over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. We observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.
Score: 72.018640505825
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the evolution of Large Language Models (LLMs), we can solve increasingly complex NLP tasks across various domains, including spreadsheets. This work investigates whether LLMs can generate code (Excel OfficeScripts, a TypeScript API for executing many tasks in Excel) that solves Excel-specific tasks provided via natural language user instructions. To do so, we introduce a new large-scale benchmark, InstructExcel, created by leveraging the 'Automate' feature in Excel to automatically generate OfficeScripts from users' actions. Our benchmark includes over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. Experiments across various zero-shot and few-shot settings show that InstructExcel is a hard benchmark for state-of-the-art models like GPT-4. We observe that (1) using GPT-4 over GPT-3.5, (2) providing more in-context examples, and (3) dynamic prompting can help improve performance on this benchmark.

Related papers

Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning [52.08794743921141]
We propose Table-LLM-Specialist, or Table-Specialist for short, as a new self-trained fine-tuning paradigm specifically designed for table tasks.
arXiv Detail & Related papers (2024-10-16T02:04:17Z)
Excel: Automated Ledger or Analytics IDE? [0.0]
Spreadsheets have undergone a gradual transformation, evolving from simple ledger automation tools to the current state of Excel. Excel includes a fully functional database, an OLAP Engine, multiple statistical programming languages, multiple third-party software libraries, dynamic charts, and real time data connectors. The importance of establishing a comprehensive risk framework for managing this distinctive development environment becomes clear.
arXiv Detail & Related papers (2024-09-03T01:12:52Z)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [52.73289223176475]
TableLLM is a robust large language model (LLM) with 13 billion parameters. TableLLM is purpose-built for proficiently handling data manipulation tasks. We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z)
NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries [29.33149993368329]
This paper introduces a novel benchmark task called NL2Formula. The aim is to generate executable formulas that are grounded on a spreadsheet table, given a Natural Language (NL) query as input. We construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions.
arXiv Detail & Related papers (2024-02-20T05:58:05Z)
Reducing Errors in Excel Models with Component-Based Software Engineering [0.0]
LAMBDA is an Excel function that creates functions from Excel's formulas. LAMBDA functions can be reused in any project just like any Excel function.
arXiv Detail & Related papers (2023-08-31T20:28:48Z)
ChatGPT and Excel -- trust, but verify [0.0]
This paper adopts a critical approach to ChatGPT, showing how its huge reach makes it a useful tool for people with simple requirements but a bad, even misleading guide to those with more complex problems which are more rarely present in the training data and even more rarely have straightforward solutions. It concludes with a practical guide for how to add an Excelscript button, with system and user prompts, to the ChatGPT API into the Excel desktop environment, supported by a blog post giving the technical details for those interested.
arXiv Detail & Related papers (2023-08-31T20:21:02Z)
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [60.171444066848856]
We propose a SheetCopilot agent that takes a natural language task and controls a spreadsheet to fulfill the requirements. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline. Our SheetCopilot correctly completes 44.3% of tasks for a single generation, outperforming the strong code generation baseline by a wide margin.
arXiv Detail & Related papers (2023-05-30T17:59:30Z)
Toolformer: Language Models Can Teach Themselves to Use Tools [62.04867424598204]
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. We show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.
arXiv Detail & Related papers (2023-02-09T16:49:57Z)
FLAME: A small language model for spreadsheet formulas [25.667479554632735]
We present FLAME, a transformer-based model trained exclusively on Excel formulas. We use sketch deduplication, introduce an Excel-specific formula tokenizer, and use domain-specific versions of masked span prediction. We evaluate FLAME on formula repair, formula completion, and similarity-based formula retrieval.
arXiv Detail & Related papers (2023-01-31T17:29:43Z)
