Related papers: SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

URL: http://arxiv.org/abs/2305.19308v2
Date: Mon, 30 Oct 2023 06:36:24 GMT
Title: SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models
Authors: Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang
Abstract summary: We propose SheetCopilot, an agent that takes a natural language task and controls a spreadsheet to fulfill the requirements. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline. SheetCopilot correctly completes 44.3% of tasks in a single generation, outperforming the strong code generation baseline by a wide margin.
Score: 60.171444066848856
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computer end users have spent billions of hours completing daily tasks like tabular data processing and project timeline scheduling. Most of these tasks are repetitive and error-prone, yet most end users lack the skill to automate this burdensome work. With the advent of large language models (LLMs), directing software with natural language user requests becomes a reachable goal. In this work, we propose SheetCopilot, an agent that takes a natural language task and controls a spreadsheet to fulfill the requirements. We propose a set of atomic actions as an abstraction of spreadsheet software functionalities. We further design a state machine-based task planning framework for LLMs to robustly interact with spreadsheets. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline for rigorously benchmarking the ability of LLMs in software control tasks. SheetCopilot correctly completes 44.3% of tasks in a single generation, outperforming the strong code generation baseline by a wide margin. Our project page: https://sheetcopilot.github.io/.
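
The abstract mentions atomic actions and a state machine-based planning framework without giving details. The following is a minimal Python sketch of that general idea, not the authors' implementation: the state names, the two atomic actions (Write, SumRange), the dict-based toy sheet, and the scripted plan that stands in for LLM output are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical planner states; the paper's actual states may differ.
    PROPOSE = auto()
    VALIDATE = auto()
    EXECUTE = auto()
    DONE = auto()


# Hypothetical atomic actions over a toy sheet (dict: cell name -> value).
def write(sheet, cell, value):
    sheet[cell] = value


def sum_range(sheet, cells, target):
    sheet[target] = sum(sheet[c] for c in cells)


ACTIONS = {"Write": write, "SumRange": sum_range}


def run(plan, sheet):
    """Drive a scripted plan through propose/validate/execute states.

    `plan` stands in for LLM output: a list of (action_name, kwargs) pairs.
    Returns the names of steps rejected because the action is unknown.
    """
    queue = list(plan)
    state, step, rejected = State.PROPOSE, None, []
    while state is not State.DONE:
        if state is State.PROPOSE:
            if queue:
                step = queue.pop(0)
                state = State.VALIDATE
            else:
                state = State.DONE
        elif state is State.VALIDATE:
            name, _ = step
            if name in ACTIONS:
                state = State.EXECUTE
            else:
                # Unknown action: record it and ask for the next step.
                rejected.append(name)
                state = State.PROPOSE
        elif state is State.EXECUTE:
            name, kwargs = step
            ACTIONS[name](sheet, **kwargs)
            state = State.PROPOSE
    return rejected


sheet = {}
plan = [
    ("Write", {"cell": "A1", "value": 2}),
    ("Write", {"cell": "A2", "value": 3}),
    ("Delete", {"cell": "A1"}),  # not an atomic action, so it is rejected
    ("SumRange", {"cells": ["A1", "A2"], "target": "A3"}),
]
rejected = run(plan, sheet)
```

Restricting the planner to a fixed action table means an invalid step is caught in the validation state before it can touch the sheet, which is one plausible reading of why a state machine helps an LLM interact robustly with a spreadsheet.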

Related papers

Plan-over-Graph: Towards Parallelable LLM Agent Schedule [53.834646147919436]
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then understands this task graph as input and generates a plan for parallel execution.
arXiv Detail & Related papers (2025-02-20T13:47:51Z)
TableTalk: Scaffolding Spreadsheet Development with a Language Agent [20.560984872689414]
TableTalk is a language agent that helps programmers build spreadsheets conversationally. Its design reifies three design principles -- scaffolding, flexibility, and incrementality. A user study with 20 programmers shows that TableTalk produces spreadsheets 2.3 times more likely to be preferred.
arXiv Detail & Related papers (2025-02-13T21:43:51Z)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [50.27313829438866]
Plan-Seq-Learn (PSL) is a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control. PSL achieves success rates of over 85%, out-performing language-based, classical, and end-to-end approaches.
arXiv Detail & Related papers (2024-05-02T17:59:31Z)
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks [31.031053149807857]
We introduce m&m's: a benchmark containing 4K+ multi-step multi-modal tasks involving 33 tools. For each of these task queries, we provide automatically generated plans using this realistic toolset. We provide a high-quality subset of 1,565 task plans that are human-verified and correct.
arXiv Detail & Related papers (2024-03-17T04:36:18Z)
SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models [40.631127096231886]
Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation but have not yet been investigated in realistic tasks. We introduce SheetRM, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation. We further propose SheetAgent, a novel autonomous agent that utilizes the power of LLMs.
arXiv Detail & Related papers (2024-03-06T11:48:08Z)
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web [43.60736044871539]
We introduce OmniACT, the first-of-a-kind dataset and benchmark for assessing an agent's capability to generate programs. The dataset consists of fundamental tasks such as "Play the next song", as well as longer horizon tasks such as "Send an email to John Doe mentioning the time and place to meet". Our benchmark provides a platform to measure and evaluate the progress of language model agents in automating computer tasks.
arXiv Detail & Related papers (2024-02-27T14:47:53Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
AutoScrum: Automating Project Planning Using Large Language Models [0.0]
Large language models have made advanced reasoning with language models possible. In this paper, we leverage this ability to design complex project plans based only on knowledge of the current state and the desired state. Two approaches are demonstrated: a scrum-based approach and a shortcut plan approach.
arXiv Detail & Related papers (2023-06-05T19:16:37Z)
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world. The Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks. The PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.