SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models
- URL: http://arxiv.org/abs/2403.03636v2
- Date: Sat, 24 Aug 2024 17:03:11 GMT
- Title: SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models
- Authors: Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao
- Abstract summary: Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation but have not yet been investigated on realistic tasks.
We introduce $\textbf{SheetRM}$, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation.
We further propose $\textbf{SheetAgent}$, a novel autonomous agent that utilizes the power of LLMs.
- Score: 40.631127096231886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spreadsheet manipulation is pervasive in daily work and significantly improves working efficiency. Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation but have not yet been investigated in complicated and realistic tasks where reasoning challenges exist (e.g., long-horizon manipulation with multi-step reasoning and ambiguous requirements). To bridge the gap with real-world requirements, we introduce $\textbf{SheetRM}$, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation caused by real-life challenges. To mitigate the above challenges, we further propose $\textbf{SheetAgent}$, a novel autonomous agent that utilizes the power of LLMs. SheetAgent consists of three collaborative modules: $\textit{Planner}$, $\textit{Informer}$, and $\textit{Retriever}$, achieving both advanced reasoning and accurate manipulation over spreadsheets without human interaction through iterative task reasoning and reflection. Extensive experiments demonstrate that SheetAgent delivers 20-30% pass rate improvements on multiple benchmarks over baselines, achieving enhanced precision in spreadsheet manipulation and demonstrating superior table reasoning abilities. More details and visualizations are available at https://sheetagent.github.io.
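The abstract names the three modules but not their interfaces. Below is a minimal Python sketch of how such a Planner/Informer/Retriever loop could fit together; the module names come from the abstract, while every signature, prompt, and the `call_llm` placeholder are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of a SheetAgent-style loop. Module names come from the
# abstract; all signatures and prompts below are hypothetical.

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM completion API."""
    raise NotImplementedError

class Retriever:
    """Fetches a relevant code example for the current sub-goal."""
    def __init__(self, code_examples: list[str]):
        self.code_examples = code_examples

    def retrieve(self, query: str) -> str:
        # Toy relevance heuristic: keyword overlap with the query.
        ranked = sorted(self.code_examples,
                        key=lambda ex: -len(set(query.split()) & set(ex.split())))
        return ranked[0] if ranked else ""

class Informer:
    """Distills the task and sheet state into explicit sub-goals."""
    def inform(self, task: str, sheet_state: str) -> str:
        return call_llm(f"Task: {task}\nSheet: {sheet_state}\nList sub-goals:")

class Planner:
    """Emits executable spreadsheet-manipulation code (e.g., openpyxl)."""
    def plan(self, task: str, hints: str, example: str, feedback: str) -> str:
        return call_llm(f"Task: {task}\nHints: {hints}\n"
                        f"Example: {example}\nLast error: {feedback}\nCode:")

def run_sheetagent(task: str, sheet_state: str, execute, max_iters: int = 5):
    retriever, informer, planner = Retriever([]), Informer(), Planner()
    feedback = ""
    for _ in range(max_iters):
        hints = informer.inform(task, sheet_state)
        example = retriever.retrieve(hints)
        code = planner.plan(task, hints, example, feedback)
        ok, feedback, sheet_state = execute(code)  # sandboxed execution
        if ok:
            return sheet_state
        # On failure, the error feedback drives reflection in the next round.
    return sheet_state
```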
Related papers
- Great Memory, Shallow Reasoning: Limits of $k$NN-LMs [71.73611113995143]
$k$NN-LMs, which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling.
We ask whether this improved ability to recall information really translates into downstream abilities.
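For context, a $k$NN-LM interpolates a nearest-neighbor distribution, built from a datastore of (context vector, next word) pairs, with the base LM's distribution: $p(w) = \lambda\, p_{kNN}(w) + (1-\lambda)\, p_{LM}(w)$. A minimal numpy sketch, assuming a precomputed datastore and illustrative values for $k$, $\lambda$, and the distance temperature:

```python
import numpy as np

def knn_lm_next_word(p_lm, hidden, datastore_keys, datastore_vals,
                     vocab_size, k=8, lam=0.25, temp=1.0):
    """Interpolate a kNN distribution with the base LM distribution:
       p(w) = lam * p_kNN(w) + (1 - lam) * p_LM(w)."""
    # Squared L2 distance from the query context vector to all stored keys.
    d = ((datastore_keys - hidden) ** 2).sum(axis=1)
    nn = np.argsort(d)[:k]                   # k nearest stored contexts
    w = np.exp(-d[nn] / temp)                # softmax over negative distances
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, datastore_vals[nn], w)  # put mass on stored next words
    return lam * p_knn + (1.0 - lam) * p_lm
```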
arXiv Detail & Related papers (2024-08-21T17:59:05Z)
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- Meta-Task Planning for Language Agents [13.550774629515843]
Large language model-based agents (LLM agents) have emerged as a promising paradigm for achieving artificial general intelligence (AGI).
This paper introduces Meta-Task Planning (MTP), a zero-shot methodology for collaborative LLM-based multi-agent systems.
MTP achieved an average $\sim 40\%$ success rate on TravelPlanner, significantly higher than the state-of-the-art (SOTA) baseline.
arXiv Detail & Related papers (2024-05-26T10:33:17Z) - TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [52.73289223176475]
TableLLM is a robust large language model (LLM) with 13 billion parameters.
TableLLM is purpose-built for proficiently handling data manipulation tasks.
We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z) - HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM
Inference [68.59839755875252]
HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that on a one billion parameter model, HiRE applied to both the softmax and feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
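A rough sketch of the first component as described: score all logits with a cheap proxy matrix, keep an enlarged candidate set for high recall, then compute exact logits only on those candidates. The proxy `W_lr` and the `overshoot` factor are assumptions for illustration; the paper's actual compression scheme and the DA-TOP-$k$ multi-device operator are not reproduced here.

```python
import numpy as np

def hire_style_topk(h, W, W_lr, k=64, overshoot=4):
    """Approximate top-k in the spirit of HiRE (details hypothetical).

    h: (d,) hidden state; W: (vocab, d) full output matrix;
    W_lr: (vocab, d) cheap proxy (e.g., low-rank or low-precision)."""
    approx = W_lr @ h                               # cheap approximate logits
    m = overshoot * k                               # enlarged set for high recall
    cand = np.argpartition(-approx, m)[:m]          # candidate rows
    exact = W[cand] @ h                             # full precision only on candidates
    return cand[np.argsort(-exact)[:k]]             # exact top-k within candidates
```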
arXiv Detail & Related papers (2024-02-14T18:04:36Z)
- Towards Robust Multi-Modal Reasoning via Model Selection [7.6621866737827045]
The LLM serves as the "brain" of the agent, orchestrating multiple tools for collaborative multi-step task solving.
We propose the $\textit{M}^3$ framework as a plug-in with negligible runtime overhead at test-time.
Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies.
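A hypothetical reading of that selection step: score each candidate model for a subtask given the user input and the outputs of the subtasks it depends on, then route to the best scorer. The names and the scorer interface below are illustrative, not the paper's API.

```python
def select_model(subtask, user_input, dep_outputs, candidates, scorer):
    """Pick the model with the highest predicted fitness for this subtask,
    conditioning on the user input and on upstream subtask outputs."""
    context = {"subtask": subtask, "input": user_input, "deps": dep_outputs}
    return max(candidates, key=lambda model: scorer(model, context))
```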
arXiv Detail & Related papers (2023-10-12T16:06:18Z)
- Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation [45.90925587972781]
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions.
Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks.
Mixture of Prompts (MoPs) can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
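One plausible shape for such routing, sketched below: a gating function scores each trained soft prompt against the input embedding and prepends only the best matches, so tasks from different sources stop competing over a single shared prompt. The gating rule and tensor shapes are assumptions for illustration.

```python
import numpy as np

def route_prompts(x_emb, prompt_keys, prompt_embeds, top_m=2):
    """Hypothetical mixture-of-prompts gate.

    x_emb: (d,) input embedding; prompt_keys: (P, d), one key per soft
    prompt; prompt_embeds: (P, L, d), trained soft-prompt tokens."""
    scores = prompt_keys @ x_emb                 # gate score per prompt
    chosen = np.argsort(-scores)[:top_m]         # pick the top-m prompt "experts"
    # Concatenate the selected prompts along the token axis, ready to
    # prepend to the input sequence.
    return np.concatenate([prompt_embeds[i] for i in chosen], axis=0)
```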
arXiv Detail & Related papers (2023-10-04T14:11:12Z)
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [60.171444066848856]
We propose SheetCopilot, an agent that takes natural language tasks and controls spreadsheets to fulfill the requirements.
We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline.
Our SheetCopilot correctly completes 44.3% of tasks for a single generation, outperforming the strong code generation baseline by a wide margin.
arXiv Detail & Related papers (2023-05-30T17:59:30Z)
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents [26.78244595330595]
"$underlineD$escribe" is an interactive planning approach based on Large Language Models (LLMs)
DEPS facilitates better error correction on initial LLM-generated $textitplan$ by integrating $textitdescription$ of the plan execution process.
Experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks.
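A hedged sketch of the loop that summary implies: plan, execute, describe the outcome, explain the failure, and replan. Only the Describe/Explain/Plan/Select naming follows the paper; `env`, `llm`, and the prompts below are placeholders.

```python
# `llm` is any text-completion callable; `env` is a hypothetical
# environment whose execute() returns an outcome with .success and .log.
def deps_episode(goal: str, env, llm, max_rounds: int = 10) -> bool:
    plan = llm(f"Plan sub-goals for: {goal}")                         # Plan
    for _ in range(max_rounds):
        outcome = env.execute(plan)       # the selector would pick the next
        if outcome.success:               # feasible sub-goal inside the plan
            return True
        description = llm(f"Describe the execution: {outcome.log}")   # Describe
        explanation = llm(f"Explain the failure: {description}")      # Explain
        plan = llm(f"Revise the plan for {goal} given: {explanation}")  # Re-plan
    return False
```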
arXiv Detail & Related papers (2023-02-03T06:06:27Z)