SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models
- URL: http://arxiv.org/abs/2403.03636v2
- Date: Sat, 24 Aug 2024 17:03:11 GMT
- Title: SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models
- Authors: Yibin Chen, Yifu Yuan, Zeyu Zhang, Yan Zheng, Jinyi Liu, Fei Ni, Jianye Hao
- Abstract summary: Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation but have not yet been investigated on realistic tasks.
We introduce $\textbf{SheetRM}$, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation.
We further propose $\textbf{SheetAgent}$, a novel autonomous agent that utilizes the power of LLMs.
- Score: 40.631127096231886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spreadsheet manipulation is pervasive in daily work and significantly improves working efficiency. Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation but have not yet been investigated in complicated and realistic tasks where reasoning challenges exist (e.g., long-horizon manipulation with multi-step reasoning and ambiguous requirements). To bridge the gap with real-world requirements, we introduce $\textbf{SheetRM}$, a benchmark featuring long-horizon and multi-category tasks with reasoning-dependent manipulation caused by real-life challenges. To mitigate the above challenges, we further propose $\textbf{SheetAgent}$, a novel autonomous agent that utilizes the power of LLMs. SheetAgent consists of three collaborative modules: $\textit{Planner}$, $\textit{Informer}$, and $\textit{Retriever}$, achieving both advanced reasoning and accurate manipulation over spreadsheets without human interaction through iterative task reasoning and reflection. Extensive experiments demonstrate that SheetAgent delivers 20-30% pass rate improvements on multiple benchmarks over baselines, achieving enhanced precision in spreadsheet manipulation and demonstrating superior table reasoning abilities. More details and visualizations are available at https://sheetagent.github.io.
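The abstract names the three modules but not their interfaces. Below is a minimal Python sketch of how such a Planner/Informer/Retriever loop could fit together; the module names come from the abstract, while every signature, prompt, and the `call_llm` placeholder are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of a SheetAgent-style loop. Module names come from the
# abstract; all signatures and prompts below are hypothetical.

def call_llm(prompt: str) -> str:
    """Placeholder for any LLM completion API."""
    raise NotImplementedError

class Retriever:
    """Fetches a relevant code example for the current sub-goal."""
    def __init__(self, code_examples: list[str]):
        self.code_examples = code_examples

    def retrieve(self, query: str) -> str:
        # Toy relevance heuristic: keyword overlap with the query.
        ranked = sorted(self.code_examples,
                        key=lambda ex: -len(set(query.split()) & set(ex.split())))
        return ranked[0] if ranked else ""

class Informer:
    """Distills the task and sheet state into explicit sub-goals."""
    def inform(self, task: str, sheet_state: str) -> str:
        return call_llm(f"Task: {task}\nSheet: {sheet_state}\nList sub-goals:")

class Planner:
    """Emits executable spreadsheet-manipulation code (e.g., openpyxl)."""
    def plan(self, task: str, hints: str, example: str, feedback: str) -> str:
        return call_llm(f"Task: {task}\nHints: {hints}\n"
                        f"Example: {example}\nLast error: {feedback}\nCode:")

def run_sheetagent(task: str, sheet_state: str, execute, max_iters: int = 5):
    retriever, informer, planner = Retriever([]), Informer(), Planner()
    feedback = ""
    for _ in range(max_iters):
        hints = informer.inform(task, sheet_state)
        example = retriever.retrieve(hints)
        code = planner.plan(task, hints, example, feedback)
        ok, feedback, sheet_state = execute(code)  # sandboxed execution
        if ok:
            return sheet_state
        # On failure, the error feedback drives reflection in the next round.
    return sheet_state
```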
Related papers
- Great Memory, Shallow Reasoning: Limits of $k$NN-LMs [71.73611113995143]
$k$NN-LMs, which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling.
We ask whether this improved ability to recall information really translates into downstream abilities.
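For context, a $k$NN-LM interpolates a nearest-neighbor distribution, built from a datastore of (context vector, next word) pairs, with the base LM's distribution: $p(w) = \lambda\, p_{kNN}(w) + (1-\lambda)\, p_{LM}(w)$. A minimal numpy sketch, assuming a precomputed datastore and illustrative values for $k$, $\lambda$, and the distance temperature:

```python
import numpy as np

def knn_lm_next_word(p_lm, hidden, datastore_keys, datastore_vals,
                     vocab_size, k=8, lam=0.25, temp=1.0):
    """Interpolate a kNN distribution with the base LM distribution:
       p(w) = lam * p_kNN(w) + (1 - lam) * p_LM(w)."""
    # Squared L2 distance from the query context vector to all stored keys.
    d = ((datastore_keys - hidden) ** 2).sum(axis=1)
    nn = np.argsort(d)[:k]                   # k nearest stored contexts
    w = np.exp(-d[nn] / temp)                # softmax over negative distances
    w /= w.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, datastore_vals[nn], w)  # put mass on stored next words
    return lam * p_knn + (1.0 - lam) * p_lm
```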
arXiv Detail & Related papers (2024-08-21T17:59:05Z)
- TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools [51.576974932743596]
Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts.
TACT contains challenging instructions that demand stitching information scattered across one or more texts.
We construct this dataset by leveraging an existing dataset of texts and their associated tables.
We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%.
arXiv Detail & Related papers (2024-06-05T20:32:56Z)
- Meta-Task Planning for Language Agents [13.550774629515843]
Large language model-based agents (LLM agents) have emerged as a promising paradigm for achieving artificial general intelligence (AGI).
This paper introduces Meta-Task Planning (MTP), a zero-shot methodology for collaborative LLM-based multi-agent systems.
MTP achieved an average $\sim 40\%$ success rate on TravelPlanner, significantly higher than the state-of-the-art (SOTA) baseline.
arXiv Detail & Related papers (2024-05-26T10:33:17Z) - TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [52.73289223176475]
TableLLM is a robust large language model (LLM) with 13 billion parameters.
TableLLM is purpose-built for proficiently handling data manipulation tasks.
We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z) - HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM
Inference [68.59839755875252]
HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that on a one billion parameter model, HiRE applied to both the softmax and feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
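A rough sketch of the first component as described: score all logits with a cheap proxy matrix, keep an enlarged candidate set for high recall, then compute exact logits only on those candidates. The proxy `W_lr` and the `overshoot` factor are assumptions for illustration; the paper's actual compression scheme and the DA-TOP-$k$ multi-device operator are not reproduced here.

```python
import numpy as np

def hire_style_topk(h, W, W_lr, k=64, overshoot=4):
    """Approximate top-k in the spirit of HiRE (details hypothetical).

    h: (d,) hidden state; W: (vocab, d) full output matrix;
    W_lr: (vocab, d) cheap proxy (e.g., low-rank or low-precision)."""
    approx = W_lr @ h                               # cheap approximate logits
    m = overshoot * k                               # enlarged set for high recall
    cand = np.argpartition(-approx, m)[:m]          # candidate rows
    exact = W[cand] @ h                             # full precision only on candidates
    return cand[np.argsort(-exact)[:k]]             # exact top-k within candidates
```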
arXiv Detail & Related papers (2024-02-14T18:04:36Z)
- Towards Robust Multi-Modal Reasoning via Model Selection [7.6621866737827045]
The LLM serves as the "brain" of the agent, orchestrating multiple tools for collaborative multi-step task solving.
We propose the $\textit{M}^3$ framework as a plug-in with negligible runtime overhead at test-time.
Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies.
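A hypothetical reading of that selection step: score each candidate model for a subtask given the user input and the outputs of the subtasks it depends on, then route to the best scorer. The names and the scorer interface below are illustrative, not the paper's API.

```python
def select_model(subtask, user_input, dep_outputs, candidates, scorer):
    """Pick the model with the highest predicted fitness for this subtask,
    conditioning on the user input and on upstream subtask outputs."""
    context = {"subtask": subtask, "input": user_input, "deps": dep_outputs}
    return max(candidates, key=lambda model: scorer(model, context))
```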
arXiv Detail & Related papers (2023-10-12T16:06:18Z)
- Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation [45.90925587972781]
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions.
Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks.
Mixture of Prompts (MoPs) can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios.
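One plausible shape for such routing, sketched below: a gating function scores each trained soft prompt against the input embedding and prepends only the best matches, so tasks from different sources stop competing over a single shared prompt. The gating rule and tensor shapes are assumptions for illustration.

```python
import numpy as np

def route_prompts(x_emb, prompt_keys, prompt_embeds, top_m=2):
    """Hypothetical mixture-of-prompts gate.

    x_emb: (d,) input embedding; prompt_keys: (P, d), one key per soft
    prompt; prompt_embeds: (P, L, d), trained soft-prompt tokens."""
    scores = prompt_keys @ x_emb                 # gate score per prompt
    chosen = np.argsort(-scores)[:top_m]         # pick the top-m prompt "experts"
    # Concatenate the selected prompts along the token axis, ready to
    # prepend to the input sequence.
    return np.concatenate([prompt_embeds[i] for i in chosen], axis=0)
```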
arXiv Detail & Related papers (2023-10-04T14:11:12Z)
- SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [60.171444066848856]
We propose SheetCopilot, an agent that takes natural language tasks and controls spreadsheets to fulfill the requirements.
We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline.
Our SheetCopilot correctly completes 44.3% of tasks for a single generation, outperforming the strong code generation baseline by a wide margin.
arXiv Detail & Related papers (2023-05-30T17:59:30Z)
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents [26.78244595330595]
"$underlineD$escribe" is an interactive planning approach based on Large Language Models (LLMs)
DEPS facilitates better error correction on initial LLM-generated $textitplan$ by integrating $textitdescription$ of the plan execution process.
Experiments mark the milestone of the first zero-shot multi-task agent that can robustly accomplish 70+ Minecraft tasks.
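A hedged sketch of the loop that summary implies: plan, execute, describe the outcome, explain the failure, and replan. Only the Describe/Explain/Plan/Select naming follows the paper; `env`, `llm`, and the prompts below are placeholders.

```python
# `llm` is any text-completion callable; `env` is a hypothetical
# environment whose execute() returns an outcome with .success and .log.
def deps_episode(goal: str, env, llm, max_rounds: int = 10) -> bool:
    plan = llm(f"Plan sub-goals for: {goal}")                         # Plan
    for _ in range(max_rounds):
        outcome = env.execute(plan)       # the selector would pick the next
        if outcome.success:               # feasible sub-goal inside the plan
            return True
        description = llm(f"Describe the execution: {outcome.log}")   # Describe
        explanation = llm(f"Explain the failure: {description}")      # Explain
        plan = llm(f"Revise the plan for {goal} given: {explanation}")  # Re-plan
    return False
```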
arXiv Detail & Related papers (2023-02-03T06:06:27Z)