Related papers: SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

URL: http://arxiv.org/abs/2406.14991v2
Date: Thu, 17 Oct 2024 07:23:23 GMT
Title: SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation
Authors: Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang,
Abstract summary: SpreadsheetBench is designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel forums. Our comprehensive evaluation of various LLMs under both single-round and multi-round inference settings reveals a substantial gap between the state-of-the-art (SOTA) models and human performance.
Score: 34.8332394229927
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel forums, which reflect the intricate needs of users. The associated spreadsheets from the forums contain a variety of tabular data such as multiple tables, non-standard relational tables, and abundant non-textual elements. Furthermore, we propose a more reliable evaluation metric akin to online judge platforms, where multiple spreadsheet files are created as test cases for each instruction, ensuring the evaluation of robust solutions capable of handling spreadsheets with varying values. Our comprehensive evaluation of various LLMs under both single-round and multi-round inference settings reveals a substantial gap between the state-of-the-art (SOTA) models and human performance, highlighting the benchmark's difficulty.

Related papers

Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE [0.0]
Large Language Models (LLMs) have demonstrated some significant capabilities across various domains.<n>This study introduces a benchmark framework to evaluate the performance of leading LLMs in executing spreadsheet functions.
arXiv Detail & Related papers (2025-06-19T03:47:38Z)
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [44.08092362611575]
SpreadsheetLLM is an efficient encoding method designed to unleash and optimize large language models (LLMs) on spreadsheets. We develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. Fine-tuned LLM with SheetCompressor has an average compression ratio of 25 times, but achieves a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%.
arXiv Detail & Related papers (2024-07-12T06:34:21Z)
QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs [63.98556480088152]
Table summarization is a crucial task aimed at condensing information into concise and comprehensible textual summaries. We propose a novel method to address these limitations by introducing query-focused multi-table summarization. Our approach, which comprises a table serialization module, a summarization controller, and a large language model, generates query-dependent table summaries tailored to users' information needs.
arXiv Detail & Related papers (2024-05-08T15:05:55Z)
Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations [36.2969566996675]
We develop an Auto-Formula system that can accurately predict formulas that users want to author in a target spreadsheet cell. We use contrastive-learning techniques inspired by "similar-face recognition" from compute vision.
arXiv Detail & Related papers (2024-04-19T03:28:18Z)
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [52.73289223176475]
TableLLM is a robust large language model (LLM) with 13 billion parameters. TableLLM is purpose-built for proficiently handling data manipulation tasks. We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z)
SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models [45.930510174309845]
Large language model (LLM) has been recently attempted for automatic spreadsheet manipulation. SheetAgent consists of three collaborative modules: Planner, Informer, and Retriever. Extensive experiments demonstrate that SheetAgent delivers 20--40% pass rate improvements on multiple benchmarks over baselines.
arXiv Detail & Related papers (2024-03-06T11:48:08Z)
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
SpreadsheetCoder: Formula Prediction from Semi-structured Context [70.41579328458116]
We propose a BERT-based model architecture to represent the tabular context in both row-based and column-based formats. We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%. Compared to the rule-based system, SpreadsheetCoder 82% assists more users in composing formulas on Google Sheets.
arXiv Detail & Related papers (2021-06-26T11:26:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.