SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations
- URL: http://arxiv.org/abs/2510.19864v1
- Date: Wed, 22 Oct 2025 01:36:13 GMT
- Title: SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations
- Authors: Amila Indika, Igor Molybog
- Abstract summary: This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. We present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. Our findings suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible prerequisite step toward enhancing maintainability and collaborative workflows in spreadsheets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Numerous knowledge workers utilize spreadsheets in business, accounting, and finance. However, a lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, which risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. Many previous studies have utilized Large Language Models (LLMs) for generating spreadsheet manipulation code; however, translating that code into natural language for SOD is a less-explored area. To address this, we present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. We evaluate five LLMs, GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B, using BLEU, GLEU, ROUGE-L, and METEOR metrics. Our findings suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible prerequisite step toward enhancing reproducibility, maintainability, and collaborative workflows in spreadsheets, although there are challenges that need to be addressed.
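The abstract evaluates generated documentation against reference summaries with overlap metrics such as ROUGE-L. As an illustration of how such a comparison works, the sketch below computes a ROUGE-L-style F1 score between a hypothetical model-generated summary and a reference summary using longest common subsequence; it is a minimal, self-contained implementation for intuition, not the paper's actual evaluation code, and the example sentences are invented.

```python
def lcs_length(a, b):
    # Dynamic-programming longest common subsequence over two token lists.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n]

def rouge_l_f1(candidate, reference):
    # ROUGE-L F1: harmonic mean of LCS-based precision and recall.
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference summary and model output for one code snippet.
reference = "Insert a new column named Total and sum columns B and C into it"
candidate = "Add a column called Total that sums columns B and C"
score = rouge_l_f1(candidate, reference)
```

Metrics like BLEU, GLEU, and METEOR differ in how they weight n-gram precision, recall, and word order, which is why benchmarks such as SOD typically report several of them side by side.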
Related papers
- Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction [28.47810405584841]
The Arranged and Organized Extraction (AOE) benchmark is designed to evaluate the ability of large language models to comprehend fragmented documents. AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schemas tailored to varied input queries. Results show that even the most advanced models struggle significantly.
arXiv Detail & Related papers (2025-07-22T06:37:51Z) - Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE [0.0]
Large Language Models (LLMs) have demonstrated significant capabilities across various domains. This study introduces a benchmark framework to evaluate the performance of leading LLMs in executing spreadsheet functions.
arXiv Detail & Related papers (2025-06-19T03:47:38Z) - Extract Information from Hybrid Long Documents Leveraging LLMs: A Framework and Dataset [52.286323454512996]
Large Language Models (LLMs) can comprehend and analyze hybrid text containing textual and tabular data. We propose an Automated Information Extraction framework (AIE) to enable LLMs to process hybrid long documents (HLDs) and carry out experiments to analyse four important aspects of information extraction from HLDs. To address the issue of dataset scarcity in HLDs and support future work, we also propose the Financial Reports Numerical Extraction (FINE) dataset.
arXiv Detail & Related papers (2024-12-28T07:54:14Z) - SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [44.08092362611575]
We introduce SpreadsheetLLM, an efficient encoding method for large language models (LLMs) on spreadsheets. We develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. A fine-tuned LLM with SheetCompressor achieves an average compression ratio of 25x and a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%.
arXiv Detail & Related papers (2024-07-12T06:34:21Z) - TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [51.66718740300016]
TableLLM is a robust large language model (LLM) with 8 billion parameters, purpose-built for proficiently handling data manipulation tasks. We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z) - SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models [45.930510174309845]
Large language models (LLMs) have recently been applied to automatic spreadsheet manipulation. SheetAgent consists of three collaborative modules: Planner, Informer, and Retriever. Extensive experiments demonstrate that SheetAgent delivers 20--40% pass rate improvements over baselines on multiple benchmarks.
arXiv Detail & Related papers (2024-03-06T11:48:08Z) - TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the power of large language models (LLMs) to solve this task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z) - SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [60.171444066848856]
We propose SheetCopilot, an agent that takes a natural language task and controls a spreadsheet to fulfill the requirements.
We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline.
Our SheetCopilot correctly completes 44.3% of tasks in a single generation, outperforming the strong code-generation baseline by a wide margin.
arXiv Detail & Related papers (2023-05-30T17:59:30Z)