Related papers: From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding


URL: http://arxiv.org/abs/2601.08741v1
Date: Tue, 13 Jan 2026 17:18:14 GMT
Title: From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding
Authors: Anmol Gulati, Sahil Sen, Waqar Sarguroh, Kevin Paul,
Abstract summary: Large Language Models (LLMs) struggle to reason over large-scale enterprise spreadsheets containing thousands of numeric rows, multiple linked sheets, and embedded visual content such as charts and receipts. We present From Rows to Reasoning (FRTR), an advanced, multimodal retrieval-augmented generation framework that decomposes Excel workbooks into granular row, column, and block embeddings. We tested FRTR on six LLMs, achieving 74% answer accuracy on FRTR-Bench with Claude Sonnet 4.5, a substantial improvement over prior state-of-the-art approaches that reached only 24%.
Score: 0.7723674433972977
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) struggle to reason over large-scale enterprise spreadsheets containing thousands of numeric rows, multiple linked sheets, and embedded visual content such as charts and receipts. Prior state-of-the-art spreadsheet reasoning approaches typically rely on single-sheet compression or full-context encoding, which limits scalability and fails to reflect how real users interact with complex, multimodal workbooks. We introduce FRTR-Bench, the first large-scale benchmark for multimodal spreadsheet reasoning, comprising 30 enterprise-grade Excel workbooks spanning nearly four million cells and more than 50 embedded images. To address these challenges, we present From Rows to Reasoning (FRTR), an advanced, multimodal retrieval-augmented generation framework that decomposes Excel workbooks into granular row, column, and block embeddings, employs hybrid lexical-dense retrieval with Reciprocal Rank Fusion (RRF), and integrates multimodal embeddings to reason over both numerical and visual information. We tested FRTR on six LLMs, achieving 74% answer accuracy on FRTR-Bench with Claude Sonnet 4.5, a substantial improvement over prior state-of-the-art approaches that reached only 24%. On the SpreadsheetLLM benchmark, FRTR achieved 87% accuracy with GPT-5 while reducing token usage by roughly 50% compared to context-compression methods.
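The abstract names two retrieval steps: decomposing a workbook into row, column, and block units, and fusing lexical and dense rankings with Reciprocal Rank Fusion (RRF), where each chunk scores the sum of 1/(k + rank) over every ranked list it appears in. The Python sketch below illustrates both on a toy sheet. It is not FRTR's implementation: the chunk text formats, the `block_rows` parameter, the token-overlap ranker (a stand-in for a real lexical retriever such as BM25), and the hard-coded `dense` list (a stand-in for an embedding retriever) are all hypothetical. The constant k = 60 is the value commonly used with RRF.

```python
import re
from collections import defaultdict

Grid = list[list[str]]


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def chunk_sheet(sheet: str, grid: Grid, block_rows: int = 2) -> list[tuple[str, str]]:
    """Split one sheet into row, column, and block chunks as (chunk_id, text) pairs."""
    header, rows = grid[0], grid[1:]
    chunks = []
    # Row chunks: one "header=value" record per data row (row 1 is the header).
    for i, row in enumerate(rows, start=2):
        chunks.append((f"{sheet}!row{i}", "; ".join(f"{h}={v}" for h, v in zip(header, row))))
    # Column chunks: a header followed by every value in that column.
    for j, h in enumerate(header):
        chunks.append((f"{sheet}!col{j + 1}", f"{h}: " + ", ".join(r[j] for r in rows)))
    # Block chunks: consecutive groups of `block_rows` data rows.
    for start in range(0, len(rows), block_rows):
        block = rows[start:start + block_rows]
        chunks.append((f"{sheet}!block{start // block_rows + 1}", " | ".join(" ".join(r) for r in block)))
    return chunks


def lexical_rank(query: str, chunks: list[tuple[str, str]]) -> list[str]:
    """Rank chunks by how many distinct query tokens they contain (hypothetical BM25 stand-in)."""
    q = set(tokenize(query))
    scored = [(cid, len(q & set(tokenize(text)))) for cid, text in chunks]
    scored = [s for s in scored if s[1] > 0]
    scored.sort(key=lambda s: s[1], reverse=True)  # stable: ties keep chunk order
    return [cid for cid, _ in scored]


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy sheet (hypothetical data).
grid = [
    ["Region", "Quarter", "Revenue"],
    ["North", "Q1", "120"],
    ["South", "Q1", "95"],
    ["North", "Q2", "140"],
]
chunks = chunk_sheet("Sales", grid)
lexical = lexical_rank("north revenue", chunks)
dense = ["Sales!row4", "Sales!col3", "Sales!row2"]  # hypothetical embedding-retriever output
fused = rrf_fuse([lexical, dense])
print([doc_id for doc_id, _ in fused[:3]])  # ['Sales!row4', 'Sales!row2', 'Sales!col3']
```

Because RRF uses only ranks, the lexical and dense scores never need to be calibrated against each other. In this toy run `Sales!row4` edges out `Sales!row2` because it ranks higher in the dense list, even though it ranks second lexically.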

Related papers

Efficient Table Retrieval and Understanding with Multimodal Large Language Models [22.49099892041409]
Tabular data is frequently captured in image form across a wide range of real-world scenarios such as financial reports, handwritten records, and document scans. These visual representations pose unique challenges for machine understanding, as they combine both structural and visual complexities. We propose TabRAG, a framework that enables MLLMs to answer queries over large collections of table images.
arXiv (2026-02-07T17:50:33Z)
CORE-T: COherent REtrieval of Tables for Text-to-SQL [91.76918495375384]
CORE-T is a scalable, training-free framework that enriches tables with purpose metadata and pre-computes a lightweight table-compatibility cache. Across Bird, Spider, and MMQA, CORE-T improves table-selection F1 by up to 22.7 points while retrieving up to 42% fewer tables.
arXiv (2026-01-19T14:51:23Z)
SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations [1.3669571918482655]
This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. We present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. Our findings suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible prerequisite step toward enhancing maintainability and collaboration in spreadsheets.
arXiv (2025-10-22T01:36:13Z)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv (2025-10-17T02:33:16Z)
UNIDOC-BENCH: A Unified Benchmark for Document-Centric Multimodal RAG [82.84014669683863]
Multimodal retrieval-augmented generation (MM-RAG) is a key approach for applying large language models to real-world knowledge bases. UniDoc-Bench is the first large-scale, realistic benchmark for MM-RAG built from 70k real-world PDF pages. Our experiments show that multimodal text-image fusion RAG systems consistently outperform both unimodal and jointly multimodal embedding-based retrieval.
arXiv (2025-10-04T04:30:13Z)
SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection [26.315814679351988]
SheetDesigner is a zero-shot framework that combines rule and vision reflection for component placement and content population. We find that through vision modality, MLLMs handle overlap and balance well but struggle with alignment.
arXiv (2025-09-09T07:51:38Z)
RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking [63.253294691180635]
In real-world scenarios, beyond pure text, a substantial amount of knowledge is stored in tables. We first propose a table-corpora-aware RAG framework, named T-RAG, which consists of the hierarchical memory index, multi-stage retrieval, and graph-aware prompting.
arXiv (2025-04-02T04:24:41Z)
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks [57.589795399265945]
We introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We also introduce BigDocs-Bench, a benchmark suite with 10 novel tasks. Our experiments show that training with BigDocs-Bench improves average performance by up to 25.8% over closed-source GPT-4o.
arXiv (2024-12-05T21:41:20Z)
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [44.08092362611575]
We introduce SpreadsheetLLM, an efficient encoding method for large language models (LLMs) on spreadsheets. We develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. An LLM fine-tuned with SheetCompressor achieves an average compression ratio of 25 times and a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%.
arXiv (2024-07-12T06:34:21Z)
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [34.8332394229927]
SpreadsheetBench is designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel forums. Our comprehensive evaluation of various LLMs under both single-round and multi-round inference settings reveals a substantial gap between the state-of-the-art (SOTA) models and human performance.
arXiv (2024-06-21T09:06:45Z)
SpreadsheetCoder: Formula Prediction from Semi-structured Context [70.41579328458116]
We propose a BERT-based model architecture to represent the tabular context in both row-based and column-based formats. We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%. Compared to the rule-based system, SpreadsheetCoder assists 82% more users in composing formulas on Google Sheets.
arXiv (2021-06-26T11:26:27Z)
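To make "row-based and column-based formats" concrete, the following Python sketch serializes the cells surrounding a target cell twice: once row by row and once column by column. It is a hypothetical illustration only. The `context_views` function, the `radius` window, and the `<target>` mask token are assumptions, and SpreadsheetCoder's BERT encoder and its actual context window are omitted.

```python
Grid = list[list[str]]


def context_views(grid: Grid, r: int, c: int, radius: int = 1) -> tuple[list[str], list[str]]:
    """Serialize the cells around (r, c) row by row and column by column."""
    rows = range(max(0, r - radius), min(len(grid), r + radius + 1))
    cols = range(max(0, c - radius), min(len(grid[0]), c + radius + 1))

    def cell(i: int, j: int) -> str:
        # Mask the target cell, whose formula is what a model would predict.
        return "<target>" if (i, j) == (r, c) else grid[i][j]

    row_based = [" ".join(cell(i, j) for j in cols) for i in rows]
    col_based = [" ".join(cell(i, j) for i in rows) for j in cols]
    return row_based, col_based


# Toy sheet (hypothetical): the empty cell at (3, 2) is the formula target.
grid = [
    ["Item", "Qty", "Price"],
    ["Pen", "2", "1.5"],
    ["Pad", "3", "4.0"],
    ["Total", "5", ""],
]
row_based, col_based = context_views(grid, 3, 2)
print(row_based)  # ['3 4.0', '5 <target>']
print(col_based)  # ['3 5', '4.0 <target>']
```

The two views contain the same cells in different orders, so a sequence encoder sees each neighbor both next to its row-mates and next to its column-mates.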

This list is automatically generated from the titles and abstracts of the papers on this site.