SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection
- URL: http://arxiv.org/abs/2509.07473v1
- Date: Tue, 09 Sep 2025 07:51:38 GMT
- Title: SheetDesigner: MLLM-Powered Spreadsheet Layout Generation with Rule-Based and Vision-Based Reflection
- Authors: Qin Chen, Yuanyi Ren, Xiaojun Ma, Mugeng Liu, Han Shi, Dongmei Zhang,
- Abstract summary: SheetDesigner is a zero-shot framework that combines rule and vision reflection for component placement and content population.<n>We find that through vision modality, MLLMs handle overlap and balance well but struggle with alignment.
- Score: 26.315814679351988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spreadsheets are critical to data-centric tasks, with rich, structured layouts that enable efficient information transmission. Given the time and expertise required for manual spreadsheet layout design, there is an urgent need for automated solutions. However, existing automated layout models are ill-suited to spreadsheets, as they often (1) treat components as axis-aligned rectangles with continuous coordinates, overlooking the inherently discrete, grid-based structure of spreadsheets; and (2) neglect interrelated semantics, such as data dependencies and contextual links, unique to spreadsheets. In this paper, we first formalize the spreadsheet layout generation task, supported by a seven-criterion evaluation protocol and a dataset of 3,326 spreadsheets. We then introduce SheetDesigner, a zero-shot and training-free framework using Multimodal Large Language Models (MLLMs) that combines rule and vision reflection for component placement and content population. SheetDesigner outperforms five baselines by at least 22.6\%. We further find that through vision modality, MLLMs handle overlap and balance well but struggle with alignment, necessitates hybrid rule and visual reflection strategies. Our codes and data is available at Github.
Related papers
- MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns [80.05126590825121]
MonkeyOCR v1.5 is a unified vision-language framework that enhances both layout understanding and content recognition.<n>To address complex table structures, we propose a visual consistency-based reinforcement learning scheme.<n>Two specialized modules, Image-Decoupled Table Parsing and Type-Guided Table Merging, are introduced to enable reliable parsing of tables.
arXiv Detail & Related papers (2025-11-13T15:12:17Z) - TableDART: Dynamic Adaptive Multi-Modal Routing for Table Understanding [52.59372043981724]
TableDART is a training-efficient framework that integrates multimodal views by reusing pretrained single-modality models.<n>In addition, we propose a novel agent to cross-modal knowledge integration by analyzing outputs from text- and image-based models.
arXiv Detail & Related papers (2025-09-18T07:00:13Z) - SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [44.08092362611575]
We introduce SpreadsheetLLM, an efficient encoding method for large language models (LLMs) on spreadsheets.<n>We develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs.<n>Fine-tuned LLM with SheetCompressor has an average compression ratio of 25 times, and achieves a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%.
arXiv Detail & Related papers (2024-07-12T06:34:21Z) - SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [34.8332394229927]
SpreadsheetBench is designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users.
Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel forums.
Our comprehensive evaluation of various LLMs under both single-round and multi-round inference settings reveals a substantial gap between the state-of-the-art (SOTA) models and human performance.
arXiv Detail & Related papers (2024-06-21T09:06:45Z) - TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [51.66718740300016]
TableLLM is a robust large language model (LLM) with 8 billion parameters.<n>TableLLM is purpose-built for proficiently handling data manipulation tasks.<n>We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
arXiv Detail & Related papers (2024-03-28T11:21:12Z) - DocLLM: A layout-aware generative language model for multimodal document
understanding [12.093889265216205]
We present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents.
Our model focuses exclusively on bounding box information to incorporate the spatial layout structure.
We demonstrate that our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets.
arXiv Detail & Related papers (2023-12-31T22:37:52Z) - SpreadsheetCoder: Formula Prediction from Semi-structured Context [70.41579328458116]
We propose a BERT-based model architecture to represent the tabular context in both row-based and column-based formats.
We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%.
Compared to the rule-based system, SpreadsheetCoder 82% assists more users in composing formulas on Google Sheets.
arXiv Detail & Related papers (2021-06-26T11:26:27Z) - A Graph Representation of Semi-structured Data for Web Question
Answering [96.46484690047491]
We propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations.
Our method improves F1 score by 3.90 points over the state-of-the-art baselines.
arXiv Detail & Related papers (2020-10-14T04:01:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.