START: Spatial and Textual Learning for Chart Understanding
- URL: http://arxiv.org/abs/2512.07186v1
- Date: Mon, 08 Dec 2025 05:43:14 GMT
- Title: START: Spatial and Textual Learning for Chart Understanding
- Authors: Zhuoming Liu, Xiaofeng Gao, Feiyang Niu, Qiaozi Gao, Liu Liu, Robinson Piramuthu
- Abstract summary: We propose START, the Spatial and Textual learning for chART understanding. We introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. Code, data and models will be publicly available.
- Score: 11.769123092079203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chart understanding is crucial for deploying multimodal large language models (MLLMs) in real-world scenarios such as analyzing scientific papers and technical reports. Unlike natural images, charts pair a structured visual layout (spatial property) with an underlying data representation (textual property) -- grasping both is essential for precise, fine-grained chart reasoning. Motivated by this observation, we propose START, the Spatial and Textual learning for chART understanding. Specifically, we introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. To facilitate spatial and textual learning, we propose the START-Dataset generated with a novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying data representation while preserving the visual distribution of real-world charts. We then evolve the code with a Large Language Model (LLM) to ascertain the positions of chart elements that capture the chart's visual structure, addressing challenges that existing methods cannot handle. To evaluate a model's ability to understand chart spatial structures, we propose the Chart Spatial understanding Benchmark (CS-Bench), filling a critical gap in comprehensive chart understanding evaluation. Leveraging spatial and textual learning, START delivers consistent gains across model sizes and benchmarks over the base models and surpasses prior state-of-the-art by a clear margin. Code, data and models will be publicly available.
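The abstract's observation that executable chart code exposes the positions of chart elements can be illustrated with a minimal sketch. This is not the START pipeline: the `Box` type, the `bar_boxes` function, and the layout constants (canvas size, margin, bar width fraction) are hypothetical names and values chosen for illustration, and a real plotting library would report element positions through its own renderer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates, origin at the top-left corner."""
    x0: float
    y0: float
    x1: float
    y1: float


def bar_boxes(values, width=400, height=300, margin=40, bar_frac=0.8):
    """Return one Box per bar of a simple vertical bar chart.

    Hypothetical layout: a fixed margin on every side, equal-width slots,
    and the tallest bar filling the full plot height. Assumes all values
    are positive.
    """
    plot_w = width - 2 * margin
    plot_h = height - 2 * margin
    vmax = max(values)
    slot = plot_w / len(values)   # horizontal space reserved per bar
    bar_w = slot * bar_frac       # drawn bar width inside its slot
    baseline = height - margin    # pixel row of the x-axis

    boxes = []
    for i, v in enumerate(values):
        x0 = margin + i * slot + (slot - bar_w) / 2
        bar_h = plot_h * v / vmax
        boxes.append(Box(x0, baseline - bar_h, x0 + bar_w, baseline))
    return boxes


# Because the chart is defined by code, data values and element positions
# come from the same source and can be paired as grounding annotations.
data = [1, 2, 4]
annotations = list(zip(data, bar_boxes(data)))
```

Each entry of `annotations` pairs a data value with the box of the bar that draws it, which is the kind of (element, position) pairing a grounding task needs.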
Related papers
- ChartAnchor: Chart Grounding with Structural-Semantic Fidelity [19.798612765001746]
Chart grounding refers to the bidirectional alignment between a chart's visual appearance and the structured semantics. ChartAnchor is a benchmark of 8k+ chart-table-code triples spanning 30 chart types drawn from diverse real-world and augmented sources. A multi-level evaluation framework integrates semantic validation, stylistic analysis, and perceptual metrics to assess both structural and content-level correctness.
arXiv Detail & Related papers (2025-11-30T18:28:09Z)
- ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning [54.86473583610112]
We propose PointCoT, which integrates reflective interaction into chain-of-thought reasoning in charts. By prompting MLLMs to generate bounding boxes and re-render charts based on location annotations, we establish connections between textual reasoning steps and visual grounding regions. We develop two instruction-tuned models, ChartPointQ2 and ChartPointQ2.5, which outperform state-of-the-art across several chart benchmarks.
arXiv Detail & Related papers (2025-11-29T04:01:55Z)
- BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning [51.472854950300416]
We propose BigCharts, a dataset creation pipeline that generates visually diverse chart images. Unlike purely synthetic datasets, BigCharts incorporates real-world data, ensuring authenticity and visual diversity. By introducing novel reward signals specifically designed for chart reasoning, our approach enhances model robustness and generalization.
arXiv Detail & Related papers (2025-08-13T13:39:17Z)
- ChartLens: Fine-grained Visual Attribution in Charts [106.44872805609673]
Post-Hoc Visual Attribution for Charts identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
arXiv Detail & Related papers (2025-05-25T23:17:32Z)
- ChartAdapter: Large Vision-Language Model for Chart Summarization [13.499376163294816]
ChartAdapter is a lightweight transformer module designed to bridge the gap between charts and textual summaries. By integrating ChartAdapter with an LLM, we enable end-to-end training and efficient chart summarization.
arXiv Detail & Related papers (2024-12-30T05:07:34Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts. We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild [28.643565008567172]
We introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma.
Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images.
Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking.
arXiv Detail & Related papers (2024-07-04T22:16:40Z)
- StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
Current chart-related tasks focus on either chart perception that extracts information from the visual charts, or chart reasoning given the extracted data. We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
- ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
- Chart-to-Text: A Large-Scale Benchmark for Chart Summarization [9.647079534077472]
We present Chart-to-text, a large-scale benchmark with two datasets and a total of 44,096 charts.
We explain the dataset construction process and analyze the datasets.
arXiv Detail & Related papers (2022-03-12T17:01:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.