Related papers: ChartAnchor: Chart Grounding with Structural-Semantic Fidelity

URL: http://arxiv.org/abs/2512.01017v2
Date: Mon, 08 Dec 2025 06:17:19 GMT
Title: ChartAnchor: Chart Grounding with Structural-Semantic Fidelity
Authors: Xinhang Li, Jingbo Zhou, Pengfei Luo, Yixiong Xiao, Tong Xu,
Abstract summary: Chart grounding refers to the bidirectional alignment between a chart's visual appearance and the structured semantics. ChartAnchor is a benchmark of 8k+ chart-table-code triples spanning 30 chart types drawn from diverse real-world and augmented sources. A multi-level evaluation framework integrates semantic validation, stylistic analysis, and perceptual metrics to assess both structural and content-level correctness.
Score: 19.798612765001746
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advances in multimodal large language models (MLLMs) highlight the need for benchmarks that rigorously evaluate structured chart comprehension. Chart grounding refers to the bidirectional alignment between a chart's visual appearance and the structured semantics. This task requires models to produce a symbolic specification that faithfully captures the chart's visual and structural intent, while also recovering the underlying tabular data with precise values and relationships. Chart grounding directly reflects a model's capabilities in numerical reasoning, multimodal alignment, and structural reconstruction, and has several important applications in real-world scenarios. Existing benchmarks, constrained by narrow chart diversity, isolated tasks, and incomplete evaluation frameworks, fail to holistically assess grounding. To address this, we propose ChartAnchor, a comprehensive benchmark of 8k+ chart-table-code triples spanning 30 chart types drawn from diverse real-world and augmented sources. ChartAnchor introduces two complementary tasks: chart-to-code generation (synthesizing executable code to replicate charts) and controlled chart-to-table reconstruction (extracting exact data with predefined headers), enabling cross-validation of visual and numerical fidelity. A multi-level evaluation framework integrates semantic validation, stylistic analysis, and perceptual metrics to assess both structural and content-level correctness. Extensive experiments on MLLMs reveal critical limitations in numerical precision and code synthesis, emphasizing the need for structured reasoning beyond surface-level perception. By unifying symbolic and data-driven grounding, ChartAnchor establishes a rigorous foundation for chart grounding, offering meaningful insights for advancing MLLMs in scientific, financial, and industrial domains.

Related papers

Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation [11.18352269863283]
Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision.
arXiv Detail & Related papers (2026-02-11T14:08:06Z)
ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing [64.65742943745866]
ChartE$^{3}$ is an End-to-End Chart Editing benchmark. It directly evaluates models without relying on intermediate natural language programs or code-level supervision. It contains over 1,200 high-quality samples constructed via a well-designed data pipeline with human curation.
arXiv Detail & Related papers (2026-01-29T13:29:27Z)
START: Spatial and Textual Learning for Chart Understanding [11.769123092079203]
We propose START, the Spatial and Textual learning for chART understanding. We introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. Code, data and models will be publicly available.
arXiv Detail & Related papers (2025-12-08T05:43:14Z)
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning [54.86473583610112]
We propose PointCoT, which integrates reflective interaction into chain-of-thought reasoning in charts. By prompting MLLMs to generate bounding boxes and re-render charts based on location annotations, we establish connections between textual reasoning steps and visual grounding regions. We develop two instruction-tuned models, ChartPointQ2 and ChartPointQ2.5, which outperform state-of-the-art across several chart benchmarks.
arXiv Detail & Related papers (2025-11-29T04:01:55Z)
ChartAB: A Benchmark for Chart Grounding & Dense Alignment [17.16234793106]
We introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of vision-language models (VLMs). By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements/attributes across two charts. Our analysis of evaluations reveals new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding.
arXiv Detail & Related papers (2025-10-30T17:56:31Z)
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning [51.472854950300416]
We propose BigCharts, a dataset creation pipeline that generates visually diverse chart images. Unlike purely synthetic datasets, BigCharts incorporates real-world data, ensuring authenticity and visual diversity. By introducing novel reward signals specifically designed for chart reasoning, our approach enhances model robustness and generalization.
arXiv Detail & Related papers (2025-08-13T13:39:17Z)
InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information [44.79888692172093]
We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts. We organize the benchmark into three tiers of increasing difficulty: factual reasoning over individual charts, integrative analysis across synthetically aligned chart sets, and semantic inference over visually complex, real-world chart pairs.
arXiv Detail & Related papers (2025-08-11T05:19:23Z)
Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding [14.75820681491341]
Existing benchmarks reveal reliance on text-based shortcuts and probabilistic pattern-matching rather than genuine visual reasoning. We propose Socratic Chart, a new framework that transforms chart images into Scalable Vector Graphics representations. Our framework surpasses state-of-the-art models in accurately capturing chart primitives and improving reasoning performance.
arXiv Detail & Related papers (2025-04-14T00:07:39Z)
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z)
Graph-Based Multimodal Contrastive Learning for Chart Question Answering [11.828192162922436]
This work introduces a novel joint multimodal scene graph framework that explicitly models the relationships among chart components and their underlying structures. The framework integrates both visual and textual graphs to capture structural and semantic characteristics. A graph contrastive learning strategy aligns node representations across modalities, enabling their seamless incorporation into a transformer decoder as soft prompts.
arXiv Detail & Related papers (2025-01-08T06:27:07Z)
On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts. We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
Current chart-related tasks focus on either chart perception that extracts information from the visual charts, or chart reasoning given the extracted data. We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
