ChartAnchor: Chart Grounding with Structural-Semantic Fidelity
- URL: http://arxiv.org/abs/2512.01017v2
- Date: Mon, 08 Dec 2025 06:17:19 GMT
- Title: ChartAnchor: Chart Grounding with Structural-Semantic Fidelity
- Authors: Xinhang Li, Jingbo Zhou, Pengfei Luo, Yixiong Xiao, Tong Xu,
- Abstract summary: Chart grounding refers to the bidirectional alignment between a chart's visual appearance and the structured semantics. ChartAnchor is a benchmark of 8k+ chart-table-code triples spanning 30 chart types drawn from diverse real-world and augmented sources. A multi-level evaluation framework integrates semantic validation, stylistic analysis, and perceptual metrics to assess both structural and content-level correctness.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in multimodal large language models (MLLMs) highlight the need for benchmarks that rigorously evaluate structured chart comprehension. Chart grounding refers to the bidirectional alignment between a chart's visual appearance and the structured semantics. This task requires models to produce a symbolic specification that faithfully captures the chart's visual and structural intent, while also recovering the underlying tabular data with precise values and relationships. Chart grounding directly reflects a model's capabilities in numerical reasoning, multimodal alignment, and structural reconstruction, and has several important applications in real-world scenarios. Existing benchmarks, constrained by narrow chart diversity, isolated tasks, and incomplete evaluation frameworks, fail to holistically assess grounding. To address this, we propose ChartAnchor, a comprehensive benchmark of 8k+ chart-table-code triples spanning 30 chart types drawn from diverse real-world and augmented sources. ChartAnchor introduces two complementary tasks: chart-to-code generation (synthesizing executable code to replicate charts) and controlled chart-to-table reconstruction (extracting exact data with predefined headers), enabling cross-validation of visual and numerical fidelity. A multi-level evaluation framework integrates semantic validation, stylistic analysis, and perceptual metrics to assess both structural and content-level correctness. Extensive experiments on MLLMs reveal critical limitations in numerical precision and code synthesis, emphasizing the need for structured reasoning beyond surface-level perception. By unifying symbolic and data-driven grounding, ChartAnchor establishes a rigorous foundation for chart grounding, offering meaningful insights for advancing MLLMs in scientific, financial, and industrial domains.
Related papers
- Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation
Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images. Existing approaches largely rely on supervised fine-tuning, which encourages surface-level token imitation. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision.
(arXiv, 2026-02-11T14:08:06Z)
- ChartE$^{3}$: A Comprehensive Benchmark for End-to-End Chart Editing
ChartE$^{3}$ is an end-to-end chart editing benchmark. It directly evaluates models without relying on intermediate natural-language programs or code-level supervision. It contains over 1,200 high-quality samples constructed via a well-designed data pipeline with human curation.
(arXiv, 2026-01-29T13:29:27Z)
- START: Spatial and Textual Learning for Chart Understanding
We propose START, the Spatial and Textual learning for chART understanding. We introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. Code, data, and models will be publicly available.
(arXiv, 2025-12-08T05:43:14Z)
- ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
We propose PointCoT, which integrates reflective interaction into chain-of-thought reasoning on charts. By prompting MLLMs to generate bounding boxes and re-render charts based on location annotations, we establish connections between textual reasoning steps and visual grounding regions. We develop two instruction-tuned models, ChartPointQ2 and ChartPointQ2.5, which outperform the state of the art across several chart benchmarks.
(arXiv, 2025-11-29T04:01:55Z)
- ChartAB: A Benchmark for Chart Grounding & Dense Alignment
We introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of vision-language models (VLMs). By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements/attributes across two charts. Our analysis of the evaluations reveals new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding.
(arXiv, 2025-10-30T17:56:31Z)
- BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning
We propose BigCharts, a dataset creation pipeline that generates visually diverse chart images. Unlike purely synthetic datasets, BigCharts incorporates real-world data, ensuring authenticity and visual diversity. By introducing novel reward signals specifically designed for chart reasoning, our approach enhances model robustness and generalization.
(arXiv, 2025-08-13T13:39:17Z)
- InterChart: Benchmarking Visual Reasoning Across Decomposed and Distributed Chart Information
We introduce InterChart, a diagnostic benchmark that evaluates how well vision-language models (VLMs) reason across multiple related charts. We organize the benchmark into three tiers of increasing difficulty: factual reasoning over individual charts, integrative analysis across synthetically aligned chart sets, and semantic inference over visually complex, real-world chart pairs.
(arXiv, 2025-08-11T05:19:23Z)
- Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding
Existing benchmarks reveal reliance on text-based shortcuts and probabilistic pattern-matching rather than genuine visual reasoning. We propose Socratic Chart, a new framework that transforms chart images into Scalable Vector Graphics (SVG) representations. Our framework surpasses state-of-the-art models in accurately capturing chart primitives and improving reasoning performance.
(arXiv, 2025-04-14T00:07:39Z)
- RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
(arXiv, 2025-03-29T15:50:08Z)
- Graph-Based Multimodal Contrastive Learning for Chart Question Answering
This work introduces a novel joint multimodal scene graph framework that explicitly models the relationships among chart components and their underlying structures. The framework integrates both visual and textual graphs to capture structural and semantic characteristics. A graph contrastive learning strategy aligns node representations across modalities, enabling their seamless incorporation into a transformer decoder as soft prompts.
(arXiv, 2025-01-08T06:27:07Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding
This paper explores the training processes necessary to improve MLLMs' comprehension of charts. We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
(arXiv, 2024-07-19T17:58:36Z)
- StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding
Current chart-related tasks focus either on chart perception, which extracts information from visual charts, or on chart reasoning over the extracted data. We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
(arXiv, 2023-09-20T12:51:13Z)