Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation
- URL: http://arxiv.org/abs/2602.10880v1
- Date: Wed, 11 Feb 2026 14:08:06 GMT
- Authors: Minggui He, Mingchen Dai, Jian Zhang, Yilun Liu, Shimin Tao, Pufan Zeng, Osamu Yoshie, Yuya Ieiri,
- Abstract summary: Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation rather than faithful modeling of underlying chart structure, which often leads to hallucinated or semantically inconsistent outputs. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision. Chart Specification filters syntactic noise to construct a structurally balanced training set and supports a Spec-Align Reward that provides fine-grained, verifiable feedback on structural correctness, enabling reinforcement learning to enforce consistent plotting logic. Experiments on three public benchmarks show that our method consistently outperforms prior approaches. With only 3K training samples, we achieve strong data efficiency, surpassing leading baselines by up to 61.7% on complex benchmarks, and scaling to 4K samples establishes new state-of-the-art results across all evaluated metrics. Overall, our results demonstrate that precise structural supervision offers an efficient pathway to high-fidelity chart-to-code generation. Code and dataset are available at: https://github.com/Mighten/chart-specification-paper
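The abstract describes a Spec-Align Reward that scores structural correctness against a ground-truth specification. A minimal sketch of that idea, assuming a chart specification is a flat mapping of structural fields; the field names (`chart_type`, `n_series`, etc.) and the exact matching rule are hypothetical, not taken from the paper:

```python
# Illustrative sketch, not the paper's actual implementation: a "chart
# specification" as a flat dict of structural fields, and a Spec-Align-style
# reward that scores field-level agreement with the ground truth.
# Field names below are hypothetical placeholders.

def spec_align_reward(pred_spec: dict, gold_spec: dict) -> float:
    """Fraction of ground-truth spec fields reproduced exactly by the
    prediction -- a fine-grained, verifiable signal usable as an RL reward."""
    if not gold_spec:
        return 0.0
    matched = sum(1 for key, value in gold_spec.items()
                  if pred_spec.get(key) == value)
    return matched / len(gold_spec)

gold = {"chart_type": "bar", "n_series": 2, "x_label": "Year",
        "y_label": "Sales", "has_legend": True}
pred = {"chart_type": "bar", "n_series": 2, "x_label": "Year",
        "y_label": "Revenue", "has_legend": True}  # one field wrong

print(spec_align_reward(pred, gold))  # 0.8
```

Because each field is checked independently, the reward degrades smoothly with partial structural errors rather than being all-or-nothing, which is what makes it suitable for reinforcement learning.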
Related papers
- ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch [57.01439313241121]
We introduce Rollout Posterior Entropy (RPE), a novel metric that quantifies chart complexity. We also develop truth-anchored inverse QA synthesis to guarantee reasoning rigor. To further elevate difficulty and reasoning depth, we filter samples based on model fail-rate and distill high-quality Chain-of-Thought (CoT) reasoning.
arXiv Detail & Related papers (2026-01-20T05:11:44Z)
- Semi-supervised Instruction Tuning for Large Language Models on Text-Attributed Graphs [62.544129365882014]
We propose a novel Semi-supervised Instruction Tuning pipeline for Graph Learning, named SIT-Graph. SIT-Graph is model-agnostic and can be seamlessly integrated into any graph instruction tuning method that utilizes LLMs as the predictor. Extensive experiments demonstrate that when incorporated into state-of-the-art graph instruction tuning methods, SIT-Graph significantly enhances their performance on text-attributed graph benchmarks.
arXiv Detail & Related papers (2026-01-19T08:10:53Z)
- ChartAnchor: Chart Grounding with Structural-Semantic Fidelity [19.798612765001746]
Chart grounding refers to the bidirectional alignment between a chart's visual appearance and its structured semantics. ChartAnchor is a benchmark of 8k+ chart-table-code triples spanning 30 chart types drawn from diverse real-world and augmented sources. A multi-level evaluation framework integrates semantic validation, stylistic analysis, and perceptual metrics to assess both structural and content-level correctness.
arXiv Detail & Related papers (2025-11-30T18:28:09Z)
- Breaking the SFT Plateau: Multimodal Structured Reinforcement Learning for Chart-to-Code Generation [12.822184232115333]
We propose Multimodal Structured Reinforcement Learning (MSRL) for chart-to-code generation. We construct the largest training corpus to date, containing 3 million chart-code pairs from real-world arXiv tables. MSRL significantly breaks the SFT plateau, improving high-level metrics by 6.2% and 9.9% on the ChartMimic and ReachQA benchmarks respectively.
arXiv Detail & Related papers (2025-08-19T07:40:18Z)
- BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning [51.472854950300416]
We propose BigCharts, a dataset creation pipeline that generates visually diverse chart images. Unlike purely synthetic datasets, BigCharts incorporates real-world data, ensuring authenticity and visual diversity. By introducing novel reward signals specifically designed for chart reasoning, our approach enhances model robustness and generalization.
arXiv Detail & Related papers (2025-08-13T13:39:17Z)
- Compile Scene Graphs with Reinforcement Learning [69.36723767339001]
Next-token prediction is the fundamental principle for training large language models (LLMs). We introduce R1-SGG, a multimodal LLM (M-LLM) initially trained via supervised fine-tuning (SFT) on the scene graph dataset. We design a set of graph-centric rewards, including three recall-based variants: Hard Recall, Hard Recall+Relax, and Soft Recall.
arXiv Detail & Related papers (2025-04-18T10:46:22Z)
- Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding [14.75820681491341]
Existing benchmarks reveal reliance on text-based shortcuts and probabilistic pattern-matching rather than genuine visual reasoning. We propose Socratic Chart, a new framework that transforms chart images into Scalable Vector Graphics representations. Our framework surpasses state-of-the-art models in accurately capturing chart primitives and improving reasoning performance.
arXiv Detail & Related papers (2025-04-14T00:07:39Z)
- Graph-Based Multimodal Contrastive Learning for Chart Question Answering [11.828192162922436]
This work introduces a novel joint multimodal scene graph framework that explicitly models the relationships among chart components and their underlying structures. The framework integrates both visual and textual graphs to capture structural and semantic characteristics. A graph contrastive learning strategy aligns node representations across modalities, enabling their seamless incorporation into a transformer decoder as soft prompts.
arXiv Detail & Related papers (2025-01-08T06:27:07Z)
- Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing.
The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge.
It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
- Multi-task Self-distillation for Graph-based Semi-Supervised Learning [6.277952154365413]
We propose a multi-task self-distillation framework that injects self-supervised learning and self-distillation into graph convolutional networks.
First, we formulate a self-supervision pipeline based on pre-text tasks to capture different levels of similarities in graphs.
Second, self-distillation uses soft labels of the model itself as additional supervision.
arXiv Detail & Related papers (2021-12-02T12:43:41Z)
- Heuristic Semi-Supervised Learning for Graph Generation Inspired by Electoral College [80.67842220664231]
We propose a novel pre-processing technique, namely ELectoral COllege (ELCO), which automatically expands new nodes and edges to refine the label similarity within a dense subgraph.
In all setups tested, our method boosts the average score of base models by a large margin of 4.7 points, as well as consistently outperforms the state-of-the-art.
arXiv Detail & Related papers (2020-06-10T14:48:48Z)