ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
- URL: http://arxiv.org/abs/2601.13606v1
- Date: Tue, 20 Jan 2026 05:11:44 GMT
- Title: ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch
- Authors: Zheng Liu, Honglin Lin, Chonghan Qin, Xiaoyang Wang, Xin Gao, Yu Li, Mengzhang Cai, Yun Zhu, Zhanping Zhong, Qizhi Pei, Zhuoshi Pan, Xiaoran Shang, Bin Cui, Conghui He, Wentao Zhang, Lijun Wu
- Abstract summary: We introduce Rollout Posterior Entropy (RPE), a novel metric that quantifies chart complexity. We also develop truth-anchored inverse QA synthesis to guarantee reasoning rigor. To further elevate difficulty and reasoning depth, we filter samples based on model fail-rate and distill high-quality Chain-of-Thought (CoT) reasoning.
- Score: 57.01439313241121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chart reasoning is a critical capability for Vision Language Models (VLMs). However, the development of open-source models is severely hindered by the lack of high-quality training data. Existing datasets suffer from a dual challenge: synthetic charts are often simplistic and repetitive, while the associated QA pairs are prone to hallucinations and lack the reasoning depth required for complex tasks. To bridge this gap, we propose ChartVerse, a scalable framework designed to synthesize complex charts and reliable reasoning data from scratch. (1) To address the bottleneck of simple patterns, we first introduce Rollout Posterior Entropy (RPE), a novel metric that quantifies chart complexity. Guided by RPE, we develop a complexity-aware chart coder to autonomously synthesize diverse, high-complexity charts via executable programs. (2) To guarantee reasoning rigor, we develop truth-anchored inverse QA synthesis. Diverging from standard generation, we adopt an answer-first paradigm: we extract deterministic answers directly from the source code, generate questions conditional on these anchors, and enforce strict consistency verification. To further elevate difficulty and reasoning depth, we filter samples based on model fail-rate and distill high-quality Chain-of-Thought (CoT) reasoning. We curate ChartVerse-SFT-600K and ChartVerse-RL-40K using Qwen3-VL-30B-A3B-Thinking as the teacher. Experimental results demonstrate that ChartVerse-8B achieves state-of-the-art performance, notably surpassing its teacher and rivaling the stronger Qwen3-VL-32B-Thinking.
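The answer-first paradigm described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the chart data, question templates, and helper names are all assumptions invented for this sketch. The key idea it demonstrates is that answers are computed deterministically from the chart's source data first, and each question-answer pair is only kept if an independent recomputation confirms the answer.

```python
# Hypothetical sketch of truth-anchored inverse QA synthesis.
# Answers are derived deterministically from the data a chart program
# would plot; questions are generated conditional on those anchors,
# then re-verified for consistency before being kept.

chart_data = {  # data the chart-rendering program would plot (illustrative)
    "Q1": 120, "Q2": 95, "Q3": 140, "Q4": 110,
}

def extract_anchors(data):
    """Derive deterministic ground-truth answers directly from the source data."""
    best = max(data, key=data.get)
    return [
        ("Which category has the highest value?", best),
        ("What is the total across all categories?", str(sum(data.values()))),
    ]

def verify(question, answer, data):
    """Strict consistency check: recompute the answer independently."""
    if "highest" in question:
        return answer == max(data, key=data.get)
    if "total" in question:
        return answer == str(sum(data.values()))
    return False  # unrecognized question type: reject

# Keep only QA pairs whose answers survive re-verification.
qa_pairs = [(q, a) for q, a in extract_anchors(chart_data) if verify(q, a, chart_data)]
```

Because the answer is fixed before the question is phrased, hallucinated answers are structurally impossible; the verification step additionally guards against template or extraction bugs.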
Related papers
- Visual Programmability: A Guide for Code-as-Thought in Chart Understanding [37.44645754630439]
We propose a Code-as-Thought (CaT) approach to represent the visual information of a chart in a verifiable, symbolic format. Visual Programmability is a learnable property that determines if a chart-question pair is better solved with code or direct visual analysis. We implement this concept in an adaptive framework where a Vision-Language Model (VLM) learns to choose between the CaT pathway and a direct visual reasoning pathway.
arXiv Detail & Related papers (2025-09-11T09:22:16Z) - Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis and Candidate-Conditioned Answering [4.036085193573325]
Vision Language Models (VLMs) often struggle with chart understanding tasks, particularly in accurate chart description and complex reasoning. We introduce a chart synthesis pipeline that generates aligned chart-question-answer triplets through code generation and execution. Experiments demonstrate significant improvements, with up to 15.50 points accuracy gain over the initial VLM.
arXiv Detail & Related papers (2025-08-16T08:26:55Z) - Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner [13.465161900684432]
We introduce Chart-R1, a chart-domain vision-language model with reinforcement learning fine-tuning to enable complex chart reasoning. To support Chart-R1, we first propose a novel programmatic data technology to generate high-quality step-by-step chart reasoning data. Then we develop a two-stage training strategy: Chart-COT with step-by-step chain-of-thought supervision, and Chart-RFT with numerically sensitive reinforcement fine-tuning.
arXiv Detail & Related papers (2025-07-21T11:22:17Z) - ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering [12.285453136336507]
We propose a code-driven framework designed to enable precise, interpretable reasoning over charts. We first train a high-fidelity model to convert diverse chart images into structured ECharts codes. Then, we design a general chart reasoning data synthesis pipeline. Finally, we train the final multimodal model using a combination of supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-06-11T18:55:36Z) - Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering. We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z) - Compile Scene Graphs with Reinforcement Learning [69.36723767339001]
Next-token prediction is the fundamental principle for training large language models (LLMs). We introduce R1-SGG, a multimodal LLM (M-LLM) initially trained via supervised fine-tuning (SFT) on the scene graph dataset. We design a set of graph-centric rewards, including three recall-based variants: Hard Recall, Hard Recall+Relax, and Soft Recall.
arXiv Detail & Related papers (2025-04-18T10:46:22Z) - Path-of-Thoughts: Extracting and Following Paths for Robust Relational Reasoning with Large Language Models [62.12031550252253]
We present Path-of-Thoughts (PoT), a novel framework designed to tackle relational reasoning. PoT efficiently extracts a task-agnostic graph that identifies crucial entities, relations, and attributes within the problem context. PoT identifies relevant reasoning chains within the graph corresponding to the posed question, facilitating inference of potential answers.
arXiv Detail & Related papers (2024-12-23T20:27:12Z) - What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding [67.59552859593985]
Graph Transformers, which incorporate self-attention and positional encoding, have emerged as a powerful architecture for various graph learning tasks.
This paper introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised classification.
arXiv Detail & Related papers (2024-06-04T05:30:16Z) - Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to possess strong reasoning ability, as an automatic data annotator.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance the chart VQA models, achieving state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z) - Auto-decoding Graphs [91.3755431537592]
The generative model is an auto-decoder that learns to synthesize graphs from latent codes.
Graphs are synthesized using self-attention modules that are trained to identify likely connectivity patterns.
arXiv Detail & Related papers (2020-06-04T14:23:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.