De-rendering, Reasoning, and Repairing Charts with Vision-Language Models
- URL: http://arxiv.org/abs/2602.20291v1
- Date: Mon, 23 Feb 2026 19:16:27 GMT
- Title: De-rendering, Reasoning, and Repairing Charts with Vision-Language Models
- Authors: Valentin Bonas, Martin Sinnona, Viviana Siless, Emmanuel Iarussi
- Abstract summary: Rule-based visualization linters can flag violations, but they miss context and do not suggest meaningful design changes. We introduce a framework that combines chart de-rendering, automated analysis, and iterative improvement to deliver actionable, interpretable feedback.
- Score: 2.3332469289621787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data visualizations are central to scientific communication, journalism, and everyday decision-making, yet they are frequently prone to errors that can distort interpretation or mislead audiences. Rule-based visualization linters can flag violations, but they miss context and do not suggest meaningful design changes. Directly querying general-purpose LLMs about visualization quality is unreliable: lacking training to follow visualization design principles, they often produce inconsistent or incorrect feedback. In this work, we introduce a framework that combines chart de-rendering, automated analysis, and iterative improvement to deliver actionable, interpretable feedback on visualization design. Our system reconstructs the structure of a chart from an image, identifies design flaws using vision-language reasoning, and proposes concrete modifications supported by established principles in visualization research. Users can selectively apply these improvements and re-render updated figures, creating a feedback loop that promotes both higher-quality visualizations and the development of visualization literacy. In our evaluation on 1,000 charts from the Chart2Code benchmark, the system generated 10,452 design recommendations, which clustered into 10 coherent categories (e.g., axis formatting, color accessibility, legend consistency). These results highlight the promise of LLM-driven recommendation systems for delivering structured, principle-based feedback on visualization design, opening the door to more intelligent and accessible authoring tools.
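The feedback loop described in the abstract (analyze a reconstructed chart, propose changes, let the user apply some of them, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `ChartSpec`, `Recommendation`, `apply_selected`, `feedback_loop`, and `toy_analyze` are hypothetical names, the de-rendering step from image to spec is omitted, and a hand-written rule stands in for the vision-language model.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Recommendation:
    category: str      # e.g. "axis formatting" (a category named in the abstract)
    description: str   # human-readable explanation of the suggested change
    patch: dict        # hypothetical: spec keys to overwrite if accepted


@dataclass
class ChartSpec:
    # Hypothetical stand-in for the structure recovered by de-rendering.
    properties: dict = field(default_factory=dict)


def apply_selected(spec: ChartSpec, recs: list, accept: Callable) -> ChartSpec:
    """Return a new spec with only the user-accepted patches applied."""
    updated = dict(spec.properties)
    for rec in recs:
        if accept(rec):
            updated.update(rec.patch)
    return ChartSpec(updated)


def feedback_loop(spec: ChartSpec, analyze: Callable, accept: Callable,
                  max_rounds: int = 3) -> ChartSpec:
    """Repeat analyze -> select -> apply until nothing changes."""
    for _ in range(max_rounds):
        recs = analyze(spec)
        if not recs:
            break
        new_spec = apply_selected(spec, recs, accept)
        if new_spec.properties == spec.properties:
            break  # the user accepted nothing new
        spec = new_spec
    return spec


def toy_analyze(spec: ChartSpec) -> list:
    """Assumption: a rule-based stand-in for the VLM analysis step."""
    recs = []
    if not spec.properties.get("y_label"):
        recs.append(Recommendation("axis formatting", "Add a y-axis label.",
                                   {"y_label": "value"}))
    if spec.properties.get("mark") == "bar" and spec.properties.get("y_min", 0) != 0:
        recs.append(Recommendation("axis formatting",
                                   "Start the bar-chart y-axis at zero.",
                                   {"y_min": 0}))
    return recs
```

Calling `feedback_loop(ChartSpec({"mark": "bar", "y_min": 40}), toy_analyze, lambda rec: True)` accepts every suggestion; passing a stricter `accept` callback models the selective application the abstract mentions.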
Related papers
- Do Large Language Models Understand Data Visualization Principles? [2.3332469289621787]
It remains unclear whether large language models (LLMs) and vision-language counterparts (VLMs) can reason about and enforce visualization principles directly. We evaluate both checking and fixing tasks, assessing how well models detect principle violations and correct flawed chart specifications. Our work highlights both the promise of large (vision-)language models as flexible validators and editors of visualization designs and the persistent gap with symbolic solvers on more nuanced aspects of visual perception.
arXiv Detail & Related papers (2026-02-23T17:51:06Z)
- Visual Self-Refine: A Pixel-Guided Paradigm for Accurate Chart Parsing [76.2602505940467]
Existing models often struggle with visually dense charts, leading to errors like data omission, misalignment, and hallucination. Inspired by the human strategy of using a finger as a "visual anchor" to ensure accuracy when reading complex charts, we propose a new paradigm named Visual Self-Refine (VSR). The core idea of VSR is to enable a model to generate pixel-level localization outputs, visualize them, and then feed these visualizations back to itself, allowing it to intuitively inspect and correct its own potential visual perception errors.
arXiv Detail & Related papers (2026-02-18T13:40:53Z)
- Hierarchical Process Reward Models are Symbolic Vision Learners [56.94353087007494]
Symbolic computer vision represents diagrams through explicit logical rules and structured representations, enabling interpretable understanding in machine vision. This requires fundamentally different learning paradigms from pixel-based visual models. We propose a novel self-supervised auto-encoder that encodes diagrams into primitives and decodes them through our executable engine to reconstruct input diagrams.
arXiv Detail & Related papers (2025-12-02T18:46:40Z)
- The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models [11.500090488046899]
Vision-Language Models (VLMs) are increasingly used to interpret visualizations, especially by non-expert users. This study analyzes over 16,000 responses from ten different models across eight distinct types of misleading chart designs. Our findings highlight the need for robust safeguards in VLMs against visual misinformation.
arXiv Detail & Related papers (2025-08-13T11:11:18Z)
- ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding [18.67532755744138]
Automated chart understanding poses significant challenges to existing multimodal large language models. Current step-by-step reasoning models primarily focus on text-based logical reasoning for chart understanding. We propose ChartSketcher, a multimodal feedback-driven step-by-step reasoning method designed to address these limitations.
arXiv Detail & Related papers (2025-05-25T10:21:29Z)
- End-to-End Vision Tokenizer Tuning [73.3065542220568]
The vision tokenizer optimized for low-level reconstruction is agnostic to downstream tasks requiring varied representations and semantics. The loss of the vision tokenization can be the representation bottleneck for target tasks. We propose ETT, an end-to-end vision tokenizer tuning approach that enables joint optimization between vision tokenization and target autoregressive tasks.
arXiv Detail & Related papers (2025-05-15T17:59:39Z)
- RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z)
- Towards Understanding Graphical Perception in Large Multimodal Models [80.44471730672801]
We leverage the theory of graphical perception to develop an evaluation framework for analyzing gaps in LMMs' perception abilities in charts. We apply our framework to evaluate and diagnose the perception capabilities of state-of-the-art LMMs at three levels (chart, visual element, and pixel).
arXiv Detail & Related papers (2025-03-13T20:13:39Z)
- Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement [102.22911097049953]
Large vision-language models (LVLMs) have achieved impressive results in visual question-answering and reasoning tasks. Existing methods often depend on external models or data, leading to uncontrollable and unstable alignment results. We propose SIMA, a self-improvement framework that enhances visual and language modality alignment without external dependencies.
arXiv Detail & Related papers (2024-05-24T23:09:27Z)
- Calibrated Self-Rewarding Vision Language Models [27.686545023186852]
Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning.
LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image.
We propose the Calibrated Self-Rewarding (CSR) approach, which enables the model to self-improve by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning.
arXiv Detail & Related papers (2024-05-23T14:30:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.