ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions
- URL: http://arxiv.org/abs/2507.21167v3
- Date: Wed, 06 Aug 2025 14:05:00 GMT
- Title: ChartM$^3$: Benchmarking Chart Editing with Multimodal Instructions
- Authors: Donglu Yang, Liang Zhang, Zihao Yue, Liangyu Chen, Yichen Xu, Wenxuan Wang, Qin Jin
- Abstract summary: We introduce a novel paradigm for multimodal chart editing, where user intent is expressed through a combination of natural language and visual indicators. We present Chart$\text{M}^3$, a new benchmark for Multimodal chart editing with Multi-level complexity and Multi-perspective evaluation.
- Score: 65.21061221740388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Charts are a fundamental visualization format widely used in data analysis across research and industry. While enabling users to edit charts based on high-level intentions is of great practical value, existing methods primarily rely on natural language instructions, which are often too ambiguous to support fine-grained editing. In this work, we introduce a novel paradigm for multimodal chart editing, where user intent is expressed through a combination of natural language and visual indicators that explicitly highlight the elements to be modified. To support this paradigm, we present Chart$\text{M}^3$, a new benchmark for Multimodal chart editing with Multi-level complexity and Multi-perspective evaluation. Chart$\text{M}^3$ contains 1,000 samples spanning four levels of editing difficulty. Each sample includes triplets in the form of (chart, code, multimodal instructions). To comprehensively evaluate chart editing models, Chart$\text{M}^3$ provides metrics that assess both visual appearance and code correctness. Our benchmark reveals significant limitations in current multimodal large language models (MLLMs), including GPT-4o, particularly in their ability to interpret and act on visual indicators. To address this, we construct Chart$\text{M}^3$-Train, a large-scale training set with 24,000 multimodal chart editing samples. Fine-tuning MLLMs on this dataset leads to substantial improvements, demonstrating the importance of multimodal supervision in building practical chart editing systems. Our datasets, codes, and evaluation tools are available at https://github.com/MLrollIT/ChartM3.
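To make the (chart, code, multimodal instructions) triplet and the two-perspective evaluation concrete, the sketch below shows one hypothetical way such a sample could be represented and scored in Python. The field names, the difficulty encoding, and the token-overlap metric are illustrative assumptions only, not Chart$\text{M}^3$'s actual schema or evaluation code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MultimodalInstruction:
    """A hypothetical multimodal edit request: text plus visual indicators."""
    text: str                                        # natural-language instruction
    indicator_boxes: List[Tuple[int, int, int, int]] = field(
        default_factory=list                         # boxes highlighting the elements to modify
    )


@dataclass
class ChartEditSample:
    """One benchmark-style triplet: source chart, its code, and the instruction."""
    chart_image_path: str        # rendered source chart (e.g., a PNG file)
    source_code: str             # plotting code that produced the chart
    instruction: MultimodalInstruction
    reference_code: str          # ground-truth edited code
    difficulty: int              # e.g., 1-4 for the four complexity levels (assumption)


def evaluate(sample: ChartEditSample, predicted_code: str) -> dict:
    """Toy multi-perspective scoring: a code-level check plus a stubbed chart-level check."""
    # Code-level check: naive token overlap with the reference (placeholder metric).
    ref_tokens = set(sample.reference_code.split())
    pred_tokens = set(predicted_code.split())
    code_score = len(ref_tokens & pred_tokens) / max(len(ref_tokens), 1)

    # A chart-level check would render both codes and compare the resulting images;
    # it is stubbed here because rendering is outside the scope of this sketch.
    chart_score: Optional[float] = None

    return {"code_score": code_score, "chart_score": chart_score}


if __name__ == "__main__":
    sample = ChartEditSample(
        chart_image_path="charts/bar_001.png",
        source_code="plt.bar(x, y, color='blue')",
        instruction=MultimodalInstruction(
            text="Change the highlighted bars to red.",
            indicator_boxes=[(120, 40, 180, 300)],
        ),
        reference_code="plt.bar(x, y, color='red')",
        difficulty=1,
    )
    print(evaluate(sample, "plt.bar(x, y, color='red')"))
```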
Related papers
- ChartLens: Fine-grained Visual Attribution in Charts [106.44872805609673]
Post-Hoc Visual Attribution for Charts identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
arXiv Detail & Related papers (2025-05-25T23:17:32Z) - ChartEdit: How Far Are MLLMs From Automating Chart Analysis? Evaluating MLLMs' Capability via Chart Editing [6.671042213908933]
Multimodal large language models (MLLMs) show promise in generating chart rendering code, but chart editing presents a greater challenge. We propose ChartEdit, a new high-quality benchmark designed for chart editing tasks. We evaluate the performance of 10 mainstream MLLMs across two types of experiments, assessing them at both the code and chart levels.
arXiv Detail & Related papers (2025-05-17T09:47:15Z) - Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding [14.75820681491341]
Existing benchmarks reveal reliance on text-based shortcuts and probabilistic pattern-matching rather than genuine visual reasoning. We propose Socratic Chart, a new framework that transforms chart images into Scalable Vector Graphics representations. Our framework surpasses state-of-the-art models in accurately capturing chart primitives and improving reasoning performance.
arXiv Detail & Related papers (2025-04-14T00:07:39Z) - On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts. We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) it shortens the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module.
arXiv Detail & Related papers (2024-04-25T14:23:24Z) - ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning [55.22996841790139]
We benchmark the ability of off-the-shelf Multi-modal Large Language Models (MLLMs) in the chart domain. We construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data. We develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns.
arXiv Detail & Related papers (2024-02-19T14:48:23Z) - ChartBench: A Benchmark for Complex Visual Reasoning in Charts [36.492851648081405]
Multimodal Large Language Models (MLLMs) have shown impressive capabilities in image understanding and generation.
Current benchmarks fail to accurately evaluate the chart comprehension of MLLMs due to limited chart types and inappropriate metrics.
We propose ChartBench, a comprehensive benchmark designed to assess chart comprehension and data reliability through complex visual reasoning.
arXiv Detail & Related papers (2023-12-26T07:20:55Z) - ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We create a high-quality instruction-tuning dataset leveraging GPT-4.
Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z) - MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning [48.63002688222462]
A gap remains in the domain of chart image understanding due to the distinct abstract components in charts.
We introduce a large-scale MultiModal Chart Instruction dataset comprising 600k instances supporting diverse tasks and chart types.
We develop MultiModal Chart Assistant (MMC-A), an LMM that achieves state-of-the-art performance on existing chart QA benchmarks.
arXiv Detail & Related papers (2023-11-15T23:36:42Z)