Evaluating Task-based Effectiveness of MLLMs on Charts
- URL: http://arxiv.org/abs/2405.07001v2
- Date: Mon, 17 Jun 2024 15:44:33 GMT
- Title: Evaluating Task-based Effectiveness of MLLMs on Charts
- Authors: Yifan Wu, Lutao Yan, Yuyu Luo, Yunhai Wang, Nan Tang,
- Abstract summary: We first curate a large-scale dataset, named ChartInsights, consisting of 89,388 quartets (chart, task, question, answer) and covering 10 widely-used low-level data analysis tasks on 7 chart types.
To understand the limitations of multimodal large models in low-level data analysis tasks, we have designed various experiments to conduct an in-depth test of the capabilities of GPT-4V.
These findings suggest the potential of GPT-4V to revolutionize interaction with charts and uncover the gap between human analytic needs and the capabilities of GPT-4V.
- Score: 28.11539421235211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore a forward-thinking question: Is GPT-4V effective at low-level data analysis tasks on charts? To this end, we first curate a large-scale dataset, named ChartInsights, consisting of 89,388 quartets (chart, task, question, answer) and covering 10 widely-used low-level data analysis tasks on 7 chart types. Firstly, we conduct systematic evaluations to understand the capabilities and limitations of 18 advanced MLLMs, which include 12 open-source models and 6 closed-source models. Starting with a standard textual prompt approach, the average accuracy rate across the 18 MLLMs is 36.17%. Among all the models, GPT-4V achieves the highest accuracy, reaching 56.13%. To understand the limitations of multimodal large models in low-level data analysis tasks, we have designed various experiments to conduct an in-depth test of the capabilities of GPT-4V. We further investigate how visual modifications to charts, such as altering visual elements (e.g., changing color schemes) and introducing perturbations (e.g., adding image noise), affect the performance of GPT-4V. Secondly, we present 12 experimental findings. These findings suggest the potential of GPT-4V to revolutionize interaction with charts and uncover the gap between human analytic needs and the capabilities of GPT-4V. Thirdly, we propose a novel textual prompt strategy, named Chain-of-Charts, tailored for low-level analysis tasks, which boosts model performance by 24.36 percentage points, resulting in an accuracy of 80.49%. Furthermore, by incorporating a visual prompt strategy that directs the attention of GPT-4V to question-relevant visual elements, we further improve accuracy to 83.83%. Our study not only sheds light on the capabilities and limitations of GPT-4V in low-level data analysis tasks but also offers valuable insights for future research.
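To make the quartet structure and the reported accuracy figures concrete, here is a minimal Python sketch. The `Quartet` class, its field types, and the case-insensitive exact-match scoring in `accuracy` are illustrative assumptions; the abstract does not say how answers are stored or scored. Only the numbers 56.13 and 80.49 come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quartet:
    """One (chart, task, question, answer) item; field types are assumed."""
    chart: str     # hypothetical: path or identifier of the chart image
    task: str      # name of a low-level analysis task
    question: str
    answer: str


def accuracy(quartets, predictions):
    """Percentage of predictions matching the reference answer.

    Assumption: case-insensitive exact match after trimming whitespace.
    """
    if len(quartets) != len(predictions):
        raise ValueError("need exactly one prediction per quartet")
    if not quartets:
        return 0.0
    correct = sum(
        q.answer.strip().lower() == p.strip().lower()
        for q, p in zip(quartets, predictions)
    )
    return 100.0 * correct / len(quartets)


# Figures reported in the abstract for GPT-4V.
baseline_acc = 56.13         # standard textual prompt
chain_of_charts_acc = 80.49  # with the Chain-of-Charts prompt

# The reported gain is an absolute difference in accuracy.
gain_points = round(chain_of_charts_acc - baseline_acc, 2)
```

The difference between the two reported accuracies is 24.36, which matches the reported boost and shows that it is an absolute gain in accuracy, not a relative one.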
Related papers
- ChartLens: Fine-grained Visual Attribution in Charts [106.44872805609673]
Post-Hoc Visual Attribution for Charts identifies fine-grained chart elements that validate a given chart-associated response.
We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects.
Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
arXiv Detail & Related papers (2025-05-25T23:17:32Z) - ChartCards: A Chart-Metadata Generation Framework for Multi-Task Chart Understanding [18.857927344450932]
We propose ChartCards, a unified chart-metadata generation framework for multi-task chart understanding.
Using ChartCards, we construct MetaChart, a large-scale high-quality dataset containing 10,862 data tables, 85K charts, and 170K high-quality chart captions.
Fine-tuning six different models on MetaChart resulted in an average performance improvement of 5% across all tasks.
arXiv Detail & Related papers (2025-05-21T03:07:47Z) - Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding [14.75820681491341]
Existing benchmarks reveal reliance on text-based shortcuts and probabilistic pattern-matching rather than genuine visual reasoning.
We propose Socratic Chart, a new framework that transforms chart images into Scalable Vector Graphics representations.
Our framework surpasses state-of-the-art models in accurately capturing chart primitives and improving reasoning performance.
arXiv Detail & Related papers (2025-04-14T00:07:39Z) - RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding.
Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z) - Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations [7.32619928577074]
We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations.
Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures.
arXiv Detail & Related papers (2024-09-27T14:02:48Z) - SynChart: Synthesizing Charts from Language Models [50.73888371511983]
This work explores the potential of using LLMs alone for data generation and develops competitive multi-modality models focusing on chart understanding.
We construct a large-scale chart dataset, SynChart, which contains approximately 4 million diverse chart images with over 75 million dense annotations.
We train a 4.2B chart-expert model using this dataset and achieve near-GPT-4O performance on the ChartQA task, surpassing GPT-4V.
arXiv Detail & Related papers (2024-09-25T00:18:12Z) - CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model.
arXiv Detail & Related papers (2024-06-26T17:50:11Z) - TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) reducing the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) reducing the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module.
arXiv Detail & Related papers (2024-04-25T14:23:24Z) - ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning [54.82612435284695]
We benchmark the ability of off-the-shelf Multi-modal Large Language Models (MLLMs) in the chart domain.
We construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data.
We develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns.
arXiv Detail & Related papers (2024-02-19T14:48:23Z) - ChartBench: A Benchmark for Complex Visual Reasoning in Charts [36.492851648081405]
Multimodal Large Language Models (MLLMs) have shown impressive capabilities in image understanding and generation.
Current benchmarks fail to accurately evaluate the chart comprehension of MLLMs due to limited chart types and inappropriate metrics.
We propose ChartBench, a comprehensive benchmark designed to assess chart comprehension and data reliability through complex visual reasoning.
arXiv Detail & Related papers (2023-12-26T07:20:55Z) - Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization [27.913656283822483]
Large language models (LLMs) have shown impressive generalization capabilities to unseen tasks.
We propose PromptChart, a multimodal few-shot prompting framework with LLMs for chart-related applications.
Our experiments on three different chart-related information consumption tasks show that with properly designed prompts LLMs can excel on the benchmarks.
arXiv Detail & Related papers (2023-12-17T05:13:58Z) - Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs [71.55796212450055]
We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs.
Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP).
arXiv Detail & Related papers (2023-05-29T22:29:03Z) - Investigating Pretrained Language Models for Graph-to-Text Generation [55.55151069694146]
Graph-to-text generation aims to generate fluent texts from graph-based data.
We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs.
We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further.
arXiv Detail & Related papers (2020-07-16T16:05:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.