ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
- URL: http://arxiv.org/abs/2405.07001v4
- Date: Wed, 06 Nov 2024 13:56:28 GMT
- Title: ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering
- Authors: Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, Yuyu Luo
- Abstract summary: Multimodal large language models (MLLMs) have shown promise in high-level ChartQA tasks, but their effectiveness in low-level ChartQA tasks remains underexplored.
In this paper, we evaluate MLLMs on low-level ChartQA using a newly curated dataset, ChartInsights.
We propose a new textual prompt strategy, Chain-of-Charts, tailored for low-level ChartQA tasks, which boosts performance by 14.41%, achieving an accuracy of 83.58%.
- Score: 27.193293027128558
- License:
- Abstract: Chart question answering (ChartQA) tasks play a critical role in interpreting and extracting insights from visualization charts. While recent advancements in multimodal large language models (MLLMs) like GPT-4o have shown promise in high-level ChartQA tasks, such as chart captioning, their effectiveness in low-level ChartQA tasks (e.g., identifying correlations) remains underexplored. In this paper, we address this gap by evaluating MLLMs on low-level ChartQA using a newly curated dataset, ChartInsights, which consists of 22,347 (chart, task, query, answer) quadruplets covering 10 data analysis tasks across 7 chart types. We systematically evaluate 19 advanced MLLMs, including 12 open-source and 7 closed-source models. The average accuracy rate across these models is 39.8%, with GPT-4o achieving the highest accuracy at 69.17%. To further explore the limitations of MLLMs in low-level ChartQA, we conduct experiments that alter visual elements of charts (e.g., changing color schemes, adding image noise) to assess their impact on task effectiveness. Furthermore, we propose a new textual prompt strategy, Chain-of-Charts, tailored for low-level ChartQA tasks, which boosts performance by 14.41%, achieving an accuracy of 83.58%. Finally, incorporating a visual prompt strategy that directs attention to relevant visual elements further improves accuracy to 84.32%.
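To make the perturbation experiments described above concrete, here is a minimal sketch of one such visual alteration (adding image noise to a chart), assuming Pillow and NumPy are available. The function name and noise level are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a chart perturbation: add Gaussian pixel noise
# to a chart image before re-querying an MLLM. Illustrative only.
import numpy as np
from PIL import Image

def add_image_noise(chart_path: str, sigma: float = 15.0) -> Image.Image:
    """Return a noisy copy of the chart image (one possible perturbation)."""
    img = np.asarray(Image.open(chart_path).convert("RGB"), dtype=np.float32)
    noise = np.random.normal(loc=0.0, scale=sigma, size=img.shape)
    noisy = np.clip(img + noise, 0, 255).astype(np.uint8)
    return Image.fromarray(noisy)

# The perturbed chart would then replace the original chart in a
# (chart, task, query, answer) quadruplet, so accuracy before and after
# perturbation can be compared.
```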
Related papers
- Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations [7.32619928577074]
We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations.
Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures.
arXiv Detail & Related papers (2024-09-27T14:02:48Z)
- SynChart: Synthesizing Charts from Language Models [50.73888371511983]
This work explores the potential of using LLMs alone for data generation and develops competitive multi-modality models focused on chart understanding.
We construct a large-scale chart dataset, SynChart, which contains approximately 4 million diverse chart images with over 75 million dense annotations.
We train a 4.2B chart-expert model using this dataset and achieve near-GPT-4o performance on the ChartQA task, surpassing GPT-4V.
arXiv Detail & Related papers (2024-09-25T00:18:12Z)
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary and open-source models.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
- TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) it shortens the lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module (a simplified token-merging sketch appears after this list).
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
- ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning [54.82612435284695]
We benchmark the ability of off-the-shelf Multi-modal Large Language Models (MLLMs) in the chart domain.
We construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data.
We develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns.
arXiv Detail & Related papers (2024-02-19T14:48:23Z)
- ChartBench: A Benchmark for Complex Visual Reasoning in Charts [36.492851648081405]
Multimodal Large Language Models (MLLMs) have shown impressive capabilities in image understanding and generation.
Current benchmarks fail to accurately evaluate the chart comprehension of MLLMs due to limited chart types and inappropriate metrics.
We propose ChartBench, a comprehensive benchmark designed to assess chart comprehension and data reliability through complex visual reasoning.
arXiv Detail & Related papers (2023-12-26T07:20:55Z)
- Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization [27.913656283822483]
Large language models (LLMs) have shown impressive generalization capabilities to unseen tasks.
We propose PromptChart, a multimodal few-shot prompting framework with LLMs for chart-related applications.
Our experiments on three different chart-related information consumption tasks show that, with properly designed prompts, LLMs can excel on the benchmarks.
arXiv Detail & Related papers (2023-12-17T05:13:58Z)
- Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs [71.55796212450055]
We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs.
Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP).
arXiv Detail & Related papers (2023-05-29T22:29:03Z)
- Investigating Pretrained Language Models for Graph-to-Text Generation [55.55151069694146]
Graph-to-text generation aims to generate fluent texts from graph-based data.
We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs.
We show that the PLMs BART and T5 achieve new state-of-the-art results and that task-adaptive pretraining strategies improve their performance even further.
arXiv Detail & Related papers (2020-07-16T16:05:34Z)
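As referenced in the TinyChart entry above, the following is a minimal, generic sketch of vision-token merging: adjacent ViT patch tokens are averaged pairwise so the sequence fed to the language model is half as long. The tensor shapes are assumptions for illustration; this is not TinyChart's actual merging module.

```python
# Generic pairwise token merging (illustrative only, not TinyChart's module).
import torch

def merge_adjacent_tokens(tokens: torch.Tensor) -> torch.Tensor:
    """Average adjacent pairs of vision tokens; `tokens` has shape [num_tokens, dim]."""
    n, d = tokens.shape
    assert n % 2 == 0, "pairwise merging assumes an even token count"
    return tokens.reshape(n // 2, 2, d).mean(dim=1)  # -> [num_tokens // 2, dim]

# Example: 576 patch tokens from a ViT encoder, 1024-dim each.
patches = torch.randn(576, 1024)
merged = merge_adjacent_tokens(patches)
print(merged.shape)  # torch.Size([288, 1024])
```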