Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations
- URL: http://arxiv.org/abs/2409.18764v1
- Date: Fri, 27 Sep 2024 14:02:48 GMT
- Title: Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations
- Authors: James Ford, Xingmeng Zhao, Dan Schumacher, Anthony Rios
- Abstract summary: We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations.
Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures.
- Score: 7.32619928577074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable, or focus solely on data accuracy, neglecting the effectiveness of visual communication. By employing VQA models, we assess data representation quality and the general communicative clarity of charts. Experiments were conducted using two leading VQA benchmark datasets, ChartQA and PlotQA, with visualizations generated by OpenAI's GPT-3.5 Turbo and Meta's Llama 3.1 70B-Instruct models. Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures. Moreover, while our results demonstrate that few-shot prompting significantly boosts the accuracy of chart generation, considerable progress remains to be made before LLMs can fully match the precision of human-generated graphs. This underscores the importance of our work, which expedites the research process by enabling rapid iteration without the need for human annotation, thus accelerating advancements in this field.
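For intuition, the evaluation protocol described in the abstract can be sketched as a simple loop: an LLM writes plotting code from a chart's underlying data table, the code is rendered to an image, and a VQA model answers the benchmark question against that image, with correctness scored by a ChartQA-style relaxed-accuracy rule. The sketch below is illustrative only and is not the authors' released implementation; the helper callables (generate_chart_code, render_chart, vqa_answer) are hypothetical placeholders standing in for the LLM, the plotting runtime, and the VQA model.
```python
from typing import Any, Callable, Iterable, Mapping

def relaxed_match(pred: str, gold: str, tol: float = 0.05) -> bool:
    """ChartQA-style relaxed accuracy: numeric answers may deviate by up to 5%."""
    try:
        p, g = float(pred), float(gold)
        return abs(p - g) <= tol * abs(g) if g != 0 else p == 0
    except ValueError:
        # Non-numeric answers require an exact (case-insensitive) string match.
        return pred.strip().lower() == gold.strip().lower()

def evaluate_llm_charts(
    samples: Iterable[Mapping[str, Any]],       # each: {"table", "question", "gold_answer"}
    generate_chart_code: Callable[[Any], str],  # LLM: data table -> plotting code
    render_chart: Callable[[str], Any],         # execute plotting code -> chart image
    vqa_answer: Callable[[Any, str], str],      # VQA model: (image, question) -> answer
) -> float:
    """Score LLM-generated charts by how well a VQA model can read them."""
    samples = list(samples)
    correct = 0
    for s in samples:
        code = generate_chart_code(s["table"])   # 1) LLM drafts the visualization
        image = render_chart(code)               # 2) render the code to a chart image
        pred = vqa_answer(image, s["question"])  # 3) VQA model answers from the chart alone
        correct += relaxed_match(pred, s["gold_answer"])
    return correct / len(samples) if samples else 0.0
```
Under this reading, the few-shot prompting reported in the abstract would only change how generate_chart_code builds its prompt; the VQA scoring loop itself stays fixed, which is what makes the evaluation scalable.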
Related papers
- RealCQA-V2: Visual Premise Proving A Manual COT Dataset for Charts [2.9201864249313383]
We introduce Visual Premise Proving, a novel task tailored to refine the process of chart question answering.
This approach represents a departure from conventional accuracy-based evaluation methods.
A model adept at reasoning is expected to demonstrate proficiency in both data retrieval and the structural understanding of charts.
arXiv Detail & Related papers (2024-10-29T19:32:53Z) - LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z) - Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning [1.6570772838074355]
Multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA).
Recent efforts primarily focus on scaling up training datasets through data collection and synthesis.
We propose a visualization-referenced instruction tuning approach to guide the training dataset enhancement and model development.
arXiv Detail & Related papers (2024-07-29T17:04:34Z) - On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z) - FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding [52.35520385083425]
The FlowLearn dataset is a resource tailored to enhance the understanding of flowcharts.
The scientific subset contains 3,858 flowcharts sourced from scientific literature.
The simulated subset contains 10,000 flowcharts created using a customizable script.
arXiv Detail & Related papers (2024-07-06T20:58:51Z) - CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary and open-source models.
arXiv Detail & Related papers (2024-06-26T17:50:11Z) - Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to possess strong reasoning ability, as automatic data annotators.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance chart VQA models, achieving state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z) - RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic [8.155575318208628]
We introduce a benchmark and dataset for chart visual QA on real-world charts.
Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations.
Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-03T18:21:38Z) - Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates the results of six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z) - Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs [71.55796212450055]
We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs.
Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP).
arXiv Detail & Related papers (2023-05-29T22:29:03Z)