Is this chart lying to me? Automating the detection of misleading visualizations
- URL: http://arxiv.org/abs/2508.21675v1
- Date: Fri, 29 Aug 2025 14:36:45 GMT
- Title: Is this chart lying to me? Automating the detection of misleading visualizations
- Authors: Jonathan Tonglet, Jan Zimny, Tinne Tuytelaars, Iryna Gurevych
- Abstract summary: Misleading visualizations are a potent driver of misinformation on social media and the web. We introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. We also release Misviz-synth, a synthetic dataset of 81,814 visualizations generated using Matplotlib and based on real-world data tables.
- Score: 74.26574031329689
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Misleading visualizations are a potent driver of misinformation on social media and the web. By violating chart design principles, they distort data and lead readers to draw inaccurate conclusions. Prior work has shown that both humans and multimodal large language models (MLLMs) are frequently deceived by such visualizations. Automatically detecting misleading visualizations and identifying the specific design rules they violate could help protect readers and reduce the spread of misinformation. However, the training and evaluation of AI models have been limited by the absence of large, diverse, and openly available datasets. In this work, we introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. To support model training, we also release Misviz-synth, a synthetic dataset of 81,814 visualizations generated using Matplotlib and based on real-world data tables. We perform a comprehensive evaluation on both datasets using state-of-the-art MLLMs, rule-based systems, and fine-tuned classifiers. Our results reveal that the task remains highly challenging. We release Misviz, Misviz-synth, and the accompanying code.
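The abstract lists rule-based systems among the evaluated detectors. As a minimal illustrative sketch of what one such rule could look like (this is not the authors' released code; the function name, the single truncated-y-axis rule, and the `tolerance` parameter are assumptions), a check for one classic misleader, a bar chart whose value axis does not start at zero, might be:

```python
def is_truncated_y_axis(axis_min, values, tolerance=0.05):
    """Flag a bar chart whose y-axis baseline sits noticeably above zero.

    A non-zero baseline exaggerates the differences between bars, which is
    why a common design rule says bar-chart value axes should start at zero.
    `tolerance` is the fraction of the maximum value the baseline may exceed
    before the chart is flagged (small offsets are tolerated as noise).
    """
    hi = max(values)
    if hi <= 0:
        # All-negative data: the zero-baseline rule does not apply cleanly.
        return False
    return axis_min > tolerance * hi


# Bars in the low 90s plotted on an axis starting at 90 look wildly
# different; the same bars on an axis starting at 0 look nearly equal.
print(is_truncated_y_axis(90, [92, 95, 98]))  # flagged: baseline at 90
print(is_truncated_y_axis(0, [92, 95, 98]))   # not flagged: baseline at 0
```

A full rule-based system would combine many such checks (inverted axes, dual axes, inconsistent tick spacing, etc.), one per misleader type, and would need the chart's axis metadata extracted first.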
Related papers
- ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation [51.49421299447412]
Multimodal large language models (MLLMs) are increasingly used to automate chart generation from data tables. We introduce ChartAttack, a framework for evaluating how MLLMs can be misused to generate misleading charts at scale.
arXiv Detail & Related papers (2026-01-19T11:57:48Z) - ChartAB: A Benchmark for Chart Grounding & Dense Alignment [17.16234793106]
We introduce a novel "ChartAlign Benchmark (ChartAB)" to provide a comprehensive evaluation of vision-language models (VLMs). By incorporating a novel two-stage inference workflow, the benchmark can further evaluate VLMs' capability to align and compare elements/attributes across two charts. Our analysis of evaluations reveals new insights into their perception biases, weaknesses, robustness, and hallucinations in chart understanding.
arXiv Detail & Related papers (2025-10-30T17:56:31Z) - RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z) - Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering [28.54154468156412]
Misleading chart visualizations can distort perceptions and lead to incorrect conclusions. Recent advances in multimodal large language models (MLLMs) have demonstrated strong chart comprehension capabilities. This paper introduces the Misleading Chart Question Answering (Misleading ChartQA) Benchmark, a large-scale dataset designed to assess MLLMs in identifying and reasoning about misleading charts.
arXiv Detail & Related papers (2025-03-23T18:56:33Z) - Protecting multimodal large language models against misleading visualizations [94.71976205962527]
We show that question answering (QA) accuracy on misleading visualizations drops on average to the level of the random baseline. We introduce the first inference-time methods to improve QA performance on misleading visualizations, without compromising accuracy on non-misleading ones. We find that two methods, table-based QA and redrawing the visualization, are effective, with improvements of up to 19.6 percentage points.
arXiv Detail & Related papers (2025-02-27T20:22:34Z) - CHARTOM: A Visual Theory-of-Mind Benchmark for LLMs on Misleading Charts [26.477627174115806]
We introduce CHARTOM, a visual theory-of-mind benchmark designed to evaluate multimodal large language models' capability to understand and reason about misleading data visualizations through charts. CHARTOM consists of carefully designed charts and associated questions that require a language model to not only correctly comprehend the factual content in the chart (the FACT question) but also judge whether the chart will be misleading to human readers (the MIND question). We detail the construction of our benchmark, including its calibration on human performance and estimation of the MIND ground truth, called the Human Misleadingness Index.
arXiv Detail & Related papers (2024-08-26T17:04:23Z) - Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real-world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z) - Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.