The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models
- URL: http://arxiv.org/abs/2508.09716v1
- Date: Wed, 13 Aug 2025 11:11:18 GMT
- Title: The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models
- Authors: Ridwan Mahbub, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Mizanur Rahman, Mir Tafseer Nayeem, Enamul Hoque
- Abstract summary: Vision-Language Models (VLMs) are increasingly used to interpret visualizations, especially by non-expert users. This study analyzes over 16,000 responses from ten different models across eight distinct types of misleading chart designs. Our findings highlight the need for robust safeguards in VLMs against visual misinformation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Information visualizations are powerful tools that help users quickly identify patterns, trends, and outliers, facilitating informed decision-making. However, when visualizations incorporate deceptive design elements, such as truncated or inverted axes, unjustified 3D effects, or other violations of best practices, they can mislead viewers, distort understanding, and spread misinformation. While some deceptive tactics are obvious, others subtly manipulate perception while maintaining a facade of legitimacy. As Vision-Language Models (VLMs) are increasingly used to interpret visualizations, especially by non-expert users, it is critical to understand how susceptible these models are to deceptive visual designs. In this study, we conduct an in-depth evaluation of VLMs' ability to interpret misleading visualizations. By analyzing over 16,000 responses from ten different models across eight distinct types of misleading chart designs, we demonstrate that most VLMs are deceived by them, leading to altered interpretations of the charts despite the underlying data remaining the same. Our findings highlight the need for robust safeguards in VLMs against visual misinformation.
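As a concrete illustration of one misleader the abstract names, the sketch below renders the same two data points with a full and a truncated y-axis. The values, labels, and output filename are invented for the demo and are not taken from the paper's materials:

```python
# Minimal sketch of a truncated-axis misleader: the same data plotted twice,
# once with a zero baseline and once with a truncated y-axis that visually
# exaggerates a small difference. All values here are hypothetical.
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

values = [98, 100]  # only a ~2% difference between the two categories
labels = ["A", "B"]

fig, (honest, deceptive) = plt.subplots(1, 2, figsize=(8, 4))

honest.bar(labels, values)
honest.set_ylim(0, 110)      # zero baseline: the bars look nearly equal
honest.set_title("Full axis")

deceptive.bar(labels, values)
deceptive.set_ylim(97, 101)  # truncated axis: the same 2% gap looks dramatic
deceptive.set_title("Truncated axis")

fig.savefig("truncated_axis_demo.png")
```

The underlying data are identical in both panels; only the axis range changes, which is exactly the kind of perceptual manipulation the study probes VLMs with.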
Related papers
- Scale Can't Overcome Pragmatics: The Impact of Reporting Bias on Vision-Language Reasoning [79.95774256444956]
The lack of reasoning capabilities in Vision-Language Models has remained at the forefront of research discourse. We investigate the data underlying the popular VLMs OpenCLIP, LLaVA-1.5, and Molmo through the lens of theories from pragmatics.
arXiv Detail & Related papers (2026-02-26T18:54:06Z)
- Selective Training for Large Vision Language Models via Visual Information Gain [7.834991119179473]
We introduce Visual Information Gain (VIG), a perplexity-based metric. VIG measures the reduction in prediction uncertainty provided by visual input. We propose a VIG-guided selective training scheme that prioritizes high-VIG samples and tokens.
arXiv Detail & Related papers (2026-02-19T09:12:21Z)
- MentisOculi: Revealing the Limits of Reasoning with Mental Imagery [63.285794947638614]
We develop MentisOculi, a suite of multi-step reasoning problems amenable to visual solution. Evaluating visual strategies ranging from latent tokens to explicitly generated imagery, we find that they generally fail to improve performance. Our findings suggest that, despite their inherent appeal, visual thoughts do not yet benefit model reasoning.
arXiv Detail & Related papers (2026-02-02T18:49:06Z) - Is this chart lying to me? Automating the detection of misleading visualizations [74.26574031329689]
Misleading visualizations are a potent driver of misinformation on social media and the web. We introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. We also release Misviz-synth, a synthetic dataset of 81,814 visualizations generated using Matplotlib and based on real-world data tables.
arXiv Detail & Related papers (2025-08-29T14:36:45Z) - ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs [98.27348724529257]
We introduce ViCrit (Visual Caption Hallucination Critic), an RL proxy task that trains VLMs to localize a subtle, synthetic visual hallucination injected into paragraphs of human-written image captions. Models trained with the ViCrit task exhibit substantial gains across a variety of vision-language benchmarks.
arXiv Detail & Related papers (2025-06-11T19:16:54Z) - SECOND: Mitigating Perceptual Hallucination in Vision-Language Models via Selective and Contrastive Decoding [5.976839106353883]
SECOND (Selective and Contrastive Decoding) is a novel approach that enables Vision-Language Models to leverage multi-scale visual information in an object-centric manner. SECOND significantly reduces perceptual hallucinations and performs strongly across a wide range of benchmarks.
arXiv Detail & Related papers (2025-06-10T02:55:38Z) - Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models [65.23999399834638]
We introduce DeceptionDecoded, a benchmark of 12,000 image-caption pairs grounded in trustworthy reference articles. The dataset captures both misleading and non-misleading cases, spanning manipulations across visual and textual modalities. It supports three intent-centric tasks: misleading intent detection, misleading source attribution, and creator desire inference.
arXiv Detail & Related papers (2025-05-21T13:14:32Z) - Aligning Attention Distribution to Information Flow for Hallucination Mitigation in Large Vision-Language Models [11.385588803559733]
We enhance the model's visual understanding by leveraging the core information embedded in semantic representations. We evaluate our method on three image captioning benchmarks using five different LVLMs, demonstrating its effectiveness in significantly reducing hallucinations.
arXiv Detail & Related papers (2025-05-20T12:10:13Z) - On the Perception Bottleneck of VLMs for Chart Understanding [17.70892579781301]
Chart understanding requires models to analyze and reason about numerical data, textual elements, and complex visual components. Our observations reveal that the perception capabilities of existing large vision-language models (LVLMs) constitute a critical bottleneck in this process. In this study, we delve into this perception bottleneck by decomposing it into two components: the vision encoder bottleneck and the extraction bottleneck.
arXiv Detail & Related papers (2025-03-24T08:33:58Z) - Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering [45.67334913593117]
Misleading visualizations pose risks to public understanding and raise safety concerns for AI systems involved in data-driven communication. We benchmark 24 state-of-the-art MLLMs, analyze their performance across misleader types and chart formats, and propose a novel region-aware reasoning pipeline. Our work lays the foundation for developing MLLMs that are robust, trustworthy, and aligned with the demands of responsible visual communication.
arXiv Detail & Related papers (2025-03-23T18:56:33Z) - Towards Understanding Graphical Perception in Large Multimodal Models [80.44471730672801]
We leverage the theory of graphical perception to develop an evaluation framework for analyzing gaps in LMMs' perception abilities in charts. We apply our framework to evaluate and diagnose the perception capabilities of state-of-the-art LMMs at three levels (chart, visual element, and pixel).
arXiv Detail & Related papers (2025-03-13T20:13:39Z) - Protecting multimodal large language models against misleading visualizations [94.71976205962527]
We show that question-answering (QA) accuracy on misleading visualizations drops on average to the level of the random baseline. We introduce the first inference-time methods to improve QA performance on misleading visualizations without compromising accuracy on non-misleading ones. We find that two methods, table-based QA and redrawing the visualization, are effective, with improvements of up to 19.6 percentage points.
arXiv Detail & Related papers (2025-02-27T20:22:34Z) - Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge [24.538839144639653]
Large Vision-Language Models (LVLMs) integrate separately pre-trained vision and language components.
These models frequently encounter a core issue of "cognitive misalignment" between the vision encoder (VE) and the large language model (LLM).
arXiv Detail & Related papers (2024-11-25T18:33:14Z) - Visually Descriptive Language Model for Vector Graphics Reasoning [76.42082386029206]
We propose the Visually Descriptive Language Model (VDLM) to bridge the gap between low-level visual perception and high-level language reasoning. We show that VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks.
arXiv Detail & Related papers (2024-04-09T17:30:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.