ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation
- URL: http://arxiv.org/abs/2601.12983v1
- Date: Mon, 19 Jan 2026 11:57:48 GMT
- Title: ChartAttack: Testing the Vulnerability of LLMs to Malicious Prompting in Chart Generation
- Authors: Jesus-German Ortiz-Barajas, Jonathan Tonglet, Vivek Gupta, Iryna Gurevych
- Abstract summary: Multimodal large language models (MLLMs) are increasingly used to automate chart generation from data tables. We introduce ChartAttack, a framework for evaluating how MLLMs can be misused to generate misleading charts at scale.
- Score: 51.49421299447412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal large language models (MLLMs) are increasingly used to automate chart generation from data tables, enabling efficient data analysis and reporting but also introducing new misuse risks. In this work, we introduce ChartAttack, a novel framework for evaluating how MLLMs can be misused to generate misleading charts at scale. ChartAttack injects misleaders into chart designs, aiming to induce incorrect interpretations of the underlying data. Furthermore, we create AttackViz, a chart question-answering (QA) dataset where each (chart specification, QA) pair is labeled with effective misleaders and their induced incorrect answers. Experiments in in-domain and cross-domain settings show that ChartAttack significantly degrades the QA performance of MLLM readers, reducing accuracy by an average of 19.6 points and 14.9 points, respectively. A human study further shows an average 20.2 point drop in accuracy for participants exposed to misleading charts generated by ChartAttack. Our findings highlight an urgent need for robustness and security considerations in the design, evaluation, and deployment of MLLM-based chart generation systems. We make our code and data publicly available.
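To make the threat model concrete, the sketch below shows one classic misleader, a truncated y-axis, injected into an otherwise ordinary matplotlib bar chart. This is an illustrative reconstruction, not code from the ChartAttack release: the function name and data are invented, and ChartAttack itself operates on MLLM-generated chart specifications rather than hand-written plotting code.

```python
# Illustrative misleader injection: a truncated y-axis, one of the classic
# distortions a ChartAttack-style pipeline could add to a chart design.
# Function name and data are hypothetical, not from the ChartAttack codebase.
import matplotlib.pyplot as plt

def plot_sales(labels, values, truncate_axis=False):
    """Render a bar chart; optionally truncate the y-axis to exaggerate differences."""
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    if truncate_axis:
        # Misleader: start the axis just below the smallest bar instead of at
        # zero, so a ~3% spread looks like a dramatic swing.
        ax.set_ylim(min(values) * 0.98, max(values) * 1.02)
    ax.set_ylabel("Units sold")
    return fig

quarters = ["Q1", "Q2", "Q3", "Q4"]
units = [980, 1000, 990, 1010]                    # nearly flat data
plot_sales(quarters, units, truncate_axis=True)   # visually exaggerated
plt.show()
```

Rendering the same data with `truncate_axis=False` produces four nearly identical bars, which is the honest reading of a roughly 3% spread.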
Related papers
- Is this chart lying to me? Automating the detection of misleading visualizations [74.26574031329689]
Misleading visualizations are a potent driver of misinformation on social media and the web. We introduce Misviz, a benchmark of 2,604 real-world visualizations annotated with 12 types of misleaders. We also release Misviz-synth, a synthetic dataset of 81,814 visualizations generated using Matplotlib and based on real-world data tables.
arXiv Detail & Related papers (2025-08-29T14:36:45Z)
- ChartLens: Fine-grained Visual Attribution in Charts [106.44872805609673]
Post-Hoc Visual Attribution for Charts identifies fine-grained chart elements that validate a given chart-associated response. We propose ChartLens, a novel chart attribution algorithm that uses segmentation-based techniques to identify chart objects. Our evaluations show that ChartLens improves fine-grained attributions by 26-66%.
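As a loose illustration of what segmentation-based chart-object identification involves, the sketch below finds candidate element regions with simple thresholding and contour detection. Nothing here is from the ChartLens implementation, which relies on learned segmentation rather than this kind of classical heuristic.

```python
# Rough sketch of segmentation-style chart-object detection, in the spirit of
# (but not reproducing) ChartLens. Assumes solid chart elements on a light
# background; real systems use learned segmentation, not thresholding.
import cv2

def find_chart_regions(image_path: str):
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Dark pixels become foreground; tune the threshold per rendering style.
    _, mask = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Each bounding box is a candidate chart element (bar, legend patch, ...).
    return [cv2.boundingRect(c) for c in contours]
```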
arXiv Detail & Related papers (2025-05-25T23:17:32Z)
- Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering [45.67334913593117]
Misleading visualizations pose risks to public understanding and raise safety concerns for AI systems involved in data-driven communication. We benchmark 24 state-of-the-art MLLMs, analyze their performance across misleader types and chart formats, and propose a novel region-aware reasoning pipeline. Our work lays the foundation for developing MLLMs that are robust, trustworthy, and aligned with the demands of responsible visual communication.
arXiv Detail & Related papers (2025-03-23T18:56:33Z)
- Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts [62.45232157149698]
We introduce a novel Chart Hypothetical Question Answering (HQA) task, which imposes assumptions on the same question to compel models to engage in counterfactual reasoning based on the chart content. Furthermore, we introduce HAI, a human-AI interactive data synthesis approach that leverages the efficient text-editing capabilities of MLLMs alongside human expert knowledge to generate diverse and high-quality HQA data at a low cost.
arXiv Detail & Related papers (2025-03-06T05:08:40Z)
- Protecting multimodal large language models against misleading visualizations [94.71976205962527]
We show that question-answering (QA) accuracy on misleading visualizations drops on average to the level of the random baseline. We introduce the first inference-time methods to improve QA performance on misleading visualizations, without compromising accuracy on non-misleading ones. We find that two methods, table-based QA and redrawing the visualization, are effective, with improvements of up to 19.6 percentage points.
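The first of those two defenses is easy to picture: route the question to the model together with the underlying data table, so that chart styling cannot mislead it. The sketch below assumes a hypothetical `ask_mllm` callable wrapping any chat-completion API; it illustrates the table-based QA idea, not the paper's exact prompts.

```python
# Sketch of the table-based QA idea: answer from the underlying data table,
# not the (possibly misleading) rendered chart. `ask_mllm` is a hypothetical
# callable that sends a prompt to a chat-completion API and returns text.
def table_based_qa(ask_mllm, table_csv: str, question: str) -> str:
    prompt = (
        "Answer the question using only the data table below. "
        "Ignore any chart styling or axis choices.\n\n"
        f"Table (CSV):\n{table_csv}\n\n"
        f"Question: {question}"
    )
    return ask_mllm(prompt)
```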
arXiv Detail & Related papers (2025-02-27T20:22:34Z)
- Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations [7.32619928577074]
We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations.
Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts, as measured by VQA performance.
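The evaluation loop such a framework implies is straightforward; the sketch below assumes a hypothetical `vqa_answer` callable mapping an image and a question to an answer string, and scores a chart by exact-match accuracy over its ground-truth QA pairs.

```python
# Sketch of VQA-based chart evaluation: ask a VQA model ground-truth questions
# about the rendered chart and score exact-match accuracy. `vqa_answer` is a
# hypothetical (image, question) -> answer callable; real setups often use
# softer matching than exact string equality.
def score_chart(vqa_answer, chart_image, qa_pairs) -> float:
    correct = sum(
        vqa_answer(chart_image, question).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)
```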
arXiv Detail & Related papers (2024-09-27T14:02:48Z)
- ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering [27.193293027128558]
Multimodal large language models (MLLMs) have shown promise in high-level ChartQA tasks, but their effectiveness in low-level ChartQA tasks remains underexplored.
In this paper, we evaluate MLLMs on low-level ChartQA using a newly curated dataset, ChartInsights.
We propose a new textual prompt strategy, Chain-of-Charts, tailored for low-level ChartQA tasks, which boosts performance by 14.41%, achieving an accuracy of 83.58%.
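The abstract does not spell the prompt out, but a decompose-then-answer template along the following lines captures the general pattern of chained chart prompting; this is a guess at the shape, not the paper's actual Chain-of-Charts prompt.

```python
# Hedged sketch of a decompose-then-answer prompt for low-level ChartQA.
# The actual Chain-of-Charts prompt is defined in the ChartInsights paper;
# this only illustrates the general chained-reasoning pattern.
CHAINED_CHART_PROMPT = """You are answering a question about the attached chart.
Step 1: State the chart type, the axes, and the legend entries.
Step 2: Read off the exact data values relevant to the question.
Step 3: Using only those values, compute and state the final answer.

Question: {question}
"""

def build_prompt(question: str) -> str:
    return CHAINED_CHART_PROMPT.format(question=question)
```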
arXiv Detail & Related papers (2024-05-11T12:33:46Z)
- ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning [55.22996841790139]
We benchmark the ability of off-the-shelf Multi-modal Large Language Models (MLLMs) in the chart domain. We construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data. We develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns.
arXiv Detail & Related papers (2024-02-19T14:48:23Z)
- ChartBench: A Benchmark for Complex Visual Reasoning in Charts [36.492851648081405]
Multimodal Large Language Models (MLLMs) have shown impressive capabilities in image understanding and generation.
Current benchmarks fail to accurately evaluate the chart comprehension of MLLMs due to limited chart types and inappropriate metrics.
We propose ChartBench, a comprehensive benchmark designed to assess chart comprehension and data reliability through complex visual reasoning.
arXiv Detail & Related papers (2023-12-26T07:20:55Z)