CHAOS: Chart Analysis with Outlier Samples
- URL: http://arxiv.org/abs/2505.17235v1
- Date: Thu, 22 May 2025 19:26:49 GMT
- Title: CHAOS: Chart Analysis with Outlier Samples
- Authors: Omar Moured, Yufan Chen, Ruiping Liu, Simon Reiß, Philip Torr, Jiaming Zhang, Rainer Stiefelhagen
- Abstract summary: CHAOS is a benchmark to evaluate Multimodal Large Language Models (MLLMs) against chart perturbations. The benchmark includes 13 state-of-the-art MLLMs divided into three groups according to training scope and data.
- Score: 31.64244745491319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Charts play a critical role in data analysis and visualization, yet real-world applications often present charts with challenging or noisy features. Such "outlier charts" pose a substantial challenge even for Multimodal Large Language Models (MLLMs), which can struggle to interpret perturbed charts. In this work, we introduce CHAOS (CHart Analysis with Outlier Samples), a robustness benchmark to systematically evaluate MLLMs against chart perturbations. CHAOS encompasses five types of textual and ten types of visual perturbations, each presented at three levels of severity (easy, mid, hard) informed by the results of a human evaluation study. The benchmark includes 13 state-of-the-art MLLMs divided into three groups (i.e., general-, document-, and chart-specific models) according to training scope and data. Our comprehensive analysis covers two downstream tasks (ChartQA and Chart-to-Text). Extensive experiments and case studies highlight critical insights into model robustness across chart perturbations, aiming to guide future research in the chart understanding domain. Data and code are publicly available at: http://huggingface.co/datasets/omoured/CHAOS.
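The abstract specifies the benchmark's shape (five textual and ten visual perturbation types, each at three severity levels) but not its interface. The Python sketch below shows how such a harness might be organized; the specific perturbations, function names, and parameter values are illustrative assumptions, not the released dataset's API.

```python
# Minimal sketch of a CHAOS-style perturbation harness. CHAOS defines
# 5 textual and 10 visual perturbation types at 3 severities (easy/mid/hard);
# the concrete perturbations and parameter values below are assumptions.
import random
from PIL import Image, ImageEnhance, ImageFilter

SEVERITY = {"easy": 0, "mid": 1, "hard": 2}

def gaussian_blur(img: Image.Image, level: int) -> Image.Image:
    # Visual perturbation: blur radius grows with severity.
    return img.filter(ImageFilter.GaussianBlur(radius=[1, 2, 4][level]))

def low_contrast(img: Image.Image, level: int) -> Image.Image:
    # Visual perturbation: progressively wash out the chart.
    return ImageEnhance.Contrast(img).enhance([0.7, 0.5, 0.3][level])

def char_swap(text: str, level: int, seed: int = 0) -> str:
    # Textual perturbation: swap adjacent characters in the question.
    rng, chars = random.Random(seed), list(text)
    for _ in range([1, 2, 4][level]):
        if len(chars) < 2:
            break
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

VISUAL = {"gaussian_blur": gaussian_blur, "low_contrast": low_contrast}
TEXTUAL = {"char_swap": char_swap}

def perturb(img, question, kind, name, severity):
    """Apply one named perturbation at a given severity to a QA sample."""
    level = SEVERITY[severity]
    if kind == "visual":
        return VISUAL[name](img, level), question
    return img, TEXTUAL[name](question, level)
```

Robustness would then be read off as the drop between a model's clean and perturbed scores on the two downstream tasks the abstract names (ChartQA and Chart-to-Text).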
Related papers
- ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding [18.67532755744138]
Automated chart understanding poses significant challenges to existing multimodal large language models. Current step-by-step reasoning models primarily focus on text-based logical reasoning for chart understanding. We propose ChartSketcher, a multimodal feedback-driven step-by-step reasoning method designed to address these limitations.
arXiv Detail & Related papers (2025-05-25T10:21:29Z)
- Towards Understanding Graphical Perception in Large Multimodal Models [80.44471730672801]
We leverage the theory of graphical perception to develop an evaluation framework for analyzing gaps in LMMs' perception abilities in charts. We apply our framework to evaluate and diagnose the perception capabilities of state-of-the-art LMMs at three levels (chart, visual element, and pixel).
arXiv Detail & Related papers (2025-03-13T20:13:39Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild [28.643565008567172]
We introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma.
Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images (a sketch of such a record follows this entry).
Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking.
arXiv Detail & Related papers (2024-07-04T22:16:40Z)
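The ChartGemma summary above states that its instruction-tuning data is generated directly from chart images rather than from underlying data tables. As a rough illustration, one such record might look like the following; the field names and values are hypothetical, not ChartGemma's actual schema.

```python
# Hypothetical instruction-tuning record derived directly from a rendered
# chart image (no source data table), in the spirit of the ChartGemma summary.
# All field names and values are illustrative assumptions.
record = {
    "image": "charts/line_0042.png",  # the chart rendering is the only input
    "instruction": "Which year shows the largest year-over-year increase?",
    "response": "2019, where the value rises from 34 to 51 (+17).",
    "task": "chart_qa",  # the paper also covers summarization and fact-checking
}
```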
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model and the strongest open-source model.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
- Are Large Vision Language Models up to the Challenge of Chart Comprehension and Reasoning? An Extensive Investigation into the Capabilities and Limitations of LVLMs [11.19928977117624]
Natural language is a powerful complementary modality of communication for data visualizations, such as bar and line charts.
Various downstream tasks have recently been introduced, such as chart question answering, chart summarization, and fact-checking with charts.
These tasks pose a unique challenge, demanding both vision-language reasoning and a nuanced understanding of chart data tables, visual encodings, and natural language prompts.
This paper presents the first comprehensive evaluation of the recently developed large vision language models (LVLMs) for chart understanding and reasoning tasks.
arXiv Detail & Related papers (2024-06-01T01:43:30Z)
- AltChart: Enhancing VLM-based Chart Summarization Through Multi-Pretext Tasks [31.414783623207477]
We introduce the AltChart dataset, comprising 10,000 real chart images, each paired with a comprehensive summary.
We propose a new method for pretraining Vision-Language Models (VLMs) to learn fine-grained chart representations.
We conduct extensive evaluations of four leading chart summarization models, analyzing how accessible their descriptions are.
arXiv Detail & Related papers (2024-05-22T12:18:52Z)
- From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Large foundation models, such as large language models, have revolutionized various natural language processing tasks. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
- ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning [55.22996841790139]
We benchmark the ability of off-the-shelf Multi-modal Large Language Models (MLLMs) in the chart domain. We construct ChartX, a multi-modal evaluation set covering 18 chart types, 7 chart tasks, 22 disciplinary topics, and high-quality chart data. We develop ChartVLM to offer a new perspective on handling multi-modal tasks that strongly depend on interpretable patterns.
arXiv Detail & Related papers (2024-02-19T14:48:23Z)
- ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning.
It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align charts and text (a sketch of such a training target follows this entry).
Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and ChartLlama methods.
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
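The ChartAssistant entry above describes a first pre-training stage that aligns charts with text via chart-to-table parsing. Below is a minimal sketch of what such a training sample could look like; the markdown-style table serialization and field names are assumptions, not the paper's documented format.

```python
# Sketch of a stage-1 chart-to-table training sample, assuming the alignment
# pre-training described in the ChartAssistant summary. The markdown table
# serialization is an illustrative choice, not the paper's exact format.
def table_to_target(header: list[str], rows: list[list]) -> str:
    """Serialize a chart's underlying table into the text the model must emit."""
    lines = [" | ".join(header), " | ".join(["---"] * len(header))]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

sample = {
    "image": "bar_chart_0007.png",  # hypothetical chart rendering
    "prompt": "Parse the chart into its underlying data table.",
    "target": table_to_target(["Year", "Sales"], [[2021, 120], [2022, 145]]),
}
```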
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.