Related papers: Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature

URL: http://arxiv.org/abs/2412.12150v1
Date: Wed, 11 Dec 2024 05:29:54 GMT
Title: Rethinking Comprehensive Benchmark for Chart Understanding: A Perspective from Scientific Literature
Authors: Lingdong Shen, Qigqi, Kun Ding, Gaofeng Meng, Shiming Xiang
Abstract summary: We introduce a new benchmark, Scientific Chart QA (SCI-CQA), which emphasizes flowcharts as a critical yet often overlooked category. We curated a dataset of 202,760 image-text pairs from papers at 15 top-tier computer science conferences over the past decade. SCI-CQA also introduces a novel evaluation framework inspired by human exams, encompassing 5,629 carefully curated questions.
Score: 33.69273440337546
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Charts in scientific literature often contain complex visual elements, including multi-plot figures, flowcharts, structural diagrams, etc. Evaluating multimodal models using these authentic and intricate charts provides a more accurate assessment of their understanding abilities. However, existing benchmarks face limitations: a narrow range of chart types, overly simplistic template-based questions and visual elements, and inadequate evaluation methods. These shortcomings lead to inflated performance scores that fail to hold up when models encounter real-world scientific charts. To address these challenges, we introduce a new benchmark, Scientific Chart QA (SCI-CQA), which emphasizes flowcharts as a critical yet often overlooked category. To overcome the limitations of chart variety and simplistic visual elements, we curated a dataset of 202,760 image-text pairs from papers at 15 top-tier computer science conferences over the past decade. After rigorous filtering, we refined this to 37,607 high-quality charts with contextual information. SCI-CQA also introduces a novel evaluation framework inspired by human exams, encompassing 5,629 carefully curated questions, both objective and open-ended. Additionally, we propose an efficient annotation pipeline that significantly reduces data annotation costs. Finally, we explore context-based chart understanding, highlighting the crucial role of contextual information in solving previously unanswerable questions.

Related papers

CHAOS: Chart Analysis with Outlier Samples [31.64244745491319]
CHAOS is a benchmark to evaluate Multimodal Large Language Models (MLLMs) against chart perturbations. The benchmark includes 13 state-of-the-art MLLMs divided into three groups according to the training scope and data.
arXiv Detail & Related papers (2025-05-22T19:26:49Z)
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [63.599057862999]
RefChartQA is a novel benchmark that integrates Chart Question Answering (ChartQA) with visual grounding. Our experiments demonstrate that incorporating spatial awareness via grounding improves response accuracy by over 15%.
arXiv Detail & Related papers (2025-03-29T15:50:08Z)
Towards Understanding Graphical Perception in Large Multimodal Models [80.44471730672801]
We leverage the theory of graphical perception to develop an evaluation framework for analyzing gaps in LMMs' perception abilities in charts. We apply our framework to evaluate and diagnose the perception capabilities of state-of-the-art LMMs at three levels (chart, visual element, and pixel).
arXiv Detail & Related papers (2025-03-13T20:13:39Z)
VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning [13.011899331656018]
VProChart is a novel framework designed to address the challenges of Chart Question Answering (CQA). It integrates a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach. VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
arXiv Detail & Related papers (2024-09-03T07:19:49Z)
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding [52.35520385083425]
The FlowLearn dataset is a resource tailored to enhance the understanding of flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature. The simulated subset contains 10,000 flowcharts created using a customizable script.
arXiv Detail & Related papers (2024-07-06T20:58:51Z)
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers. To ensure quality, all charts and questions are handpicked, curated, and verified by human experts. Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary model and the strongest open-source model.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
Enhancing Question Answering on Charts Through Effective Pre-training Tasks [26.571522748519584]
We address the limitation of current VisualQA models when applied to charts and plots. Our findings indicate that existing models particularly underperform in answering questions related to the chart's structural and visual context. We propose three simple pre-training tasks that enforce the existing model in terms of both structural-visual knowledge, as well as its understanding of numerical questions.
arXiv Detail & Related papers (2024-06-14T14:40:10Z)
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Large foundation models, such as large language models, have revolutionized various natural language processing tasks. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
Current chart-related tasks focus on either chart perception that extracts information from the visual charts, or chart reasoning given the extracted data. We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic [8.155575318208628]
We introduce a benchmark and dataset for chart visual QA on real-world charts. Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations. Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-03T18:21:38Z)
ChartParser: Automatic Chart Parsing for Print-Impaired [2.1325744957975568]
Infographics are often an integral component of scientific documents for reporting qualitative or quantitative findings. Their interpretation continues to be a challenge for the blind, low-vision, and other print-impaired (BLV) individuals. We propose a fully automated pipeline that leverages deep learning, OCR, and image processing techniques to extract all figures from a research paper.
arXiv Detail & Related papers (2022-11-16T12:19:10Z)
Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension. We propose a new model that jointly learns classification and regression. Our model's edge is particularly emphasized on questions with out-of-vocabulary answers.
arXiv Detail & Related papers (2021-11-29T18:46:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.