Chart Question Answering from Real-World Analytical Narratives
- URL: http://arxiv.org/abs/2507.01627v1
- Date: Wed, 02 Jul 2025 11:58:04 GMT
- Title: Chart Question Answering from Real-World Analytical Narratives
- Authors: Maeve Hutchinson, Radu Jianu, Aidan Slingsby, Jo Wood, Pranava Madhyastha
- Abstract summary: We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives.
- Score: 5.051297047598238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a significant performance gap, with GPT-4.1 achieving an accuracy of 69.3%, underscoring the challenges posed by this more authentic CQA setting.
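The benchmarking setup described in the abstract (querying a multimodal LLM with a chart image and a question, then scoring its answer) can be illustrated with a minimal sketch. The dataset file layout, field names, exact-match scoring, and the "gpt-4.1" model identifier below are illustrative assumptions, not the paper's released evaluation code.

```python
"""Minimal sketch of a chart-QA accuracy evaluation, assuming a JSON file of
(chart image, question, gold answer) records and the OpenAI chat API."""
import base64
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(path: str) -> str:
    """Base64-encode a chart image so it can be sent inline to the model."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def ask_model(image_path: str, question: str, model: str = "gpt-4.1") -> str:
    """Send one chart image plus its question and return the model's answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Answer concisely based on the chart.\n{question}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(image_path)}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()


def evaluate(dataset_path: str) -> float:
    """Exact-match accuracy over a list of hypothetical CQA records."""
    with open(dataset_path) as f:
        # assumed record format: {"image": ..., "question": ..., "answer": ...}
        records = json.load(f)
    correct = sum(
        ask_model(r["image"], r["question"]).lower() == r["answer"].lower()
        for r in records
    )
    return correct / len(records)


if __name__ == "__main__":
    print(f"accuracy: {evaluate('cqa_dataset.json'):.1%}")
```

In practice, CQA benchmarks usually apply more forgiving matching than the strict string comparison above, for example numeric tolerance or normalization of units and casing.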
Related papers
- Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text [30.74255946385862]
We introduce Text2Vis, a benchmark designed to assess text-to-visualization models.
It comprises 1,985 samples, each with a data table, natural language query, short answer, visualization code, and annotated charts.
It reveals significant performance gaps, highlighting key challenges and offering insights for future advancements.
arXiv Detail & Related papers (2025-07-26T14:59:04Z)
- ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering [14.468507852394923]
Chart question answering (CQA) has become a critical multimodal task for evaluating the reasoning capabilities of vision-language models.
We introduce ChartMind, a new benchmark designed for complex CQA tasks in real-world settings.
We propose a context-aware yet model-agnostic framework, ChartLLM, that focuses on extracting key contextual elements.
arXiv Detail & Related papers (2025-05-29T08:46:03Z)
- Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts [62.45232157149698]
We introduce a novel Chart Hypothetical Question Answering (HQA) task, which imposes assumptions on the same question to compel models to engage in counterfactual reasoning based on the chart content.
Furthermore, we introduce HAI, a human-AI interactive data synthesis approach that leverages the efficient text-editing capabilities of MLLMs alongside human expert knowledge to generate diverse and high-quality HQA data at a low cost.
arXiv Detail & Related papers (2025-03-06T05:08:40Z)
- Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios [69.00444996464662]
We propose RIV-CoT, a Retrieval-Based Interleaved Visual Chain-of-Thought method that enables vision-language models to reason using visual crops corresponding to relevant entities.
Our experiments demonstrate that RIV-CoT improves answer accuracy by 3.1% and reasoning accuracy by 4.6% over vanilla CoT prompting.
arXiv Detail & Related papers (2025-01-08T18:31:16Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary and open-source models.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
- TextSquare: Scaling up Text-Centric Visual Instruction Tuning [62.878378882175284]
We introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M.
Our model, TextSquare, considerably surpasses previous open-source state-of-the-art text-centric MLLMs.
It even outperforms top-tier models like GPT4V and Gemini on 6 of 10 text-centric benchmarks.
arXiv Detail & Related papers (2024-04-19T11:38:08Z)
- Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z)
- RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic [8.155575318208628]
We introduce a benchmark and dataset for chart visual QA on real-world charts.
Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations.
Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-03T18:21:38Z)
- Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension.
We propose a new model that jointly learns classification and regression.
Our model's advantage is particularly pronounced on questions with out-of-vocabulary answers.
arXiv Detail & Related papers (2021-11-29T18:46:06Z)
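The entry above describes a model that jointly learns classification and regression for chart QA. As a rough illustration of that idea (not the paper's architecture), the sketch below attaches a classification head over a fixed answer vocabulary and a regression head for numeric, out-of-vocabulary answers to a shared feature vector; the feature dimension, vocabulary size, and loss weighting are illustrative assumptions.

```python
"""Minimal sketch of a joint classification-regression head for chart QA."""
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointChartQAHead(nn.Module):
    def __init__(self, feature_dim: int = 512, vocab_size: int = 1000):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, vocab_size)  # fixed-vocabulary answers
        self.regressor = nn.Linear(feature_dim, 1)            # numeric (out-of-vocabulary) answers

    def forward(self, features: torch.Tensor):
        # features: (batch, feature_dim) from some chart+question encoder
        return self.classifier(features), self.regressor(features).squeeze(-1)


def joint_loss(logits, values, target_class, target_value, is_numeric, alpha=1.0):
    """Cross-entropy on vocabulary answers plus smooth-L1 on numeric ones."""
    cls_loss = (F.cross_entropy(logits[~is_numeric], target_class[~is_numeric])
                if (~is_numeric).any() else logits.new_zeros(()))
    reg_loss = (F.smooth_l1_loss(values[is_numeric], target_value[is_numeric])
                if is_numeric.any() else values.new_zeros(()))
    return cls_loss + alpha * reg_loss


if __name__ == "__main__":
    head = JointChartQAHead()
    feats = torch.randn(4, 512)  # assumed encoder output for 4 chart-question pairs
    logits, values = head(feats)
    is_numeric = torch.tensor([False, True, False, True])
    loss = joint_loss(logits, values,
                      target_class=torch.tensor([3, 0, 7, 0]),   # dummy class ids for numeric rows
                      target_value=torch.tensor([0.0, 12.5, 0.0, 3.0]),
                      is_numeric=is_numeric)
    print(loss.item())
```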