Chart Question Answering from Real-World Analytical Narratives
- URL: http://arxiv.org/abs/2507.01627v1
- Date: Wed, 02 Jul 2025 11:58:04 GMT
- Title: Chart Question Answering from Real-World Analytical Narratives
- Authors: Maeve Hutchinson, Radu Jianu, Aidan Slingsby, Jo Wood, Pranava Madhyastha
- Abstract summary: We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives.
- Score: 5.051297047598238
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new dataset for chart question answering (CQA) constructed from visualization notebooks. The dataset features real-world, multi-view charts paired with natural language questions grounded in analytical narratives. Unlike prior benchmarks, our data reflects ecologically valid reasoning workflows. Benchmarking state-of-the-art multimodal large language models reveals a significant performance gap, with GPT-4.1 achieving an accuracy of 69.3%, underscoring the challenges posed by this more authentic CQA setting.
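The benchmarking setup described in the abstract (querying a multimodal LLM with a chart image and a question, then scoring its answer) can be illustrated with a minimal sketch. The dataset file layout, field names, exact-match scoring, and the "gpt-4.1" model identifier below are illustrative assumptions, not the paper's released evaluation code.

```python
"""Minimal sketch of a chart-QA accuracy evaluation, assuming a JSON file of
(chart image, question, gold answer) records and the OpenAI chat API."""
import base64
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def encode_image(path: str) -> str:
    """Base64-encode a chart image so it can be sent inline to the model."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def ask_model(image_path: str, question: str, model: str = "gpt-4.1") -> str:
    """Send one chart image plus its question and return the model's answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Answer concisely based on the chart.\n{question}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode_image(image_path)}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()


def evaluate(dataset_path: str) -> float:
    """Exact-match accuracy over a list of hypothetical CQA records."""
    with open(dataset_path) as f:
        # assumed record format: {"image": ..., "question": ..., "answer": ...}
        records = json.load(f)
    correct = sum(
        ask_model(r["image"], r["question"]).lower() == r["answer"].lower()
        for r in records
    )
    return correct / len(records)


if __name__ == "__main__":
    print(f"accuracy: {evaluate('cqa_dataset.json'):.1%}")
```

In practice, CQA benchmarks usually apply more forgiving matching than the strict string comparison above, for example numeric tolerance or normalization of units and casing.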
Related papers
- Text2Vis: A Challenging and Diverse Benchmark for Generating Multimodal Visualizations from Text [30.74255946385862]
We introduce Text2Vis, a benchmark designed to assess text-to-visualization models.
It comprises 1,985 samples, each with a data table, natural language query, short answer, visualization code, and annotated charts.
It reveals significant performance gaps, highlighting key challenges and offering insights for future advancements.
arXiv Detail & Related papers (2025-07-26T14:59:04Z)
- ChartMind: A Comprehensive Benchmark for Complex Real-world Multimodal Chart Question Answering [14.468507852394923]
Chart question answering (CQA) has become a critical multimodal task for evaluating the reasoning capabilities of vision-language models.
We introduce ChartMind, a new benchmark designed for complex CQA tasks in real-world settings.
We propose a context-aware yet model-agnostic framework, ChartLLM, that focuses on extracting key contextual elements.
arXiv Detail & Related papers (2025-05-29T08:46:03Z)
- Chart-HQA: A Benchmark for Hypothetical Question Answering in Charts [62.45232157149698]
We introduce a novel Chart Hypothetical Question Answering (HQA) task, which imposes assumptions on the same question to compel models to engage in counterfactual reasoning based on the chart content.
Furthermore, we introduce HAI, a human-AI interactive data synthesis approach that leverages the efficient text-editing capabilities of MLLMs alongside human expert knowledge to generate diverse and high-quality HQA data at a low cost.
arXiv Detail & Related papers (2025-03-06T05:08:40Z)
- Retrieval-Based Interleaved Visual Chain-of-Thought in Real-World Driving Scenarios [69.00444996464662]
We propose RIV-CoT, a Retrieval-Based Interleaved Visual Chain-of-Thought method that enables vision-language models to reason using visual crops corresponding to relevant entities.
Our experiments demonstrate that RIV-CoT improves answer accuracy by 3.1% and reasoning accuracy by 4.6% over vanilla CoT prompting.
arXiv Detail & Related papers (2025-01-08T18:31:16Z)
- On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
This paper explores the training processes necessary to improve MLLMs' comprehension of charts.
We introduce CHOPINLLM, an MLLM tailored for in-depth chart comprehension.
arXiv Detail & Related papers (2024-07-19T17:58:36Z)
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary and open-source models.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
- TextSquare: Scaling up Text-Centric Visual Instruction Tuning [62.878378882175284]
We introduce a new approach for creating a massive, high-quality instruction-tuning dataset, Square-10M.
Our model, TextSquare, considerably surpasses previous open-source state-of-the-art text-centric MLLMs.
It even outperforms top-tier models like GPT4V and Gemini on 6 of 10 text-centric benchmarks.
arXiv Detail & Related papers (2024-04-19T11:38:08Z)
- Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries [67.0083902913112]
We develop the Text2Analysis benchmark, incorporating advanced analysis tasks.
We also develop five innovative and effective annotation methods.
We evaluate five state-of-the-art models using three different metrics.
arXiv Detail & Related papers (2023-12-21T08:50:41Z)
- RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic [8.155575318208628]
We introduce a benchmark and dataset for chart visual QA on real-world charts.
Our contribution includes the introduction of a new answer type, 'list', with both ranked and unranked variations.
Results of our experiments, conducted on a real-world out-of-distribution dataset, provide a robust evaluation of large-scale pre-trained models.
arXiv Detail & Related papers (2023-08-03T18:21:38Z)
- Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension.
We propose a new model that jointly learns classification and regression.
Our model's advantage is particularly pronounced on questions with out-of-vocabulary answers.
arXiv Detail & Related papers (2021-11-29T18:46:06Z)
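The entry above describes a model that jointly learns classification and regression for chart QA. As a rough illustration of that idea (not the paper's architecture), the sketch below attaches a classification head over a fixed answer vocabulary and a regression head for numeric, out-of-vocabulary answers to a shared feature vector; the feature dimension, vocabulary size, and loss weighting are illustrative assumptions.

```python
"""Minimal sketch of a joint classification-regression head for chart QA."""
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointChartQAHead(nn.Module):
    def __init__(self, feature_dim: int = 512, vocab_size: int = 1000):
        super().__init__()
        self.classifier = nn.Linear(feature_dim, vocab_size)  # fixed-vocabulary answers
        self.regressor = nn.Linear(feature_dim, 1)            # numeric (out-of-vocabulary) answers

    def forward(self, features: torch.Tensor):
        # features: (batch, feature_dim) from some chart+question encoder
        return self.classifier(features), self.regressor(features).squeeze(-1)


def joint_loss(logits, values, target_class, target_value, is_numeric, alpha=1.0):
    """Cross-entropy on vocabulary answers plus smooth-L1 on numeric ones."""
    cls_loss = (F.cross_entropy(logits[~is_numeric], target_class[~is_numeric])
                if (~is_numeric).any() else logits.new_zeros(()))
    reg_loss = (F.smooth_l1_loss(values[is_numeric], target_value[is_numeric])
                if is_numeric.any() else values.new_zeros(()))
    return cls_loss + alpha * reg_loss


if __name__ == "__main__":
    head = JointChartQAHead()
    feats = torch.randn(4, 512)  # assumed encoder output for 4 chart-question pairs
    logits, values = head(feats)
    is_numeric = torch.tensor([False, True, False, True])
    loss = joint_loss(logits, values,
                      target_class=torch.tensor([3, 0, 7, 0]),   # dummy class ids for numeric rows
                      target_value=torch.tensor([0.0, 12.5, 0.0, 3.0]),
                      is_numeric=is_numeric)
    print(loss.item())
```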