ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
- URL: http://arxiv.org/abs/2203.10244v1
- Date: Sat, 19 Mar 2022 05:00:30 GMT
- Title: ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning
- Authors: Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, Enamul Hoque
- Abstract summary: We present a benchmark covering 9.6K human-written questions and 23.1K questions generated from human-written chart summaries.
We present two transformer-based models that combine visual features and the data table of the chart in a unified way to answer questions.
- Score: 7.192233658525916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Charts are very popular for analyzing data. When exploring charts, people
often ask a variety of complex reasoning questions that involve several logical
and arithmetic operations. They also commonly refer to visual features of a
chart in their questions. However, most existing datasets do not focus on such
complex reasoning questions, as their questions are template-based and their answers come from a fixed vocabulary. In this work, we present a large-scale benchmark
covering 9.6K human-written questions as well as 23.1K questions generated from
human-written chart summaries. To address the unique challenges in our
benchmark involving visual and logical reasoning over charts, we present two
transformer-based models that combine visual features and the data table of the
chart in a unified way to answer questions. While our models achieve the
state-of-the-art results on the previous datasets as well as on our benchmark,
the evaluation also reveals several challenges in answering complex reasoning
questions.
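As a rough, text-only illustration of this input scheme, the sketch below feeds a question together with a linearized data table to an off-the-shelf seq2seq model. It assumes the chart's underlying data table is available; it is not the paper's actual models, which also fuse visual features from the chart image, and a generic T5 checkpoint would need fine-tuning on ChartQA-style data to answer reliably.

```python
# Minimal sketch: answer a chart question from its underlying data table by
# linearizing the table into text and prompting a seq2seq model. The table
# format and prompt template are illustrative assumptions, not ChartQA's.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def flatten_table(header, rows):
    """Linearize a data table, row by row, into a single string."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return " ; ".join(lines)

def answer(question, header, rows):
    prompt = f"question: {question} table: {flatten_table(header, rows)}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical data extracted from a bar chart of sales per year.
print(answer("Which year had the highest sales?",
             ["Year", "Sales"], [[2019, 40], [2020, 55], [2021, 48]]))
```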
Related papers
- GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering [12.485921065840294]
Chart Question Answering (CQA) aims at answering questions based on the visual chart content.
We propose a novel Graph-of-Thought (GoT) guided compositional reasoning model called GoT-CQA.
GoT-CQA achieves outstanding performance, especially on complex human-written and reasoning questions.
arXiv Detail & Related papers (2024-09-04T10:56:05Z)
- VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning [13.011899331656018]
VProChart is a novel framework designed to address the challenges of Chart Question Answering (CQA).
It integrates a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach.
VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.
arXiv Detail & Related papers (2024-09-03T07:19:49Z)
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs [62.84082370758761]
CharXiv is a comprehensive evaluation suite involving 2,323 charts from arXiv papers.
To ensure quality, all charts and questions are handpicked, curated, and verified by human experts.
Results reveal a substantial, previously underestimated gap between the reasoning skills of the strongest proprietary and open-source models.
arXiv Detail & Related papers (2024-06-26T17:50:11Z)
- ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
ChartAssistant is a vision-language model for universal chart comprehension and reasoning.
It undergoes a two-stage training process, starting with pre-training on chart-to-table parsing to align chart and text.
Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and ChartLlama methods (see the chart-to-table sketch after this list).
arXiv Detail & Related papers (2024-01-04T17:51:48Z)
- OpenCQA: Open-ended Question Answering with Charts [6.7038829115674945]
We introduce a new task called OpenCQA, where the goal is to answer an open-ended question about a chart with texts.
We implement and evaluate a set of baselines under three practical settings.
Our analysis of the results shows that the top-performing models generally produce fluent and coherent text.
arXiv Detail & Related papers (2022-10-12T23:37:30Z)
- Chart Question Answering: State of the Art and Future Directions [0.0]
Chart Question Answering (CQA) systems typically take a chart and a natural language question as input and automatically generate the answer.
We systematically review the current state-of-the-art research focusing on the problem of chart question answering.
arXiv Detail & Related papers (2022-05-08T22:54:28Z)
- Question-Answer Sentence Graph for Joint Modeling Answer Selection [122.29142965960138]
We train and integrate state-of-the-art (SOTA) models for computing scores between question-question, question-answer, and answer-answer pairs.
Online inference is then performed to solve the answer sentence selection (AS2) task on unseen queries (see the pairwise-scoring sketch after this list).
arXiv Detail & Related papers (2022-02-16T05:59:53Z)
- Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering [71.6781118080461]
We propose a Graph Matching Attention (GMA) network for the Visual Question Answering (VQA) task.
It not only builds a graph for the image but also constructs a graph for the question in terms of both syntactic and embedding information.
Next, we explore the intra-modality relationships by a dual-stage graph encoder and then present a bilateral cross-modality graph matching attention to infer the relationships between the image and the question.
Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset.
arXiv Detail & Related papers (2021-12-14T10:01:26Z)
- Classification-Regression for Chart Comprehension [16.311371103939205]
Chart question answering (CQA) is a task used for assessing chart comprehension.
We propose a new model that jointly learns classification and regression.
Our model's advantage is especially pronounced on questions with out-of-vocabulary answers (see the joint-head sketch after this list).
arXiv Detail & Related papers (2021-11-29T18:46:06Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5% and marginally improves performance on Reasoning questions in VQA, while also producing better attention maps (see the consistency-check sketch after this list).
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
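For the ChartAssistant entry above, the chart-to-table step of its first training stage can be illustrated with an off-the-shelf chart-to-table model. The sketch below uses DePlot, a different publicly released model, purely to show what chart-to-table parsing produces; it is not ChartAssistant itself, and the image path is hypothetical.

```python
# Chart-to-table parsing with an off-the-shelf model (DePlot), shown only to
# illustrate the kind of objective ChartAssistant pre-trains on.
from PIL import Image
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

processor = Pix2StructProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

image = Image.open("chart.png")  # hypothetical chart image
inputs = processor(images=image,
                   text="Generate underlying data table of the figure below:",
                   return_tensors="pt")
predictions = model.generate(**inputs, max_new_tokens=512)
# Prints a linearized data table recovered from the chart image.
print(processor.decode(predictions[0], skip_special_tokens=True))
```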
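For the question-answer sentence graph entry, the pairwise-scoring idea can be sketched with generic sentence embeddings. The model choice and the additive aggregation below are illustrative assumptions, not the paper's trained state-of-the-art scorers.

```python
# Toy pairwise scoring for answer selection: score question-question and
# question-answer pairs with cosine similarity, letting a related, previously
# seen question reinforce candidate answers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "When was the Eiffel Tower built?"
related_q = "What year was the Eiffel Tower constructed?"
candidates = ["The Eiffel Tower was completed in 1889.",
              "The Eiffel Tower is in Paris."]

emb = model.encode([query, related_q] + candidates, convert_to_tensor=True)
sims = util.cos_sim(emb, emb)  # all pairwise scores in one matrix
for i, cand in enumerate(candidates):
    # candidate score = similarity to the query + similarity to the related question
    score = float(sims[0, 2 + i] + sims[1, 2 + i])
    print(f"{score:.3f}  {cand}")
```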
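For the classification-regression entry, a joint head over a shared encoder representation might look like the sketch below; the dimensions, the gating head, and the loss mix described in the comments are assumptions, not the paper's exact architecture.

```python
# Joint classification-regression head for chart QA: fixed-vocabulary answers
# go through a classifier, numeric (out-of-vocabulary) answers through a
# regressor. All sizes here are illustrative.
import torch
import torch.nn as nn

class JointHead(nn.Module):
    def __init__(self, hidden=512, n_classes=100):
        super().__init__()
        self.classifier = nn.Linear(hidden, n_classes)  # fixed-vocab answers
        self.regressor = nn.Linear(hidden, 1)           # numeric answers
        self.gate = nn.Linear(hidden, 2)                # which head to trust

    def forward(self, h):
        return self.classifier(h), self.regressor(h).squeeze(-1), self.gate(h)

head = JointHead()
h = torch.randn(4, 512)  # dummy pooled chart+question features
logits, value, gate = head(h)
# Training would mix cross-entropy on `logits` with a regression loss (e.g.,
# Smooth L1) on `value`, selected per example by the gold answer type.
print(logits.shape, value.shape, gate.shape)
```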
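Finally, for the SQuINT entry, the consistency notion (answering a reasoning question correctly should entail answering its perception sub-questions correctly) can be checked as below; the record format is an assumption, not the paper's evaluation code.

```python
# Toy consistency check in the spirit of SQuINT's evaluation: among reasoning
# questions the model answers correctly, count those whose perception
# sub-questions are all answered correctly too.
def consistency(records):
    """records: dicts with 'main_correct' (bool) and 'sub_correct'
    (list of bools, one per perception sub-question)."""
    eligible = [r for r in records if r["main_correct"]]
    if not eligible:
        return 0.0
    return sum(all(r["sub_correct"]) for r in eligible) / len(eligible)

print(consistency([
    {"main_correct": True,  "sub_correct": [True, True]},
    {"main_correct": True,  "sub_correct": [False]},
    {"main_correct": False, "sub_correct": [True]},
]))  # -> 0.5
```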
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.