MapQA: A Dataset for Question Answering on Choropleth Maps
- URL: http://arxiv.org/abs/2211.08545v1
- Date: Tue, 15 Nov 2022 22:31:38 GMT
- Title: MapQA: A Dataset for Question Answering on Choropleth Maps
- Authors: Shuaichen Chang, David Palzer, Jialin Li, Eric Fosler-Lussier,
Ningchuan Xiao
- Abstract summary: We present MapQA, a large-scale dataset of 800K question-answer pairs over 60K map images.
Our task tests various levels of map understanding, from surface questions about map styles to complex questions that require reasoning on the underlying data.
We also present a novel algorithm, Visual Multi-Output Data Extraction based QA (V-MODEQA) for MapQA.
- Score: 12.877773112674506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Choropleth maps are a common visual representation for region-specific
tabular data and are used in a number of different venues (newspapers,
articles, etc.). These maps are human-readable but are often challenging to deal
with when trying to extract data for screen readers, analyses, or other related
tasks. Recent research into Visual-Question Answering (VQA) has studied
question answering on human-generated charts (ChartQA), such as bar, line, and
pie charts. However, little work has paid attention to understanding maps;
general VQA models, and ChartQA models, suffer when asked to perform this task.
To facilitate and encourage research in this area, we present MapQA, a
large-scale dataset of ~800K question-answer pairs over ~60K map images. Our
task tests various levels of map understanding, from surface questions about
map styles to complex questions that require reasoning on the underlying data.
We present the unique challenges of MapQA that frustrate most strong baseline
algorithms designed for ChartQA and general VQA tasks. We also present a novel
algorithm, Visual Multi-Output Data Extraction based QA (V-MODEQA) for MapQA.
V-MODEQA extracts the underlying structured data from a map image with a
multi-output model and then performs reasoning on the extracted data. Our
experimental results show that V-MODEQA has better overall performance and
robustness on MapQA than the state-of-the-art ChartQA and VQA algorithms by
capturing the unique properties in map question answering.
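As a concrete illustration of the two-stage V-MODEQA idea described above, the sketch below separates extraction from reasoning: a multi-output model first recovers structured region-value data from the map image, and a second stage answers the question from the extracted table. This is a minimal, hypothetical Python sketch; the class and function names, the hard-coded extraction output, and the single toy question type are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a two-stage "extract, then reason" pipeline in the spirit
# of V-MODEQA as described in the abstract. Everything here is illustrative:
# the real system uses a learned multi-output vision model and a learned
# reasoning module, neither of which is reproduced here.

from dataclasses import dataclass
from typing import Dict


@dataclass
class MapExtraction:
    """Structured data recovered from a choropleth map image."""
    title: str
    legend: Dict[str, float]         # legend label -> numeric value (assumed)
    region_values: Dict[str, float]  # region (e.g., US state) -> estimated value


def extract_map_data(image_path: str) -> MapExtraction:
    """Stage 1 (placeholder): a multi-output model would jointly predict the
    title, legend, and per-region values from the map image."""
    # Hard-coded stand-in for model predictions, for illustration only.
    return MapExtraction(
        title="Unemployment rate by state",
        legend={"low": 2.0, "high": 8.0},
        region_values={"Ohio": 4.1, "Texas": 3.9, "California": 5.2},
    )


def answer_question(extraction: MapExtraction, question: str) -> str:
    """Stage 2 (placeholder): reason over the extracted table. Only one toy
    question type is handled to keep the sketch short."""
    if question.lower().startswith("which region has the highest"):
        return max(extraction.region_values, key=extraction.region_values.get)
    raise NotImplementedError("Only one toy question type is sketched here.")


if __name__ == "__main__":
    data = extract_map_data("example_map.png")
    print(answer_question(data, "Which region has the highest value?"))  # California
```

Decoupling the stages in this way reflects the abstract's claim that reasoning over recovered structured data is more robust than answering directly from map pixels.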
Related papers
- MAPWise: Evaluating Vision-Language Models for Advanced Map Queries [47.15503716894445]
This study investigates the efficacy of vision-language models (VLMs) in answering questions based on maps.
We introduce a novel map-based question-answering benchmark, consisting of maps from three geographical regions (United States, India, China).
Our benchmark incorporates 43 diverse question templates, requiring nuanced understanding of relative spatial relationships, intricate map features, and complex reasoning.
arXiv Detail & Related papers (2024-08-30T20:57:34Z)
- Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model [4.41132900194195]
We propose a new method called chain of QA for human-written questions (CoQAH).
CoQAH utilizes a sequence of QA interactions between a large language model and a VQA model trained on synthetic data to reason and derive logical answers for human-written questions.
We tested the effectiveness of CoQAH on two types of human-written VQA datasets for 3D-rendered and chest X-ray images (a minimal sketch of this chain-of-QA loop follows the related-papers list below).
arXiv Detail & Related papers (2024-01-12T06:49:49Z)
- NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario [77.14723238359318]
NuScenes-QA is the first benchmark for VQA in the autonomous driving scenario, encompassing 34K visual scenes and 460K question-answer pairs.
We leverage existing 3D detection annotations to generate scene graphs and design question templates manually.
We develop a series of baselines that employ advanced 3D detection and VQA techniques.
arXiv Detail & Related papers (2023-05-24T07:40:50Z)
- BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models [47.64219291655723]
We introduce a new test set for visual question answering (VQA) called BinaryVQA to push the limits of VQA models.
Our dataset includes 7,800 questions across 1,024 images and covers a wide variety of objects, topics, and concepts.
Around 63% of the questions have positive answers.
arXiv Detail & Related papers (2023-01-28T00:03:44Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account multi-modal information, including text, layout, and visual features, to address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- From Pixels to Objects: Cubic Visual Attention for Visual Question Answering [132.95819467484517]
Recently, attention-based Visual Question Answering (VQA) has achieved great success by using the question to attend to the visual regions that are relevant to the answer.
We propose a Cubic Visual Attention (CVA) model that applies novel channel and spatial attention over object regions to improve the VQA task.
Experimental results show that our proposed method significantly outperforms the state of the art.
arXiv Detail & Related papers (2022-06-04T07:03:18Z)
- ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning [7.192233658525916]
We present a benchmark covering 9.6K human-written questions and 23.1K questions generated from human-written chart summaries.
We present two transformer-based models that combine visual features and the data table of the chart in a unified way to answer questions.
arXiv Detail & Related papers (2022-03-19T05:00:30Z)
- Grounding Answers for Visual Questions Asked by Visually Impaired People [16.978747012406266]
VizWiz-VQA-Grounding is the first dataset that visually grounds answers to visual questions asked by people with visual impairments.
We analyze our dataset and compare it with five VQA-Grounding datasets to show how it is similar to and different from them.
arXiv Detail & Related papers (2022-02-04T06:47:16Z)
- Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z)
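For the CoQAH entry above, the chain-of-QA idea can be pictured as a short dialogue loop in which a large language model repeatedly asks a synthetic-data-trained VQA model simple sub-questions about the image and then derives the final answer itself. The sketch below is a hedged illustration under assumed interfaces: the `llm` and `vqa_model` callables, the prompt wording, and the DONE stopping convention are hypothetical, not the paper's implementation.

```python
# Hypothetical chain-of-QA loop: an LLM gathers visual evidence by querying a
# VQA model, then answers the original human-written question itself.

from typing import Callable


def chain_of_qa(
    human_question: str,
    llm: Callable[[str], str],        # prompt -> LLM response (assumed interface)
    vqa_model: Callable[[str], str],  # sub-question about the image -> answer
    max_turns: int = 5,
) -> str:
    """Run a short LLM <-> VQA dialogue, then let the LLM produce the answer."""
    transcript = f"Original question about the image: {human_question}\n"
    for _ in range(max_turns):
        sub_q = llm(
            transcript
            + "Ask one simple sub-question the VQA model can answer, "
              "or reply DONE if you can already answer the original question."
        )
        if sub_q.strip().upper() == "DONE":
            break
        transcript += f"Sub-question: {sub_q}\nVQA answer: {vqa_model(sub_q)}\n"
    return llm(transcript + "Now give the final answer to the original question.")
```

In practice the two callables would wrap a real LLM API and the VQA model trained on synthetic data; the transcript simply accumulates sub-question/answer pairs so the final prompt carries all of the gathered visual evidence.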
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.