Visual Question Answering on 360° Images
- URL: http://arxiv.org/abs/2001.03339v1
- Date: Fri, 10 Jan 2020 08:18:21 GMT
- Title: Visual Question Answering on 360° Images
- Authors: Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang
- Abstract summary: VQA 360° is a novel task of visual question answering on 360° images.
We collect the first VQA 360° dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types.
- Score: 96.00046925811515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we introduce VQA 360°, a novel task of visual question
answering on 360° images. Unlike a normal field-of-view image, a 360° image
captures the entire visual content around the optical center of a camera,
demanding more sophisticated spatial understanding and reasoning. To address
this problem, we collect the first VQA 360° dataset, containing around 17,000
real-world image-question-answer triplets for a variety of question types. We
then study two different VQA models on VQA 360°: a conventional model that
takes an equirectangular image (with intrinsic distortion) as input, and a
dedicated model that first projects a 360° image onto cubemaps and subsequently
aggregates information from multiple spatial resolutions. We demonstrate that
the cubemap-based model with multi-level fusion and attention diffusion
performs favorably against other variants and the equirectangular-based models.
Nevertheless, the gap between human and machine performance reveals the need
for more advanced VQA 360° algorithms. We therefore expect our dataset and
studies to serve as a benchmark for future development on this challenging
task. Dataset, code, and pre-trained models are available online.
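The cubemap branch hinges on re-projecting the equirectangular input onto cube faces, which removes most of the equirectangular distortion before feature extraction. Below is a minimal NumPy sketch of that projection step for a single face; it is not the authors' released code, and the axis conventions and nearest-neighbor sampling are illustrative assumptions.

```python
# Sketch: sample one cubemap face from an equirectangular 360° image.
# Axis conventions and nearest-neighbor lookup are illustrative assumptions.
import numpy as np

def equirect_to_cubemap_face(equi, face_size, forward, up):
    """Render one cube face by casting a ray through every face pixel.

    equi:      H x W x C equirectangular image
    face_size: side length of the square output face in pixels
    forward:   unit vector through the face center (e.g. [0, 0, 1])
    up:        unit vector along the face's vertical axis
    """
    H, W = equi.shape[:2]
    forward = np.asarray(forward, dtype=np.float64)
    up = np.asarray(up, dtype=np.float64)
    right = np.cross(forward, up)

    # Pixel grid spanning [-1, 1] x [-1, 1] on the face plane.
    t = np.linspace(-1.0, 1.0, face_size)
    u, v = np.meshgrid(t, t)

    # Ray direction for every face pixel, normalized to the unit sphere.
    rays = (forward[None, None, :]
            + u[..., None] * right[None, None, :]
            - v[..., None] * up[None, None, :])
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)

    # Spherical coordinates: longitude in [-pi, pi], latitude in [-pi/2, pi/2].
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arcsin(np.clip(rays[..., 1], -1.0, 1.0))

    # Map to equirectangular pixel coordinates (top row = +pi/2 latitude).
    x = ((lon / (2.0 * np.pi) + 0.5) * (W - 1)).round().astype(int)
    y = ((0.5 - lat / np.pi) * (H - 1)).round().astype(int)
    return equi[y, x]
```

Cycling `forward`/`up` over the six axis-aligned orientations yields the full cubemap; the paper's dedicated model then encodes the faces at multiple spatial resolutions and fuses them with attention diffusion.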
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario [77.14723238359318]
NuScenes-QA is the first benchmark for VQA in the autonomous driving scenario, encompassing 34K visual scenes and 460K question-answer pairs.
We leverage existing 3D detection annotations to generate scene graphs and design question templates manually.
We develop a series of baselines that employ advanced 3D detection and VQA techniques.
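The QA-generation recipe here is programmatic: detection annotations are turned into a scene graph, and manually written templates are filled from it. The following toy Python sketch illustrates the idea; the object record format, template wording, and counting logic are hypothetical, not taken from the NuScenes-QA release.

```python
# Toy sketch of template-based QA generation from detection annotations.
# The record fields and template wording are hypothetical illustrations.
from collections import Counter

# One record per detected object in a scene (a degenerate "scene graph").
scene_objects = [
    {"category": "car", "status": "moving"},
    {"category": "car", "status": "parked"},
    {"category": "pedestrian", "status": "moving"},
]

def generate_counting_qa(objects, category):
    """Instantiate a counting template for one object category."""
    counts = Counter(obj["category"] for obj in objects)
    question = f"How many {category}(s) are there in the scene?"
    answer = str(counts.get(category, 0))
    return question, answer

for cat in ("car", "pedestrian", "bicycle"):
    q, a = generate_counting_qa(scene_objects, cat)
    print(q, "->", a)  # e.g. "How many car(s) are there in the scene? -> 2"
```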
arXiv Detail & Related papers (2023-05-24T07:40:50Z)
- ST360IQ: No-Reference Omnidirectional Image Quality Assessment with Spherical Vision Transformers [17.48330099000856]
We present a method for no-reference 360° image quality assessment.
Our approach predicts the quality of an omnidirectional image in a way that correlates with human-perceived image quality.
arXiv Detail & Related papers (2023-03-13T07:48:46Z)
- HRVQA: A Visual Question Answering Benchmark for High-Resolution Aerial Images [18.075338835513993]
We introduce a new dataset, HRVQA, which provides 53,512 aerial images of 1024×1024 pixels and 1,070,240 QA pairs.
To benchmark the understanding capability of VQA models for aerial images, we evaluate the relevant methods on HRVQA.
Our method achieves superior performance in comparison to the previous state-of-the-art approaches.
arXiv Detail & Related papers (2023-01-23T14:36:38Z)
- From Pixels to Objects: Cubic Visual Attention for Visual Question Answering [132.95819467484517]
Recently, attention-based Visual Question Answering (VQA) has achieved great success by using the question to attend to the visual areas related to the answer.
We propose a Cubic Visual Attention (CVA) model that applies novel channel and spatial attention to object regions to improve the VQA task.
Experimental results show that our proposed method significantly outperforms state-of-the-art methods.
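For intuition, here is a small NumPy sketch of question-guided channel attention followed by spatial (per-region) attention over a stack of object-region features; the projections, shapes, and scoring functions are illustrative assumptions rather than the CVA authors' exact formulation.

```python
# Sketch: question-guided channel attention, then spatial (per-region)
# attention, over K object-region features. Shapes/scores are assumptions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cubic_attention(regions, question, Wc, Ws):
    """regions: K x D object features; question: D-dim question embedding."""
    # Channel attention: weight each feature channel by question relevance,
    # shared across all object regions.
    channel_w = softmax(regions.mean(axis=0) * (Wc @ question))  # (D,)
    attended = regions * channel_w[None, :]                      # (K, D)

    # Spatial attention: weight each object region by question relevance.
    spatial_w = softmax(attended @ (Ws @ question))              # (K,)
    return spatial_w @ attended                                  # (D,) fused

K, D = 36, 512
rng = np.random.default_rng(0)
fused = cubic_attention(rng.normal(size=(K, D)), rng.normal(size=D),
                        rng.normal(size=(D, D)) / np.sqrt(D),
                        rng.normal(size=(D, D)) / np.sqrt(D))
```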
arXiv Detail & Related papers (2022-06-04T07:03:18Z)
- Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video [66.57045901742922]
Blind visual quality assessment (BVQA) on 360° video plays a key role in optimizing immersive multimedia systems.
In this paper, we take into account the progressive paradigm of human perception towards spherical video quality.
We propose a novel BVQA approach (namely ProVQA) for 360° video via progressively learning from pixels, frames and video.
arXiv Detail & Related papers (2021-11-18T03:45:13Z)
- Adaptive Hypergraph Convolutional Network for No-Reference 360-degree Image Quality Assessment [21.23871001977444]
In no-reference 360-degree image quality assessment (NR 360IQA), graph convolutional networks (GCNs) have achieved impressive performance.
We propose an adaptive hypergraph convolutional network for NR 360IQA, denoted as AHGCN.
Our proposed approach has a clear advantage over state-of-the-art full-reference and no-reference IQA models.
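As a reference point for readers unfamiliar with hypergraph convolutions, the sketch below implements one layer of the standard HGNN-style update, X' = ReLU(Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Theta); the toy incidence matrix is an illustrative assumption and does not reproduce AHGCN's adaptive hyperedge construction.

```python
# Sketch: one standard hypergraph convolution layer (HGNN-style update);
# the toy incidence matrix below is an illustrative assumption.
import numpy as np

def hypergraph_conv(X, H, Theta):
    """X: N x F node features; H: N x E incidence matrix; Theta: F x F'."""
    w = np.ones(H.shape[1])                 # hyperedge weights (uniform here)
    Dv = np.diag(1.0 / np.sqrt(H @ w))      # node degrees^(-1/2)
    De = np.diag(1.0 / H.sum(axis=0))       # hyperedge degrees^(-1)
    out = Dv @ H @ np.diag(w) @ De @ H.T @ Dv @ X @ Theta
    return np.maximum(out, 0.0)             # ReLU

# Toy example: 4 image patches (nodes) grouped into 2 hyperedges.
H = np.array([[1., 0.],
              [1., 1.],
              [0., 1.],
              [1., 1.]])
rng = np.random.default_rng(0)
X_new = hypergraph_conv(rng.normal(size=(4, 8)), H,
                        rng.normal(size=(8, 8)) / np.sqrt(8))
```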
arXiv Detail & Related papers (2021-05-19T14:02:48Z)
- Visual Question Answering on Image Sets [70.4472272672716]
We introduce the task of Image-Set Visual Question Answering (ISVQA), which generalizes the commonly studied single-image VQA problem to multi-image settings.
Taking a natural language question and a set of images as input, it aims to answer the question based on the content of the images.
The questions can be about objects and relationships in one or more images or about the entire scene depicted by the image set.
arXiv Detail & Related papers (2020-08-27T08:03:32Z)
- A Fixation-based 360° Benchmark Dataset for Salient Object Detection [21.314578493964333]
Fixation prediction (FP) in panoramic content has been widely investigated alongside the booming trend of virtual reality (VR) applications.
Salient object detection (SOD), however, has seldom been explored in 360° images due to the lack of datasets representative of real scenes.
arXiv Detail & Related papers (2020-01-22T11:16:39Z)