PQA: Perceptual Question Answering
- URL: http://arxiv.org/abs/2104.03589v1
- Date: Thu, 8 Apr 2021 08:06:21 GMT
- Title: PQA: Perceptual Question Answering
- Authors: Yonggang Qi, Kai Zhang, Aneeshan Sain, Yi-Zhe Song
- Abstract summary: Perceptual organization remains one of the very few established theories of the human visual system.
In this paper, we rejuvenate the study of perceptual organization by advocating two positional changes.
We examine purposefully generated synthetic data, instead of complex real imagery.
We then borrow insights from human psychology to design an agent that casts perceptual organization as a self-attention problem.
- Score: 35.051664704756995
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Perceptual organization remains one of the very few established theories of the human visual system. It underpinned many seminal pre-deep-learning works on segmentation and detection, yet research on it has declined rapidly since the field's preferential shift to learned deep models. Of the limited attempts since, most aimed at interpreting complex visual scenes using perceptual organizational rules. This has, however, proven sub-optimal, since such models were unable to effectively capture the visual complexity of real-world imagery. In this paper, we rejuvenate the study of perceptual organization by advocating two positional changes: (i) we examine purposefully generated synthetic data instead of complex real imagery, and (ii) we ask machines to synthesize novel perceptually-valid patterns instead of explaining existing data. Our overall answer lies in the introduction of a novel visual challenge -- the challenge of perceptual question answering (PQA). Upon observing example perceptual question-answer pairs, the goal of PQA is to solve similar questions by generating answers entirely from scratch (see Figure 1). Our first contribution is therefore the first dataset of perceptual question-answer pairs, each generated specifically for a particular Gestalt principle. We then borrow insights from human psychology to design an agent that casts perceptual organization as a self-attention problem, in which a proposed grid-to-grid mapping network directly generates answer patterns from scratch. Experiments show that our agent outperforms a selection of naive and strong baselines. A human study, however, indicates that our agent requires astronomically more data to learn than an average human does, necessitating future research (with or without our dataset).
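To make the abstract's grid-to-grid formulation concrete, here is a minimal, hypothetical PyTorch sketch of a self-attention network that maps an observed pattern grid directly to an answer grid. The symbol vocabulary, grid size, and layer sizes are illustrative assumptions, not the paper's published architecture.

```python
# Hypothetical grid-to-grid mapping sketch in the spirit of the paper's
# self-attention formulation; sizes and tokenization are assumptions.
import torch
import torch.nn as nn

class GridToGridAttention(nn.Module):
    """Maps an input pattern grid to an answer grid of the same size."""

    def __init__(self, num_symbols: int = 16, grid: int = 10, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_symbols, dim)              # one token per grid cell
        self.pos = nn.Parameter(torch.zeros(grid * grid, dim))   # learned positions, flattened
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_symbols)                  # per-cell symbol logits

    def forward(self, cells: torch.Tensor) -> torch.Tensor:
        # cells: (batch, H*W) integer symbols -> (batch, H*W, num_symbols) logits
        x = self.embed(cells) + self.pos    # inject cell identity and location
        x = self.encoder(x)                 # every cell attends to every other cell
        return self.head(x)                 # decode an answer pattern "from scratch"

# Toy usage: a batch of two 10x10 pattern grids.
model = GridToGridAttention()
grids = torch.randint(0, 16, (2, 100))
answer = model(grids).argmax(dim=-1).view(2, 10, 10)
```

One appeal of this reading is that Gestalt-style groupings (proximity, similarity, closure) can emerge as attention patterns, since each cell's prediction is conditioned on all other cells.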
Related papers
- Modelling the Human Intuition to Complete the Missing Information in Images for Convolutional Neural Networks [0.0]
Experimental psychology reveals many types of intuition, which depend on the state of the human mind.
We focus on visual intuition, which is useful for completing missing information during visual cognitive tasks.
In this study, we attempt to model intuition and incorporate this formalism to improve the performance of convolutional neural networks.
arXiv Detail & Related papers (2024-07-12T13:05:27Z)
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning [135.2892665079159]
We introduce a new large-scale diagnostic visual reasoning dataset named PTR.
PTR contains around 70k RGBD synthetic images with ground-truth object- and part-level annotations (a hypothetical sample layout is sketched below).
We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes.
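Purely as a reading aid, here is a hypothetical Python sketch of what a PTR-style sample record might carry, inferred only from the summary above (RGBD imagery plus object- and part-level labels); every field name is an assumption, not the dataset's actual schema.

```python
# Hypothetical PTR-style sample layout; all field names are assumptions
# inferred from the one-line summary, not the dataset's real schema.
from dataclasses import dataclass, field

@dataclass
class PartAnnotation:
    name: str        # e.g. "chair leg" (illustrative)
    mask_path: str   # per-part ground-truth segmentation mask

@dataclass
class ObjectAnnotation:
    category: str
    parts: list[PartAnnotation] = field(default_factory=list)

@dataclass
class PTRSample:
    rgb_path: str
    depth_path: str  # the "D" in RGBD
    objects: list[ObjectAnnotation] = field(default_factory=list)
    question: str = ""   # diagnostic reasoning question
    answer: str = ""
```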
arXiv Detail & Related papers (2021-12-09T18:59:34Z)
- Think about it! Improving defeasible reasoning by first modeling the question scenario [35.6110036360506]
Defeasible reasoning is the mode of reasoning where conclusions can be overturned by taking into account new evidence.
We ask whether neural models can similarly benefit from envisioning the question scenario before answering a defeasible query.
Our system, CURIOUS, achieves a new state-of-the-art on three different defeasible reasoning datasets.
arXiv Detail & Related papers (2021-10-24T04:13:52Z)
- Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module (a schematic sketch of one such step follows below).
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
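As referenced above, the following is a schematic Python sketch of one memory-based read-update-control reasoning step over a single knowledge graph. The attention form, tensor shapes, and module names are assumptions for illustration, not the authors' GRUC implementation.

```python
# Schematic "Read, Update, Control" step over one knowledge graph, loosely
# following the GRUC idea; shapes and the attention form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReadUpdateControl(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.read_attn = nn.Linear(2 * dim, 1)  # scores each graph node against the memory
        self.update = nn.GRUCell(dim, dim)      # folds the read content into the memory
        self.control = nn.Linear(dim, dim)      # derives the next step's control signal

    def forward(self, memory, control, nodes):
        # memory, control: (batch, dim); nodes: (batch, num_nodes, dim)
        query = (memory * control).unsqueeze(1).expand_as(nodes)
        scores = self.read_attn(torch.cat([nodes, query], dim=-1)).squeeze(-1)
        read = (F.softmax(scores, dim=-1).unsqueeze(-1) * nodes).sum(dim=1)  # Read
        memory = self.update(read, memory)                                   # Update
        control = torch.tanh(self.control(memory))                           # Control
        return memory, control

# Toy usage: iterate three reasoning steps over a 5-node graph.
step = ReadUpdateControl()
mem, ctl = torch.zeros(2, 128), torch.ones(2, 128)
graph = torch.randn(2, 5, 128)
for _ in range(3):
    mem, ctl = step(mem, ctl, graph)
```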
arXiv Detail & Related papers (2020-08-31T23:25:01Z)
- Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" [49.76230210108583]
We propose a framework to isolate and evaluate the reasoning aspect of visual question answering (VQA) separately from its perception.
We also propose a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.
On the challenging GQA dataset, this framework is used to perform in-depth, disentangled comparisons between well-known VQA models.
arXiv Detail & Related papers (2020-06-20T08:48:29Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, marginally improves performance on the Reasoning questions in VQA, and yields better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)