VQA-LOL: Visual Question Answering under the Lens of Logic
- URL: http://arxiv.org/abs/2002.08325v2
- Date: Wed, 15 Jul 2020 22:39:12 GMT
- Title: VQA-LOL: Visual Question Answering under the Lens of Logic
- Authors: Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
- Abstract summary: We investigate whether visual question answering systems trained to answer a question about an image are able to answer the logical composition of multiple such questions.
We construct an augmentation of the VQA dataset as a benchmark, with questions containing logical compositions and linguistic transformations.
We propose our Lens of Logic (LOL) model which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss.
- Score: 58.30291671877342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logical connectives and their implications on the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate whether visual question answering (VQA) systems trained to answer a question about an image are able to answer the logical composition of multiple such questions. When put under this Lens of Logic, state-of-the-art VQA models have difficulty correctly answering these logically composed questions. We construct an augmentation of the VQA dataset as a benchmark, with questions containing logical compositions and linguistic transformations (negation, disjunction, conjunction, and antonyms). We propose our Lens of Logic (LOL) model, which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation. Our model shows substantial improvement in learning logical compositions while retaining performance on VQA. We suggest this work as a move towards robustness by embedding logical connectives in visual understanding.
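The abstract names the Fréchet-Compatibility Loss but does not spell it out. The classical Fréchet inequalities bound the probability of a conjunction or disjunction by the marginal probabilities of its components, which suggests one way such a consistency term could be built for yes/no questions. Below is a minimal PyTorch sketch under that reading; the function names, the hinge-style penalty, and the restriction to "yes"-probabilities are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of a Fréchet-style compatibility penalty for yes/no VQA questions,
# assuming the model outputs P(answer = "yes") for each question.
# Illustrative only; not the authors' implementation.
import torch


def frechet_bounds(p_a: torch.Tensor, p_b: torch.Tensor, op: str):
    """Classical Fréchet bounds on P(A op B) given the marginals P(A), P(B)."""
    if op == "and":
        lower = torch.clamp(p_a + p_b - 1.0, min=0.0)  # max(0, P(A)+P(B)-1)
        upper = torch.minimum(p_a, p_b)                # min(P(A), P(B))
    elif op == "or":
        lower = torch.maximum(p_a, p_b)                # max(P(A), P(B))
        upper = torch.clamp(p_a + p_b, max=1.0)        # min(1, P(A)+P(B))
    else:
        raise ValueError(f"unsupported connective: {op}")
    return lower, upper


def frechet_compatibility_loss(p_composed, p_a, p_b, op):
    """Penalize composed-question predictions that violate the Fréchet bounds.

    Negation needs no bounds: P(not A) = 1 - P(A) can be enforced exactly.
    """
    lower, upper = frechet_bounds(p_a, p_b, op)
    below = torch.clamp(lower - p_composed, min=0.0)   # violation of lower bound
    above = torch.clamp(p_composed - upper, min=0.0)   # violation of upper bound
    return (below + above).mean()


# Example (hypothetical values): the composed prediction 0.6 exceeds
# min(P(A), P(B)) = 0.2, so the penalty is positive.
p_a = torch.tensor([0.9])        # P("Is the man wearing a hat?" = yes)
p_b = torch.tensor([0.2])        # P("Is it raining?" = yes)
p_and = torch.tensor([0.6])      # model's prediction for the conjunction
print(frechet_compatibility_loss(p_and, p_a, p_b, "and"))
```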
Related papers
- GRS-QA -- Graph Reasoning-Structured Question Answering Dataset [50.223851616680754]
We introduce the Graph Reasoning-Structured Question Answering dataset (GRS-QA), which includes both semantic contexts and reasoning structures for QA pairs.
Unlike existing M-QA datasets, GRS-QA explicitly captures intricate reasoning pathways by constructing reasoning graphs.
Our empirical analysis reveals that LLMs perform differently when handling questions with varying reasoning structures.
arXiv Detail & Related papers (2024-11-01T05:14:03Z)
- MedLogic-AQA: Enhancing Medical Question Answering with Abstractive Models Focusing on Logical Structures [24.262037382512975]
We propose a novel Abstractive QA system MedLogic-AQA that harnesses First Order Logic (FOL) based rules extracted from both context and questions to generate well-grounded answers.
This distinctive fusion of logical reasoning with abstractive QA equips our system to produce answers that are logically sound, relevant, and engaging.
arXiv Detail & Related papers (2024-10-20T18:29:38Z)
- Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension [80.99865844249106]
We propose a holistic graph network (HGN) which deals with context at both discourse level and word level, as the basis for logical reasoning.
Specifically, node-level and type-level relations, which can be interpreted as bridges in the reasoning process, are modeled by a hierarchical interaction mechanism.
arXiv Detail & Related papers (2023-06-21T07:34:27Z)
- Discourse-Aware Graph Networks for Textual Logical Reasoning [142.0097357999134]
Passage-level logical relations represent entailment or contradiction between propositional units (e.g., a concluding sentence).
We propose logic structural-constraint modeling to solve logical reasoning QA and introduce discourse-aware graph networks (DAGNs).
The networks first construct logic graphs leveraging in-line discourse connectives and generic logic theories, then learn logic representations by evolving the logic relations end-to-end with an edge-reasoning mechanism and updating the graph features.
arXiv Detail & Related papers (2022-07-04T14:38:49Z)
- AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension [21.741085513119785]
Recent machine reading comprehension datasets such as ReClor and LogiQA require performing logical reasoning over text.
We present a neural-symbolic approach which, to predict an answer, passes messages over a graph representing logical relations between text units.
arXiv Detail & Related papers (2022-03-16T23:51:01Z)
- DAGN: Discourse-Aware Graph Network for Logical Reasoning [83.8041050565304]
We propose a discourse-aware graph network (DAGN) that reasons relying on the discourse structure of the texts.
The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.
arXiv Detail & Related papers (2021-03-26T09:41:56Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, marginally improves performance on the Reasoning questions in VQA, and yields better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.