VQA-LOL: Visual Question Answering under the Lens of Logic
- URL: http://arxiv.org/abs/2002.08325v2
- Date: Wed, 15 Jul 2020 22:39:12 GMT
- Title: VQA-LOL: Visual Question Answering under the Lens of Logic
- Authors: Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang
- Abstract summary: We investigate whether visual question answering systems trained to answer a question about an image are able to answer the logical composition of multiple such questions.
We construct an augmentation of the VQA dataset as a benchmark, with questions containing logical compositions and linguistic transformations.
We propose our Lens of Logic (LOL) model, which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss.
- Score: 58.30291671877342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Logical connectives and their implications on the meaning of a natural
language sentence are a fundamental aspect of understanding. In this paper, we
investigate whether visual question answering (VQA) systems trained to answer a
question about an image are able to answer the logical composition of multiple
such questions. When put under this Lens of Logic, state-of-the-art
VQA models have difficulty in correctly answering these logically composed
questions. We construct an augmentation of the VQA dataset as a benchmark, with
questions containing logical compositions and linguistic transformations
(negation, disjunction, conjunction, and antonyms). We propose our Lens of
Logic (LOL) model, which uses question-attention and logic-attention to
understand logical connectives in the question, and a novel
Fréchet-Compatibility Loss, which ensures that the answers of the component
questions and the composed question are consistent with the inferred logical
operation. Our model shows substantial improvement in learning logical
compositions while retaining performance on VQA. We suggest this work as a move
towards robustness by embedding logical connectives in visual understanding.
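
The abstract does not spell out the form of the Fréchet-Compatibility Loss. As a rough illustration of the consistency idea it describes, the sketch below assumes the loss draws on the classical Fréchet inequalities, which bound the probability of a conjunction or disjunction given only the probabilities of its components. The function names (`negate`, `frechet_bounds`, `compatibility_penalty`) and the hinge-style penalty are hypothetical and are not taken from the paper.

```python
def negate(p_a: float) -> float:
    """Probability of 'not A' given the probability of A."""
    return 1.0 - p_a


def frechet_bounds(p_a: float, p_b: float, op: str) -> tuple[float, float]:
    """Classical Fréchet bounds (lower, upper) on P(A op B).

    p_a and p_b are the model's probabilities of answering 'yes' to the
    two component questions; op is 'and' or 'or'.
    """
    if op == "and":
        return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)
    if op == "or":
        return max(p_a, p_b), min(1.0, p_a + p_b)
    raise ValueError(f"unsupported connective: {op!r}")


def compatibility_penalty(p_a: float, p_b: float, p_composed: float, op: str) -> float:
    """Hypothetical penalty: how far the composed answer falls outside the bounds.

    Returns 0.0 when the probability assigned to the composed question is
    consistent with the component answers under the given connective.
    """
    lower, upper = frechet_bounds(p_a, p_b, op)
    return max(0.0, lower - p_composed) + max(0.0, p_composed - upper)
```

For example, with component probabilities 0.5 and 0.75, the conjunction must lie in [0.25, 0.5], so `compatibility_penalty(0.5, 0.75, 0.4, "and")` is 0.0, while a composed probability outside that interval is penalized in proportion to the violation.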
Related papers
- Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities [31.728976421529577]
We investigate the contrast across abstract and contextualized logical problems from a comprehensive set of domains.
We focus on standard propositional logic, specifically propositional deductive and abductive logic reasoning.
Our experiments aim to provide insights into disentangling context in logical reasoning and the true reasoning capabilities of LLMs.
arXiv Detail & Related papers (2024-06-04T21:25:06Z)
- Modeling Hierarchical Reasoning Chains by Linking Discourse Units and Key Phrases for Reading Comprehension [80.99865844249106]
We propose a holistic graph network (HGN) which deals with context at both discourse level and word level, as the basis for logical reasoning.
Specifically, node-level and type-level relations, which can be interpreted as bridges in the reasoning process, are modeled by a hierarchical interaction mechanism.
arXiv Detail & Related papers (2023-06-21T07:34:27Z)
- Discourse-Aware Graph Networks for Textual Logical Reasoning [142.0097357999134]
Passage-level logical relations represent entailment or contradiction between propositional units (e.g., a concluding sentence).
We propose logic structural-constraint modeling to solve the logical reasoning QA and introduce discourse-aware graph networks (DAGNs).
The networks first construct logic graphs leveraging in-line discourse connectives and generic logic theories, then learn logic representations by end-to-end evolving the logic relations with an edge-reasoning mechanism and updating the graph features.
arXiv Detail & Related papers (2022-07-04T14:38:49Z)
- AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension [21.741085513119785]
Recent machine reading comprehension datasets such as ReClor and LogiQA require performing logical reasoning over text.
We present a neural-symbolic approach which, to predict an answer, passes messages over a graph representing logical relations between text units.
arXiv Detail & Related papers (2022-03-16T23:51:01Z)
- Logic-Driven Context Extension and Data Augmentation for Logical Reasoning of Text [65.24325614642223]
We propose to understand logical symbols and expressions in the text to arrive at the answer.
Based on such logical information, we put forward a context extension framework and a data augmentation algorithm.
Our method achieves the state-of-the-art performance, and both logic-driven context extension framework and data augmentation algorithm can help improve the accuracy.
arXiv Detail & Related papers (2021-05-08T10:09:36Z)
- DAGN: Discourse-Aware Graph Network for Logical Reasoning [83.8041050565304]
We propose a discourse-aware graph network (DAGN) that reasons relying on the discourse structure of the texts.
The model encodes discourse information as a graph with elementary discourse units (EDUs) and discourse relations, and learns the discourse-aware features via a graph network for downstream QA tasks.
arXiv Detail & Related papers (2021-03-26T09:41:56Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, also marginally improving performance on the Reasoning questions in VQA, while also displaying better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)