Visual Question Answering based on Formal Logic
- URL: http://arxiv.org/abs/2111.04785v1
- Date: Mon, 8 Nov 2021 19:43:53 GMT
- Title: Visual Question Answering based on Formal Logic
- Authors: Muralikrishnna G. Sethuraman, Ali Payani, Faramarz Fekri, J. Clayton Kerce
- Abstract summary: In VQA, a series of questions is posed about a set of images, and the task at hand is to arrive at the answer.
We take a symbolic-reasoning-based approach using the framework of formal logic.
Our proposed method is highly interpretable and each step in the pipeline can be easily analyzed by a human.
- Score: 9.023122463034332
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual question answering (VQA) has been gaining a lot of traction in the
machine learning community in recent years due to the challenges posed in
understanding information coming from multiple modalities (i.e., images,
language). In VQA, a series of questions is posed about a set of images, and
the task at hand is to arrive at the answer. To achieve this, we take a
symbolic-reasoning-based approach using the framework of formal logic. The
image and the questions are converted into symbolic representations on which
explicit reasoning is performed. We propose a formal logic framework where (i)
images are converted to logical background facts with the help of scene graphs,
(ii) questions are translated to first-order predicate logic clauses using
a transformer-based deep learning model, and (iii) answers are obtained through
satisfiability checks that use the background knowledge and the grounding of
the predicate clauses. Our proposed method is highly interpretable, and
each step in the pipeline can be easily analyzed by a human. We validate our
approach on the CLEVR and GQA datasets. We achieve near-perfect accuracy of
99.6% on the CLEVR dataset, comparable to state-of-the-art models, showcasing
that formal logic is a viable tool for tackling visual question answering. Our
model is also data efficient, achieving 99.1% accuracy on the CLEVR dataset when
trained on just 10% of the training data.
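To make the three-stage pipeline concrete, here is a minimal Python sketch under illustrative assumptions: the scene-graph facts, predicate names, and the hand-written clause are hypothetical stand-ins, and stage (ii), which the paper performs with a trained transformer, is done by hand.

```python
# (i) Background facts derived from a toy scene graph: unary attribute
# predicates and a binary spatial relation over object ids (all hypothetical).
facts = {
    ("color", "obj1", "red"), ("shape", "obj1", "cube"),
    ("color", "obj2", "blue"), ("shape", "obj2", "sphere"),
    ("left_of", "obj1", "obj2"),
}

# (ii) Question "Is there a red cube left of a sphere?" as a conjunctive
# first-order clause; ?x and ?y are existentially quantified variables.
clause = [("color", "?x", "red"), ("shape", "?x", "cube"),
          ("left_of", "?x", "?y"), ("shape", "?y", "sphere")]

# (iii) Satisfiability check: depth-first search for a variable binding
# that grounds every literal in the background facts.
def satisfiable(clause, facts, binding=None):
    binding = binding or {}
    if not clause:
        return True                      # every literal grounded: satisfied
    pred, *args = clause[0]
    for fact in (f for f in facts if f[0] == pred):
        new = dict(binding)
        ok = True
        for arg, val in zip(args, fact[1:]):
            if arg.startswith("?"):      # variable: bind, or check consistency
                if new.setdefault(arg, val) != val:
                    ok = False
                    break
            elif arg != val:             # constant must match the fact exactly
                ok = False
                break
        if ok and satisfiable(clause[1:], facts, new):
            return True
    return False

print("yes" if satisfiable(clause, facts) else "no")  # -> yes
```

Answering a yes/no question then reduces to checking satisfiability of the clause against the facts; for counting or attribute questions, the same grounding search would enumerate all bindings instead of stopping at the first.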
Related papers
- A Comprehensive Survey on Visual Question Answering Datasets and Algorithms [1.941892373913038]
We meticulously analyze the current state of VQA datasets and models, while cleanly dividing them into distinct categories and then summarizing the methodologies and characteristics of each category.
We explore six main paradigms of VQA models: fusion, attention, the technique of using information from one modality to filter information from another, external knowledge base, composition or reasoning, and graph models.
arXiv Detail & Related papers (2024-11-17T18:52:06Z)
- Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets [9.67464173044675]
Visual Question Answering (VQA) is the task of answering a question about an image.
We present an approach for declarative knowledge distillation from Large Language Models (LLMs).
Our results confirm that distilling knowledge from LLMs is in fact a promising direction besides data-driven rule learning approaches.
arXiv Detail & Related papers (2024-10-12T08:17:03Z)
- Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA [9.659820850719413]
We leverage Large Language Models (LLMs), which have been shown to have strong reasoning ability, as automatic data annotators.
The key innovation in our method lies in the Synthesize Step-by-Step strategy.
We significantly enhance chart VQA models, achieving state-of-the-art accuracy on the ChartQA and PlotQA datasets.
arXiv Detail & Related papers (2024-03-25T03:02:27Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
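The construct-prune-rank idea can be sketched in a few lines of Python; the toy graph, the question-to-relation linking, and the length-based ranking below are illustrative stand-ins for the paper's learned components.

```python
# Hypothetical knowledge graph: head -> [(relation, tail), ...]
kg = {
    "zebra":  [("is_a", "mammal"), ("has_pattern", "stripes")],
    "mammal": [("subclass_of", "animal")],
}

def enumerate_paths(kg, start, max_hops=2):
    """Graph constructing: gather candidate inference paths from the
    question entity up to a hop limit."""
    frontier, results = [[("start", start)]], []
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            tail = path[-1][1]
            for rel, nxt in kg.get(tail, []):
                extended = path + [(rel, nxt)]
                results.append(extended)
                next_frontier.append(extended)
        frontier = next_frontier
    return results

def prune(paths, linked_relations):
    """Pruning: keep paths whose relations were linked to the question."""
    return [p for p in paths
            if any(rel in linked_relations for rel, _ in p[1:])]

def rank(paths):
    """Path-level ranking: shorter paths first (a crude stand-in for the
    learned ranker in the paper)."""
    return sorted(paths, key=len)

# "What pattern does a zebra have?" with a hypothetical relation linking.
best = rank(prune(enumerate_paths(kg, "zebra"), {"has_pattern"}))[0]
print(best[-1][1], "| inference path:", best)  # stripes, plus explanation
```

The returned path doubles as the human-readable inference path that the paper emphasizes.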
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models [59.05769810380928]
Rephrase, Augment and Reason (RepARe) is a gradient-free framework that extracts salient details about the image using the underlying vision-language model.
We show that RepARe can result in a 3.85% (absolute) increase in zero-shot accuracy on VQAv2, and gains of 6.41 and 7.94 percentage points on A-OKVQA and VizWiz, respectively.
arXiv Detail & Related papers (2023-10-09T16:57:57Z)
- Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable-by-design model that factors model decisions into intermediate human-legible explanations.
We show that our inherently interpretable system can improve by 4.64% over a comparable black-box system on reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z)
- Neural-Symbolic Models for Logical Queries on Knowledge Graphs [17.290758383645567]
We propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds.
GNN-QE decomposes a complex FOL query into relation projections and logical operations over fuzzy sets.
Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries.
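A small numpy sketch can illustrate the fuzzy-set execution; the relation matrix below stands in for the GNN-produced relation projection, and all membership values are made up.

```python
import numpy as np

# Fuzzy set over a 4-entity toy KG: membership probability per entity.
friends_of_alice = np.array([0.0, 0.9, 0.8, 0.1])

# A fixed adjacency matrix stands in for the learned "lives_in" projection.
lives_in = np.array([[0., 0., 0., 1.],
                     [0., 0., 0., 1.],
                     [0., 0., 1., 0.],
                     [0., 0., 0., 0.]])

def project(fuzzy_set, relation):
    """Relation projection: propagate membership along edges, cap at 1."""
    return np.minimum(fuzzy_set @ relation, 1.0)

# Product fuzzy logic for the logical operations over fuzzy sets.
def f_and(a, b): return a * b          # conjunction
def f_or(a, b):  return a + b - a * b  # disjunction (probabilistic sum)
def f_not(a):    return 1.0 - a        # negation

# One-hop query: "Where does some friend of Alice live?"
places = project(friends_of_alice, lives_in)
print(places)          # fuzzy membership of each entity as an answer
print(f_not(places))   # its complement, usable inside larger FOL queries
```

Complex FOL queries are then executed by chaining projections and these element-wise operations over the membership vectors.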
arXiv Detail & Related papers (2022-05-16T18:39:04Z)
- HySTER: A Hybrid Spatio-Temporal Event Reasoner [75.41988728376081]
We present HySTER: a Hybrid Spatio-Temporal Event Reasoner for reasoning over physical events in videos.
We define a method based on general temporal, causal and physics rules which can be transferred across tasks.
This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.
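For flavor, here is a toy Python rendering of what one such transferable causal rule over extracted video events might look like; the paper works in a symbolic (ASP/ILP-style) setting, and the events and the rule below are hypothetical.

```python
# Events extracted from a video: (frame, event_type, subject, object).
events = [
    (3, "collision", "ball", "block"),
    (4, "moves", "block", None),
    (9, "moves", "ball", None),
]

def caused_motions(events):
    """Causal/physics rule: a collision at frame t explains motion of the
    struck object at frame t + 1."""
    return [(subj, obj, t)
            for (t, ev, subj, obj) in events if ev == "collision"
            for (t2, ev2, subj2, _) in events
            if ev2 == "moves" and subj2 == obj and t2 == t + 1]

# "Why did the block move?" -> the ball collided with it at frame 3.
print(caused_motions(events))  # [('ball', 'block', 3)]
```

Because the rule mentions only generic event types and frame indices, it can in principle transfer across datasets and tasks.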
arXiv Detail & Related papers (2021-01-17T11:07:17Z) - LRTA: A Transparent Neural-Symbolic Reasoning Framework with Modular
Supervision for Visual Question Answering [4.602329567377897]
We propose a transparent neural-symbolic reasoning framework for visual question answering.
It solves the problem step by step like humans do and provides a human-readable justification at each step.
Our experiments on the GQA dataset show that LRTA outperforms the state-of-the-art model by a large margin.
arXiv Detail & Related papers (2020-11-21T06:39:42Z) - A Simple Approach to Case-Based Reasoning in Knowledge Bases [56.661396189466664]
We present a surprisingly simple yet accurate approach to reasoning in knowledge graphs (KGs) that requires no training and is reminiscent of case-based reasoning in classical artificial intelligence (AI).
Consider the task of finding a target entity given a source entity and a binary relation.
Our non-parametric approach derives crisp logical rules for each query by finding multiple graph path patterns that connect similar source entities through the given relation.
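A tiny Python sketch of the case-based idea, with a hypothetical graph: answers for entities similar to the query entity supply one-hop path patterns (real queries use multi-hop patterns), which are then replayed from the query entity, with no training involved.

```python
kg = {  # (head, relation) -> tails; a hypothetical mini knowledge graph
    ("Paris", "capital_of"):  ["France"],
    ("Paris", "located_in"):  ["France"],
    ("Berlin", "capital_of"): ["Germany"],
    ("Berlin", "located_in"): ["Germany"],
    ("Madrid", "located_in"): ["Spain"],
}

def path_patterns(entity, relation):
    """Crisp rule extraction: which other relations reach the same tails
    as `relation` from this entity?  (One hop keeps the sketch short.)"""
    answers = set(kg.get((entity, relation), []))
    return {rel for (head, rel), tails in kg.items()
            if head == entity and rel != relation and answers & set(tails)}

# Query ("Madrid", "capital_of", ?): mine patterns from similar source
# entities whose answer is known, then replay them from Madrid.
cases = ["Paris", "Berlin"]
patterns = set.union(*(path_patterns(c, "capital_of") for c in cases))
answers = {t for rel in patterns for t in kg.get(("Madrid", rel), [])}
print(patterns, answers)  # {'located_in'} {'Spain'}
```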
arXiv Detail & Related papers (2020-06-25T06:28:09Z)
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
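One core piece, symmetry consistency for comparison questions, fits in a short sketch; the string-level flip and the squared-penalty form below are illustrative choices, not the paper's exact formulation.

```python
def flip(question):
    """Logic-guided augmentation: the symmetry of a comparison yields a
    new labeled question whose answer must be the logical complement."""
    return (question.replace("larger", "smaller") if "larger" in question
            else question.replace("smaller", "larger"))

def consistency_loss(p_yes, p_yes_flipped):
    """If "is A larger than B?" is yes, "is A smaller than B?" must be no,
    so the two yes-probabilities should sum to 1; penalize violations."""
    return (p_yes + p_yes_flipped - 1.0) ** 2

q = "Is the tower larger than the house?"
print(flip(q))                     # the augmented training question
print(consistency_loss(0.9, 0.3))  # 0.04: penalty for an inconsistent pair
```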
arXiv Detail & Related papers (2020-04-21T17:03:08Z)