From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
- URL: http://arxiv.org/abs/2206.12533v1
- Date: Sat, 25 Jun 2022 02:20:02 GMT
- Title: From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
- Authors: Zihao Zhu
- Abstract summary: It is essential to learn to answer deeper questions that require compositional reasoning on the image and external knowledge.
We propose a Hierarchical Graph Neural Module Network (HGNMN) that reasons over multi-layer graphs with neural modules.
Our model consists of several well-designed neural modules that perform specific functions over graphs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To achieve a general visual question answering (VQA) system, it is
essential to learn to answer deeper questions that require compositional
reasoning over the image and external knowledge. Meanwhile, the reasoning
process should be explicit and explainable, so that the working mechanism of
the model can be understood. This is effortless for humans but challenging for
machines. In this paper, we propose a Hierarchical Graph Neural Module Network
(HGNMN) that reasons over multi-layer graphs with neural modules to address
these issues. Specifically, we first encode the image as multi-layer graphs
from the visual, semantic and commonsense views, since the clues that support
the answer may exist in different modalities. Our model consists of several
well-designed neural modules that perform specific functions over graphs and
can be composed to conduct multi-step reasoning within and across the
different graphs. Compared to existing modular networks, we extend visual
reasoning from a single graph to multiple graphs, and the reasoning process
can be explicitly traced through module weights and graph attentions.
Experiments show that our model not only achieves state-of-the-art performance
on the CRIC dataset but also yields explicit and explainable reasoning
procedures.
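The paper provides no code here, but the module-over-graphs idea can be made concrete with a minimal numpy sketch. The module names (`attend_node`, `transfer_within`, `transfer_between`) and the soft alignment between graph layers are illustrative assumptions, not the authors' implementation; the point is only to show how question-guided attention can hop within one graph and carry over into another, leaving a traceable attention trail.

```python
# Minimal sketch of module-style reasoning over multi-layer graphs,
# in the spirit of the HGNMN described above. Module names and the
# cross-graph alignment are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_node(node_feats, question):
    """Question-guided attention over the nodes of one graph."""
    return softmax(node_feats @ question)

def transfer_within(attn, adjacency):
    """Shift attention one hop along the edges of the same graph."""
    out = adjacency.T @ attn
    return out / (out.sum() + 1e-8)

def transfer_between(attn_src, alignment):
    """Carry attention into another graph layer via a soft
    (num_src_nodes x num_dst_nodes) alignment matrix."""
    out = alignment.T @ attn_src
    return out / (out.sum() + 1e-8)

rng = np.random.default_rng(0)
d = 16
visual_feats = rng.normal(size=(5, d))    # e.g. object regions
semantic_feats = rng.normal(size=(4, d))  # e.g. relation phrases
visual_adj = (rng.random((5, 5)) > 0.6).astype(float)
align_v2s = softmax(rng.normal(size=(5, 4)), axis=1)  # visual -> semantic
question = rng.normal(size=d)

# One reasoning step: attend in the visual graph, hop once, then carry
# the attention into the semantic graph. Inspecting these attention
# vectors is what makes the reasoning process traceable.
attn_v = attend_node(visual_feats, question)
attn_v = transfer_within(attn_v, visual_adj)
attn_s = transfer_between(attn_v, align_v2s)
print("semantic-graph attention:", attn_s.round(3))
```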
Related papers
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
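The InstructG2I entry above names two ingredients: informative neighbor sampling and condensing sampled neighbors into "graph prompts". Here is a toy numpy stand-in for both; the real system uses a Graph-QFormer inside a diffusion model, which is far beyond this sketch, and both function names are hypothetical.

```python
# Toy sketch: pick informative neighbors of a target node, then
# condense them into a few "graph prompt" vectors for a downstream
# generator. Illustrative only -- not InstructG2I's implementation.
import numpy as np

def sample_neighbors(target, neighbor_feats, k=2):
    """Keep the k neighbors most similar to the target node."""
    sims = neighbor_feats @ target
    return neighbor_feats[np.argsort(sims)[::-1][:k]]

def graph_prompts(neighbors, num_prompts=2, dim=8, seed=0):
    """Cross-attend learned queries to neighbors, QFormer-style,
    yielding a fixed-size set of prompt vectors."""
    rng = np.random.default_rng(seed)
    queries = rng.normal(size=(num_prompts, dim))  # stand-in for learned queries
    attn = queries @ neighbors.T
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ neighbors                        # (num_prompts, dim)

rng = np.random.default_rng(1)
target = rng.normal(size=8)
neighbors = rng.normal(size=(6, 8))
prompts = graph_prompts(sample_neighbors(target, neighbors, k=3))
print(prompts.shape)  # (2, 8) -- fed to the generator as conditioning
```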
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
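The Prize-Collecting Steiner Tree framing above has a simple intuition: keep nodes whose relevance "prize" outweighs the connection cost, while staying connected. The sketch below is a naive greedy heuristic for that intuition only, not G-Retriever's actual PCST solver.

```python
# Toy illustration of the Prize-Collecting Steiner Tree idea: grow a
# connected subgraph, adding a frontier node whenever its prize (query
# relevance) outweighs the cost of the connecting edge. A naive greedy
# heuristic for intuition -- not G-Retriever's solver.

def greedy_pcst(prizes, edges, costs):
    """prizes: {node: float}; edges: {node: set(neighbors)};
    costs: {(u, v): float} with both orientations present."""
    root = max(prizes, key=prizes.get)   # start at the best node
    tree, gain = {root}, prizes[root]
    while True:
        best = None
        for u in tree:
            for v in edges[u] - tree:
                net = prizes[v] - costs[(u, v)]
                if net > 0 and (best is None or net > best[0]):
                    best = (net, v)
        if best is None:
            return tree, gain
        gain += best[0]
        tree.add(best[1])

prizes = {"q": 3.0, "a": 2.0, "b": 0.1, "c": 1.5}
edges = {"q": {"a", "b"}, "a": {"q", "c"}, "b": {"q"}, "c": {"a"}}
costs = {("q", "a"): 0.5, ("a", "q"): 0.5, ("q", "b"): 1.0,
         ("b", "q"): 1.0, ("a", "c"): 0.4, ("c", "a"): 0.4}
print(greedy_pcst(prizes, edges, costs))  # retrieved subgraph + score
```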
- Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases [63.96793270418793]
Complex logical query answering (CLQA) is a recently emerged task in graph machine learning.
We introduce the concept of Neural Graph Databases (NGDBs).
An NGDB consists of a Neural Graph Storage and a Neural Graph Engine.
arXiv Detail & Related papers (2023-03-26T04:03:37Z)
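One way to picture the Storage/Engine split described in that entry is below: the storage holds learned entity embeddings and the engine answers an embedded query by similarity lookup. This is a deliberately tiny toy under that assumption; real NGDBs execute complex logical queries over incomplete graphs.

```python
# Toy picture of the Neural Graph Storage / Neural Graph Engine split:
# the storage holds learned entity embeddings, and the engine answers a
# one-hop "query embedding" by nearest-neighbour lookup. Illustrative
# only -- far simpler than an actual NGDB.
import numpy as np

class NeuralGraphStorage:
    def __init__(self, embeddings):
        self.names = list(embeddings)
        self.matrix = np.stack([embeddings[n] for n in self.names])

class NeuralGraphEngine:
    def __init__(self, storage):
        self.storage = storage

    def answer(self, query_vec, k=2):
        """Rank stored entities by similarity to the query embedding."""
        scores = self.storage.matrix @ query_vec
        order = np.argsort(scores)[::-1][:k]
        return [(self.storage.names[i], float(scores[i])) for i in order]

rng = np.random.default_rng(2)
store = NeuralGraphStorage({e: rng.normal(size=4) for e in ["paris", "rome", "lyon"]})
engine = NeuralGraphEngine(store)
print(engine.answer(store.matrix[0]))  # "paris" should rank first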
- Probing Graph Representations [77.7361299039905]
We use a probing framework to quantify the amount of meaningful information captured in graph representations.
Our findings on molecular datasets show the potential of probing for understanding the inductive biases of graph-based models.
We advocate for probing as a useful diagnostic tool for evaluating graph-based models.
arXiv Detail & Related papers (2023-03-07T14:58:18Z)
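The probing recipe mentioned above follows a standard pattern: freeze the learned embeddings, then fit a simple classifier to test whether a property is decodable from them. The sketch below uses synthetic stand-in data rather than the paper's molecular representations.

```python
# Minimal probing example: freeze graph-level embeddings, then fit a
# simple classifier to see whether a property is linearly decodable
# from them. Synthetic stand-in data; the paper probes real molecular
# representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Pretend these are frozen embeddings of 200 graphs (dim 16),
# where one direction weakly encodes a binary property.
labels = rng.integers(0, 2, size=200)
embeddings = rng.normal(size=(200, 16))
embeddings[:, 0] += 1.5 * labels          # inject a decodable signal

probe = LogisticRegression().fit(embeddings[:150], labels[:150])
acc = probe.score(embeddings[150:], labels[150:])
print(f"probe accuracy: {acc:.2f}")       # well above 0.5 => property captured
```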
- PGX: A Multi-level GNN Explanation Framework Based on Separate Knowledge Distillation Processes [0.2005299372367689]
We propose a multi-level GNN explanation framework, based on the observation that a GNN is a multimodal learning process over multiple components of graph data.
The complexity of the original problem is relaxed by breaking it into multiple sub-parts represented as a hierarchical structure.
We also aim for personalized explanations as the framework can generate different results based on user preferences.
arXiv Detail & Related papers (2022-08-05T10:14:48Z)
- Neural-Symbolic Models for Logical Queries on Knowledge Graphs [17.290758383645567]
We propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds.
GNN-QE decomposes a complex first-order logic (FOL) query into relation projections and logical operations over fuzzy sets.
Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries.
arXiv Detail & Related papers (2022-05-16T18:39:04Z)
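The fuzzy-set operations named in the GNN-QE entry have a compact numerical form. The sketch below uses one standard choice (product fuzzy logic) over vectors that assign each entity a membership degree in [0, 1]; the relation projection is faked with a fixed matrix, whereas in GNN-QE it is a learned GNN.

```python
# Fuzzy-set logical operations in one standard form (product fuzzy
# logic) over entity-membership vectors, plus a toy relation
# projection. The fixed matrix stands in for GNN-QE's learned GNN.
import numpy as np

def f_and(x, y): return x * y          # conjunction
def f_or(x, y):  return x + y - x * y  # disjunction
def f_not(x):    return 1.0 - x        # negation

def project(x, relation):
    """Follow one relation: relation[i, j] ~ P(edge i -> j)."""
    return np.clip(relation.T @ x, 0.0, 1.0)

# Fuzzy sets over 4 entities, plus a toy relation matrix.
a = np.array([0.9, 0.2, 0.0, 0.7])
b = np.array([0.5, 0.8, 0.1, 0.0])
rel = np.array([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)

# Answer of the query "related-to(a) AND NOT b", step by step:
print(f_and(project(a, rel), f_not(b)))
```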
- ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning [65.15423587105472]
We present a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction.
Specifically, given a belief and an argument, a model has to predict whether the argument supports or counters the belief, and also generate a commonsense-augmented graph that serves as a non-trivial, complete, and unambiguous explanation for the predicted stance.
A significant 83% of our graphs contain external commonsense nodes with diverse structures and reasoning depths.
arXiv Detail & Related papers (2021-04-15T17:51:36Z)
- Parameterized Explainer for Graph Neural Network [49.79917262156429]
We propose PGExplainer, a parameterized explainer for Graph Neural Networks (GNNs).
Compared to existing work, PGExplainer has better generalization ability and can easily be used in an inductive setting.
Experiments on both synthetic and real-life datasets show highly competitive performance with up to 24.7% relative improvement in AUC on explaining graph classification.
arXiv Detail & Related papers (2020-11-09T17:15:03Z)
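The parameterized-explainer idea above can be sketched as a single shared network that scores every edge from its endpoint embeddings, which is what lets it transfer to unseen graphs (the inductive setting). Weights are random in this sketch; PGExplainer trains them so the masked subgraph preserves the GNN's prediction.

```python
# Sketch of a parameterized edge-mask explainer: one shared MLP scores
# every edge from its endpoint embeddings, giving an importance mask.
# Random weights here; in practice they are trained end-to-end.
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def edge_mask(node_emb, edge_list, w1, w2):
    """MLP([h_u; h_v]) -> importance in (0, 1) for each edge."""
    feats = np.stack([np.concatenate([node_emb[u], node_emb[v]])
                      for u, v in edge_list])
    hidden = np.maximum(feats @ w1, 0.0)       # ReLU layer
    return sigmoid(hidden @ w2).ravel()

rng = np.random.default_rng(4)
node_emb = rng.normal(size=(4, 8))             # e.g. from a trained GNN
edge_list = [(0, 1), (1, 2), (2, 3), (3, 0)]
w1 = rng.normal(size=(16, 8))
w2 = rng.normal(size=(8, 1))
for edge, score in zip(edge_list, edge_mask(node_emb, edge_list, w1, w2)):
    print(edge, round(float(score), 3))        # higher => keep in explanation
```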
- Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module.
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
arXiv Detail & Related papers (2020-08-31T23:25:01Z)
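The read/update/control decomposition in the entry above can be pictured as one memory cell stepped over a graph: read attends to nodes given the current control state, update writes the read vector into memory through a gate, and control re-focuses on the question. The gating equations below are illustrative assumptions, not the paper's exact cell.

```python
# Minimal sketch of one memory-based reasoning step in the
# read/update/control style named above. Gating equations are
# illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gruc_step(memory, control, node_feats, question):
    read_attn = softmax(node_feats @ control)        # Read: attend to nodes
    read_vec = read_attn @ node_feats
    gate = 1.0 / (1.0 + np.exp(-(memory @ read_vec)))
    memory = gate * read_vec + (1 - gate) * memory   # Update: gated memory write
    control = softmax(question * memory) * question  # Control: re-focus on question
    return memory, control, read_attn

rng = np.random.default_rng(5)
d = 8
memory, control, question = (rng.normal(size=d) for _ in range(3))
node_feats = rng.normal(size=(6, d))                 # one knowledge graph's nodes
for step in range(3):                                # a few reasoning hops
    memory, control, attn = gruc_step(memory, control, node_feats, question)
    print(f"step {step}: most-attended node = {int(attn.argmax())}")
```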