From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
- URL: http://arxiv.org/abs/2206.12533v1
- Date: Sat, 25 Jun 2022 02:20:02 GMT
- Title: From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering
- Authors: Zihao Zhu
- Abstract summary: It is essential to learn to answer deeper questions that require compositional reasoning on the image and external knowledge.
We propose a Hierarchical Graph Neural Module Network (HGNMN) that reasons over multi-layer graphs with neural modules.
Our model consists of several well-designed neural modules that perform specific functions over graphs.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To achieve a general visual question answering (VQA) system, it is
essential to learn to answer deeper questions that require compositional
reasoning over the image and external knowledge. Meanwhile, the reasoning
process should be explicit and explainable, so that the working mechanism of
the model can be understood. This is effortless for humans but challenging for
machines. In this paper, we propose a Hierarchical Graph Neural Module Network
(HGNMN) that reasons over multi-layer graphs with neural modules to address
these issues. Specifically, we first encode the image as multi-layer graphs
from the visual, semantic and commonsense views, since the clues that support
the answer may exist in different modalities. Our model consists of several
well-designed neural modules that perform specific functions over graphs and
can be composed to conduct multi-step reasoning within and across the
different graphs. Compared to existing modular networks, we extend visual
reasoning from a single graph to multiple graphs, and the reasoning process
can be explicitly traced through module weights and graph attentions.
Experiments show that our model not only achieves state-of-the-art performance
on the CRIC dataset but also yields explicit and explainable reasoning
procedures.
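The paper provides no code here, but the module-over-graphs idea can be made concrete with a minimal numpy sketch. The module names (`attend_node`, `transfer_within`, `transfer_between`) and the soft alignment between graph layers are illustrative assumptions, not the authors' implementation; the point is only to show how question-guided attention can hop within one graph and carry over into another, leaving a traceable attention trail.

```python
# Minimal sketch of module-style reasoning over multi-layer graphs,
# in the spirit of the HGNMN described above. Module names and the
# cross-graph alignment are illustrative assumptions.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_node(node_feats, question):
    """Question-guided attention over the nodes of one graph."""
    return softmax(node_feats @ question)

def transfer_within(attn, adjacency):
    """Shift attention one hop along the edges of the same graph."""
    out = adjacency.T @ attn
    return out / (out.sum() + 1e-8)

def transfer_between(attn_src, alignment):
    """Carry attention into another graph layer via a soft
    (num_src_nodes x num_dst_nodes) alignment matrix."""
    out = alignment.T @ attn_src
    return out / (out.sum() + 1e-8)

rng = np.random.default_rng(0)
d = 16
visual_feats = rng.normal(size=(5, d))    # e.g. object regions
semantic_feats = rng.normal(size=(4, d))  # e.g. relation phrases
visual_adj = (rng.random((5, 5)) > 0.6).astype(float)
align_v2s = softmax(rng.normal(size=(5, 4)), axis=1)  # visual -> semantic
question = rng.normal(size=d)

# One reasoning step: attend in the visual graph, hop once, then carry
# the attention into the semantic graph. Inspecting these attention
# vectors is what makes the reasoning process traceable.
attn_v = attend_node(visual_feats, question)
attn_v = transfer_within(attn_v, visual_adj)
attn_s = transfer_between(attn_v, align_v2s)
print("semantic-graph attention:", attn_s.round(3))
```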
Related papers
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
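The InstructG2I entry above names two ingredients: informative neighbor sampling and condensing sampled neighbors into "graph prompts". Here is a toy numpy stand-in for both; the real system uses a Graph-QFormer inside a diffusion model, which is far beyond this sketch, and both function names are hypothetical.

```python
# Toy sketch: pick informative neighbors of a target node, then
# condense them into a few "graph prompt" vectors for a downstream
# generator. Illustrative only -- not InstructG2I's implementation.
import numpy as np

def sample_neighbors(target, neighbor_feats, k=2):
    """Keep the k neighbors most similar to the target node."""
    sims = neighbor_feats @ target
    return neighbor_feats[np.argsort(sims)[::-1][:k]]

def graph_prompts(neighbors, num_prompts=2, dim=8, seed=0):
    """Cross-attend learned queries to neighbors, QFormer-style,
    yielding a fixed-size set of prompt vectors."""
    rng = np.random.default_rng(seed)
    queries = rng.normal(size=(num_prompts, dim))  # stand-in for learned queries
    attn = queries @ neighbors.T
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ neighbors                        # (num_prompts, dim)

rng = np.random.default_rng(1)
target = rng.normal(size=8)
neighbors = rng.normal(size=(6, 8))
prompts = graph_prompts(sample_neighbors(target, neighbors, k=3))
print(prompts.shape)  # (2, 8) -- fed to the generator as conditioning
```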
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
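The Prize-Collecting Steiner Tree framing above has a simple intuition: keep nodes whose relevance "prize" outweighs the connection cost, while staying connected. The sketch below is a naive greedy heuristic for that intuition only, not G-Retriever's actual PCST solver.

```python
# Toy illustration of the Prize-Collecting Steiner Tree idea: grow a
# connected subgraph, adding a frontier node whenever its prize (query
# relevance) outweighs the cost of the connecting edge. A naive greedy
# heuristic for intuition -- not G-Retriever's solver.

def greedy_pcst(prizes, edges, costs):
    """prizes: {node: float}; edges: {node: set(neighbors)};
    costs: {(u, v): float} with both orientations present."""
    root = max(prizes, key=prizes.get)   # start at the best node
    tree, gain = {root}, prizes[root]
    while True:
        best = None
        for u in tree:
            for v in edges[u] - tree:
                net = prizes[v] - costs[(u, v)]
                if net > 0 and (best is None or net > best[0]):
                    best = (net, v)
        if best is None:
            return tree, gain
        gain += best[0]
        tree.add(best[1])

prizes = {"q": 3.0, "a": 2.0, "b": 0.1, "c": 1.5}
edges = {"q": {"a", "b"}, "a": {"q", "c"}, "b": {"q"}, "c": {"a"}}
costs = {("q", "a"): 0.5, ("a", "q"): 0.5, ("q", "b"): 1.0,
         ("b", "q"): 1.0, ("a", "c"): 0.4, ("c", "a"): 0.4}
print(greedy_pcst(prizes, edges, costs))  # retrieved subgraph + score
```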
- Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases [63.96793270418793]
Complex logical query answering (CLQA) is a recently emerged task in graph machine learning.
We introduce the concept of Neural Graph Databases (NGDBs).
An NGDB consists of a Neural Graph Storage and a Neural Graph Engine.
arXiv Detail & Related papers (2023-03-26T04:03:37Z)
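One way to picture the Storage/Engine split described in that entry is below: the storage holds learned entity embeddings and the engine answers an embedded query by similarity lookup. This is a deliberately tiny toy under that assumption; real NGDBs execute complex logical queries over incomplete graphs.

```python
# Toy picture of the Neural Graph Storage / Neural Graph Engine split:
# the storage holds learned entity embeddings, and the engine answers a
# one-hop "query embedding" by nearest-neighbour lookup. Illustrative
# only -- far simpler than an actual NGDB.
import numpy as np

class NeuralGraphStorage:
    def __init__(self, embeddings):
        self.names = list(embeddings)
        self.matrix = np.stack([embeddings[n] for n in self.names])

class NeuralGraphEngine:
    def __init__(self, storage):
        self.storage = storage

    def answer(self, query_vec, k=2):
        """Rank stored entities by similarity to the query embedding."""
        scores = self.storage.matrix @ query_vec
        order = np.argsort(scores)[::-1][:k]
        return [(self.storage.names[i], float(scores[i])) for i in order]

rng = np.random.default_rng(2)
store = NeuralGraphStorage({e: rng.normal(size=4) for e in ["paris", "rome", "lyon"]})
engine = NeuralGraphEngine(store)
print(engine.answer(store.matrix[0]))  # "paris" should rank first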
- Probing Graph Representations [77.7361299039905]
We use a probing framework to quantify the amount of meaningful information captured in graph representations.
Our findings on molecular datasets show the potential of probing for understanding the inductive biases of graph-based models.
We advocate for probing as a useful diagnostic tool for evaluating graph-based models.
arXiv Detail & Related papers (2023-03-07T14:58:18Z)
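The probing recipe mentioned above follows a standard pattern: freeze the learned embeddings, then fit a simple classifier to test whether a property is decodable from them. The sketch below uses synthetic stand-in data rather than the paper's molecular representations.

```python
# Minimal probing example: freeze graph-level embeddings, then fit a
# simple classifier to see whether a property is linearly decodable
# from them. Synthetic stand-in data; the paper probes real molecular
# representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
# Pretend these are frozen embeddings of 200 graphs (dim 16),
# where one direction weakly encodes a binary property.
labels = rng.integers(0, 2, size=200)
embeddings = rng.normal(size=(200, 16))
embeddings[:, 0] += 1.5 * labels          # inject a decodable signal

probe = LogisticRegression().fit(embeddings[:150], labels[:150])
acc = probe.score(embeddings[150:], labels[150:])
print(f"probe accuracy: {acc:.2f}")       # well above 0.5 => property captured
```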
- PGX: A Multi-level GNN Explanation Framework Based on Separate Knowledge Distillation Processes [0.2005299372367689]
We propose a multi-level GNN explanation framework, based on the observation that a GNN is a multimodal learning process over multiple components of graph data.
The complexity of the original problem is relaxed by breaking it into multiple sub-parts represented as a hierarchical structure.
We also aim for personalized explanations as the framework can generate different results based on user preferences.
arXiv Detail & Related papers (2022-08-05T10:14:48Z)
- Neural-Symbolic Models for Logical Queries on Knowledge Graphs [17.290758383645567]
We propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of both worlds.
GNN-QE decomposes a complex first-order logic (FOL) query into relation projections and logical operations over fuzzy sets.
Experiments on 3 datasets show that GNN-QE significantly improves over previous state-of-the-art models in answering FOL queries.
arXiv Detail & Related papers (2022-05-16T18:39:04Z)
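The fuzzy-set operations named in the GNN-QE entry have a compact numerical form. The sketch below uses one standard choice (product fuzzy logic) over vectors that assign each entity a membership degree in [0, 1]; the relation projection is faked with a fixed matrix, whereas in GNN-QE it is a learned GNN.

```python
# Fuzzy-set logical operations in one standard form (product fuzzy
# logic) over entity-membership vectors, plus a toy relation
# projection. The fixed matrix stands in for GNN-QE's learned GNN.
import numpy as np

def f_and(x, y): return x * y          # conjunction
def f_or(x, y):  return x + y - x * y  # disjunction
def f_not(x):    return 1.0 - x        # negation

def project(x, relation):
    """Follow one relation: relation[i, j] ~ P(edge i -> j)."""
    return np.clip(relation.T @ x, 0.0, 1.0)

# Fuzzy sets over 4 entities, plus a toy relation matrix.
a = np.array([0.9, 0.2, 0.0, 0.7])
b = np.array([0.5, 0.8, 0.1, 0.0])
rel = np.array([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)

# Answer of the query "related-to(a) AND NOT b", step by step:
print(f_and(project(a, rel), f_not(b)))
```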
- ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning [65.15423587105472]
We present a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction.
Specifically, given a belief and an argument, a model has to predict whether the argument supports or counters the belief, and also generate a commonsense-augmented graph that serves as a non-trivial, complete, and unambiguous explanation for the predicted stance.
A significant 83% of our graphs contain external commonsense nodes with diverse structures and reasoning depths.
arXiv Detail & Related papers (2021-04-15T17:51:36Z)
- Parameterized Explainer for Graph Neural Network [49.79917262156429]
We propose PGExplainer, a parameterized explainer for Graph Neural Networks (GNNs).
Compared to existing work, PGExplainer has better generalization ability and can easily be used in an inductive setting.
Experiments on both synthetic and real-life datasets show highly competitive performance with up to 24.7% relative improvement in AUC on explaining graph classification.
arXiv Detail & Related papers (2020-11-09T17:15:03Z)
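The parameterized-explainer idea above can be sketched as a single shared network that scores every edge from its endpoint embeddings, which is what lets it transfer to unseen graphs (the inductive setting). Weights are random in this sketch; PGExplainer trains them so the masked subgraph preserves the GNN's prediction.

```python
# Sketch of a parameterized edge-mask explainer: one shared MLP scores
# every edge from its endpoint embeddings, giving an importance mask.
# Random weights here; in practice they are trained end-to-end.
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

def edge_mask(node_emb, edge_list, w1, w2):
    """MLP([h_u; h_v]) -> importance in (0, 1) for each edge."""
    feats = np.stack([np.concatenate([node_emb[u], node_emb[v]])
                      for u, v in edge_list])
    hidden = np.maximum(feats @ w1, 0.0)       # ReLU layer
    return sigmoid(hidden @ w2).ravel()

rng = np.random.default_rng(4)
node_emb = rng.normal(size=(4, 8))             # e.g. from a trained GNN
edge_list = [(0, 1), (1, 2), (2, 3), (3, 0)]
w1 = rng.normal(size=(16, 8))
w2 = rng.normal(size=(8, 1))
for edge, score in zip(edge_list, edge_mask(node_emb, edge_list, w1, w2)):
    print(edge, round(float(score), 3))        # higher => keep in explanation
```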
- Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a Graph-based Read, Update, and Control (GRUC) module.
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
arXiv Detail & Related papers (2020-08-31T23:25:01Z)
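The read/update/control decomposition in the entry above can be pictured as one memory cell stepped over a graph: read attends to nodes given the current control state, update writes the read vector into memory through a gate, and control re-focuses on the question. The gating equations below are illustrative assumptions, not the paper's exact cell.

```python
# Minimal sketch of one memory-based reasoning step in the
# read/update/control style named above. Gating equations are
# illustrative assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gruc_step(memory, control, node_feats, question):
    read_attn = softmax(node_feats @ control)        # Read: attend to nodes
    read_vec = read_attn @ node_feats
    gate = 1.0 / (1.0 + np.exp(-(memory @ read_vec)))
    memory = gate * read_vec + (1 - gate) * memory   # Update: gated memory write
    control = softmax(question * memory) * question  # Control: re-focus on question
    return memory, control, read_attn

rng = np.random.default_rng(5)
d = 8
memory, control, question = (rng.normal(size=d) for _ in range(3))
node_feats = rng.normal(size=(6, d))                 # one knowledge graph's nodes
for step in range(3):                                # a few reasoning hops
    memory, control, attn = gruc_step(memory, control, node_feats, question)
    print(f"step {step}: most-attended node = {int(attn.argmax())}")
```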