Joint learning of object graph and relation graph for visual question
answering
- URL: http://arxiv.org/abs/2205.04188v1
- Date: Mon, 9 May 2022 11:08:43 GMT
- Title: Joint learning of object graph and relation graph for visual question
answering
- Authors: Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun
- Abstract summary: We introduce a novel Dual Message-passing enhanced Graph Neural Network (DM-GNN)
DM-GNN can obtain a balanced representation by properly encoding multi-scale scene graph information.
We conduct extensive experiments on datasets including GQA, VG, and motif-VG and achieve new state-of-the-art results.
- Score: 19.97265717398179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling visual question answering (VQA) through scene graphs can
significantly improve reasoning accuracy and interpretability. However,
existing models perform poorly on complex reasoning questions involving
attributes or relations, which leads to false attribute selection or missing
relations, as illustrated in Figure 1(a). This is because these models cannot
balance all kinds of information in scene graphs, neglecting relation and
attribute information. In this paper, we introduce a novel Dual
Message-passing enhanced Graph Neural Network (DM-GNN), which obtains a
balanced representation by properly encoding multi-scale scene graph
information. Specifically, we (i) transform the scene graph into two graphs
with diversified focuses on objects and relations, and design a dual
structure to encode them, which increases the weight of relations;
(ii) fuse the encoder output with attribute features, which increases the
weight of attributes; and (iii) propose a message-passing mechanism to
enhance the information transfer among objects, relations, and attributes.
We conduct extensive experiments on datasets including GQA, VG, and motif-VG
and achieve new state-of-the-art results.
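The dual-graph idea in step (i) can be illustrated with a small sketch: starting from scene-graph triples, build an object-focused graph (objects linked when a relation connects them) and a relation-focused graph (relations linked when they share an object, i.e. a line graph), then run a simple averaging message-passing step. The function names and the averaging update are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the object-graph / relation-graph transformation
# described in the abstract. Names and the averaging update rule are
# assumptions for illustration only.

from collections import defaultdict

def build_dual_graphs(triples):
    """triples: list of (subject, relation_id, object) tuples.
    Returns (object_adj, relation_adj) as adjacency maps."""
    object_adj = defaultdict(set)
    rel_endpoints = {}
    for idx, (subj, _rel, obj) in enumerate(triples):
        object_adj[subj].add(obj)
        object_adj[obj].add(subj)
        rel_endpoints[idx] = {subj, obj}
    # Relation graph: two relations are linked when they share an object.
    relation_adj = defaultdict(set)
    ids = list(rel_endpoints)
    for i in ids:
        for j in ids:
            if i < j and rel_endpoints[i] & rel_endpoints[j]:
                relation_adj[i].add(j)
                relation_adj[j].add(i)
    return object_adj, relation_adj

def message_pass(features, adj):
    """One step: each node averages its feature with its neighbours'."""
    out = {}
    for node, feat in features.items():
        neigh = [features[n] for n in adj.get(node, ())]
        out[node] = (feat + sum(neigh)) / (1 + len(neigh))
    return out
```

For example, the triples `[("man", 0, "horse"), ("horse", 1, "field")]` yield an object graph where "horse" neighbours both "man" and "field", and a relation graph where relations 0 and 1 are connected because they share "horse".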
Related papers
- Composing Object Relations and Attributes for Image-Text Matching [70.47747937665987]
This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges.
Our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system.
arXiv Detail & Related papers (2024-06-17T17:56:01Z) - Higher Order Structures For Graph Explanations [9.164945693135959]
We present Framework For Higher-Order Representations In Graph Explanations (FORGE)
FORGE enables graph explainers to capture higher-order, multi-node interactions.
It improves average explanation accuracy by 1.9x and 2.25x, respectively.
arXiv Detail & Related papers (2024-06-05T13:31:30Z) - Relation Rectification in Diffusion Model [64.84686527988809]
We introduce a novel task termed Relation Rectification, aiming to refine the model to accurately represent a given relationship it initially fails to generate.
We propose an innovative solution utilizing a Heterogeneous Graph Convolutional Network (HGCN)
The lightweight HGCN adjusts the text embeddings generated by the text encoder, ensuring the accurate reflection of the textual relation in the embedding space.
arXiv Detail & Related papers (2024-03-29T15:54:36Z) - Relation-Aware Question Answering for Heterogeneous Knowledge Graphs [37.38138785470231]
Existing retrieval-based approaches solve this task by concentrating on a specific relation at different hops.
We argue that they fail to utilize information from head-tail entities and the semantic connections between relations to enhance the current relation representation.
Our approach achieves a significant performance gain over the prior state-of-the-art.
arXiv Detail & Related papers (2023-12-19T08:01:48Z) - GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language
Models [33.56759621666477]
We present a benchmark dataset for evaluating the integration of graph knowledge into language models.
The proposed dataset is designed to evaluate graph-language models' ability to understand graphs and make use of them for answer generation.
We perform experiments with language-only models and the proposed graph-language model to validate the usefulness of the paired graphs and to demonstrate the difficulty of the task.
arXiv Detail & Related papers (2023-10-12T16:46:58Z) - Probing Graph Representations [77.7361299039905]
We use a probing framework to quantify the amount of meaningful information captured in graph representations.
Our findings on molecular datasets show the potential of probing for understanding the inductive biases of graph-based models.
We advocate for probing as a useful diagnostic tool for evaluating graph-based models.
arXiv Detail & Related papers (2023-03-07T14:58:18Z) - Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in
Visual Question Answering [71.6781118080461]
We propose a Graph Matching Attention (GMA) network for Visual Question Answering (VQA) task.
First, it not only builds a graph for the image but also constructs a graph for the question in terms of both syntactic and embedding information.
Next, we explore the intra-modality relationships by a dual-stage graph encoder and then present a bilateral cross-modality graph matching attention to infer the relationships between the image and the question.
Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset.
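The bilateral cross-modality matching step above can be sketched in miniature: each image node attends over question nodes via dot-product similarity, and vice versa. Vectors are plain lists and every name here is an illustrative assumption, not the GMA authors' code.

```python
# Hypothetical sketch of bilateral cross-modality attention: each node in
# one modality produces a softmax-weighted average of the other modality's
# node features. Illustrative only.

import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys):
    """For each query vector, return an attention-weighted average of keys."""
    out = []
    for q in queries:
        scores = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(scores, keys))
                    for d in range(len(keys[0]))])
    return out

def bilateral_match(image_nodes, question_nodes):
    """Attend image->question and question->image, as in bilateral matching."""
    return (cross_attend(image_nodes, question_nodes),
            cross_attend(question_nodes, image_nodes))
```

With one image node `[1.0, 0.0]` and question nodes `[1.0, 0.0]` and `[0.0, 1.0]`, the image node's context leans toward the matching question node (weight sigmoid(1) ≈ 0.731).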
arXiv Detail & Related papers (2021-12-14T10:01:26Z) - Dual ResGCN for Balanced Scene Graph Generation [106.7828712878278]
We propose a novel model, dubbed dual ResGCN, which consists of an object residual graph convolutional network and a relation residual graph convolutional network.
The two networks are complementary to each other. The former captures object-level context information, i.e., the connections among objects.
The latter is carefully designed to explicitly capture relation-level context information, i.e., the connections among relations.
arXiv Detail & Related papers (2020-11-09T07:44:17Z) - CopulaGNN: Towards Integrating Representational and Correlational Roles
of Graphs in Graph Neural Networks [23.115288017590093]
We investigate how Graph Neural Network (GNN) models can effectively leverage both types of information.
The proposed Copula Graph Neural Network (CopulaGNN) can take a wide range of GNN models as base models.
arXiv Detail & Related papers (2020-10-05T15:20:04Z) - Jointly Cross- and Self-Modal Graph Attention Network for Query-Based
Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative messages passing over a joint graph.
Our CSMGAN is able to effectively capture high-order interactions between two modalities, thus enabling a further precise localization.
arXiv Detail & Related papers (2020-08-04T08:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.