GraphVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering
- URL: http://arxiv.org/abs/2104.10283v1
- Date: Tue, 20 Apr 2021 23:54:41 GMT
- Title: GraphVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering
- Authors: Weixin Liang, Yanhao Jiang and Zixuan Liu
- Abstract summary: A Scene Graph encodes objects as nodes connected by edges that carry pairwise relations.
We propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question as multiple iterations of message passing among graph nodes.
Our experiments on the GQA dataset show that GraphVQA outperforms the state of the art by a large margin.
- Score: 4.673063715963991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Images are more than a collection of objects or attributes -- they represent
a web of relationships among interconnected objects. Scene Graph has emerged as
a new modality: a structured graphical representation of images that encodes
objects as nodes connected by edges carrying pairwise relations. To support
question answering on scene graphs, we propose GraphVQA, a language-guided
graph neural network framework that translates and executes a natural language
question as multiple iterations of message passing among graph nodes. We
explore the design space of the GraphVQA framework and discuss the trade-offs
of different design choices. Our experiments on the GQA dataset show that
GraphVQA outperforms the state of the art by a large margin (94.78% vs.
88.43% accuracy).
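The mechanism can be illustrated with a toy sketch: the question is decoded into a sequence of instruction vectors, and each one modulates a round of message passing over the scene graph. All module names, dimensions, and features below are hypothetical stand-ins, not the authors' implementation:

```python
import torch
import torch.nn as nn

class GuidedMessagePassing(nn.Module):
    """One round of message passing modulated by a question instruction vector."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(3 * dim, dim)  # (sender, edge, instruction) -> message
        self.upd = nn.GRUCell(dim, dim)     # fold aggregated messages into node states

    def forward(self, x, edge_index, edge_attr, instr):
        src, dst = edge_index                            # sender / receiver node ids
        instr_e = instr.expand(src.size(0), -1)          # broadcast instruction to every edge
        m = torch.relu(self.msg(torch.cat([x[src], edge_attr, instr_e], dim=-1)))
        agg = torch.zeros_like(x).index_add_(0, dst, m)  # sum incoming messages per node
        return self.upd(agg, x)                          # updated node states

# Toy scene graph: 4 objects, 3 relations, and a 2-step "program" from the question.
dim = 16
x = torch.randn(4, dim)                          # object (node) features
edge_index = torch.tensor([[0, 1, 2],            # edges 0->1, 1->2, 2->3
                           [1, 2, 3]])
edge_attr = torch.randn(3, dim)                  # relation (edge) features
instructions = torch.randn(2, dim)               # one instruction vector per iteration

layer = GuidedMessagePassing(dim)
for instr in instructions:                       # execute the question step by step
    x = layer(x, edge_index, edge_attr, instr)
print(x.shape)                                   # torch.Size([4, 16])
```

Conditioning each round on a different instruction vector is what lets the network "execute" a multi-step question, roughly one reasoning hop per iteration.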
Related papers
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
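As a rough illustration of that retrieval step, the sketch below greedily grows a connected subgraph whose node prizes (e.g., query relevance scores) outweigh a uniform edge cost. This greedy loop is only a stand-in; G-Retriever solves the actual Prize-Collecting Steiner Tree problem with a dedicated solver:

```python
import heapq

def retrieve_subgraph(prizes, neighbors, edge_cost, root):
    """Greedily grow a connected subgraph while node prizes outweigh edge costs."""
    selected = {root}
    frontier = [(edge_cost - prizes[v], v) for v in neighbors[root]]
    heapq.heapify(frontier)
    while frontier:
        gain, v = heapq.heappop(frontier)
        if gain >= 0:       # best remaining candidate already costs more than it pays
            break
        if v in selected:
            continue
        selected.add(v)
        for w in neighbors[v]:
            if w not in selected:
                heapq.heappush(frontier, (edge_cost - prizes[w], w))
    return selected

# Hypothetical prizes, e.g. similarity between the query and each node's text.
prizes = {"a": 0.1, "b": 0.9, "c": 0.8, "d": 0.0}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(retrieve_subgraph(prizes, neighbors, edge_cost=0.5, root="a"))  # {'a', 'b', 'c'}
```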
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
- GraphextQA: A Benchmark for Evaluating Graph-Enhanced Large Language Models [33.56759621666477]
We present a benchmark dataset for evaluating the integration of graph knowledge into language models.
The proposed dataset is designed to evaluate graph-language models' ability to understand graphs and make use of them for answer generation.
We perform experiments with language-only models and the proposed graph-language model to validate the usefulness of the paired graphs and to demonstrate the difficulty of the task.
arXiv Detail & Related papers (2023-10-12T16:46:58Z)
- SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering [0.0]
Scene graphs have emerged as a useful tool for multimodal image analysis.
Current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images.
Our approach extracts a scene graph from an input image using a pre-trained scene graph generator.
arXiv Detail & Related papers (2023-10-03T07:14:53Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- Interactive Visual Pattern Search on Graph Data via Graph Representation Learning [20.795511688640296]
We propose GraphQ, a visual analytics system that supports human-in-the-loop, example-based subgraph pattern search.
To support fast, interactive queries, we use graph neural networks (GNNs) to encode a graph as a fixed-length latent vector representation.
We also propose a novel GNN for node-alignment called NeuroAlign to facilitate easy validation and interpretation of the query results.
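A minimal sketch of the encode-once, query-fast idea, using mean pooling as a stand-in for the trained GNN encoder; the data, dimensions, and pooling choice are all assumptions, and the node-alignment step (NeuroAlign) is not sketched:

```python
import numpy as np

def embed(node_feats):
    """Stand-in graph encoder: mean-pool node features into one fixed-length vector."""
    return node_feats.mean(axis=0)

rng = np.random.default_rng(0)
# Hypothetical database of 100 graphs with 3-7 nodes and 16-d node features.
database = [rng.normal(size=(rng.integers(3, 8), 16)) for _ in range(100)]
db_vecs = np.stack([embed(g) for g in database])      # encoded once, offline

query_graph = rng.normal(size=(4, 16))                # the user's example pattern
q = embed(query_graph)
sims = db_vecs @ q / (np.linalg.norm(db_vecs, axis=1) * np.linalg.norm(q))
print(np.argsort(-sims)[:5])                          # top-5 candidate graphs, retrieved instantly
```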
arXiv Detail & Related papers (2022-02-18T22:30:28Z)
- Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering [71.6781118080461]
We propose a Graph Matching Attention (GMA) network for Visual Question Answering (VQA) task.
First, it not only builds a graph for the image but also constructs a graph for the question using both syntactic and embedding information.
Next, we explore intra-modality relationships with a dual-stage graph encoder and then present a bilateral cross-modality graph matching attention to infer the relationships between the image and the question.
Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset.
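A hedged sketch of the bilateral step: image-graph nodes attend to question-graph nodes and vice versa via scaled dot-product attention. The single-head form and residual fusion are assumptions, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

def cross_attend(a, b):
    """Each row of `a` gathers information from all rows of `b` via dot-product attention."""
    scores = (a @ b.t()) / a.size(-1) ** 0.5   # (Na, Nb) affinity matrix
    return F.softmax(scores, dim=-1) @ b       # (Na, d) attended features

img_nodes = torch.randn(5, 32)   # image-graph node features (5 detected objects)
qst_nodes = torch.randn(7, 32)   # question-graph node features (7 words/phrases)

img_ctx = cross_attend(img_nodes, qst_nodes)   # image attends to question ...
qst_ctx = cross_attend(qst_nodes, img_nodes)   # ... and question attends to image

fused_img = img_nodes + img_ctx   # residual fusion (an assumption, not the paper's design)
fused_qst = qst_nodes + qst_ctx
print(fused_img.shape, fused_qst.shape)   # torch.Size([5, 32]) torch.Size([7, 32])
```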
arXiv Detail & Related papers (2021-12-14T10:01:26Z)
- VLG-Net: Video-Language Graph Matching Network for Video Grounding [57.6661145190528]
Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query.
We recast this challenge into an algorithmic graph matching problem.
We demonstrate superior performance over state-of-the-art grounding methods on three widely used datasets.
arXiv Detail & Related papers (2020-11-19T22:32:03Z)
- Iterative Context-Aware Graph Inference for Visual Dialog [126.016187323249]
We propose a novel Context-Aware Graph (CAG) neural network.
Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations.
arXiv Detail & Related papers (2020-04-05T13:09:37Z)
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
arXiv Detail & Related papers (2020-01-07T23:35:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.