Understanding the Role of Scene Graphs in Visual Question Answering
- URL: http://arxiv.org/abs/2101.05479v2
- Date: Sun, 17 Jan 2021 04:17:07 GMT
- Title: Understanding the Role of Scene Graphs in Visual Question Answering
- Authors: Vinay Damodaran, Sharanya Chakravarthy, Akshay Kumar, Anjana Umapathy,
Teruko Mitamura, Yuta Nakashima, Noa Garcia, Chenhui Chu
- Abstract summary: We conduct experiments on the GQA dataset which presents a challenging set of questions requiring counting, compositionality and advanced reasoning capability.
We adopt image + question architectures for use with scene graphs, evaluate various scene graph generation techniques for unseen images, propose a training curriculum to leverage human-annotated and auto-generated scene graphs.
We present a multi-faceted study into the use of scene graphs for Visual Question Answering, making this work the first of its kind.
- Score: 26.02889386248289
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Visual Question Answering (VQA) is of tremendous interest to the research
community with important applications such as aiding visually impaired users
and image-based search. In this work, we explore the use of scene graphs for
solving the VQA task. We conduct experiments on the GQA dataset which presents
a challenging set of questions requiring counting, compositionality and
advanced reasoning capability, and provides scene graphs for a large number of
images. We adopt image + question architectures for use with scene graphs,
evaluate various scene graph generation techniques for unseen images, propose a
training curriculum to leverage human-annotated and auto-generated scene
graphs, and build late fusion architectures to learn from multiple image
representations. We present a multi-faceted study into the use of scene graphs
for VQA, making this work the first of its kind.
Related papers
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z) - G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z) - SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based
Question Answering [0.0]
Scene graphs have emerged as a useful tool for multimodal image analysis.
Current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images.
Our approach extracts a scene graph from an input image using a pre-trained scene graph generator.
arXiv Detail & Related papers (2023-10-03T07:14:53Z) - Learning Situation Hyper-Graphs for Video Question Answering [95.18071873415556]
We propose an architecture for Video Question Answering (VQA) that enables answering questions related to video content by predicting situation hyper-graphs.
We train a situation hyper-graph decoder to implicitly identify graph representations with actions and object/human-object relationships from the input video clip.
Our results show that learning the underlying situation hyper-graphs helps the system to significantly improve its performance for novel challenges of video question-answering tasks.
arXiv Detail & Related papers (2023-04-18T01:23:11Z) - OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer
Learning for Telepresence Robotics [124.08684545010664]
Scene graph generation from images is a task of great interest to applications such as robotics.
We propose an initial approximation to a framework called Ontology-Guided Scene Graph Generation (OG-SGG)
arXiv Detail & Related papers (2022-02-21T13:23:15Z) - SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense
Reasoning [61.57887011165744]
multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z) - Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in
Visual Question Answering [71.6781118080461]
We propose a Graph Matching Attention (GMA) network for Visual Question Answering (VQA) task.
firstly, it builds graph for the image, but also constructs graph for the question in terms of both syntactic and embedding information.
Next, we explore the intra-modality relationships by a dual-stage graph encoder and then present a bilateral cross-modality graph matching attention to infer the relationships between the image and the question.
Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset.
arXiv Detail & Related papers (2021-12-14T10:01:26Z) - Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question
Answering [13.886692497676659]
Graphhopper is a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques.
We derive a scene graph that describes the objects in the image, as well as their attributes and their mutual relationships.
A reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths.
arXiv Detail & Related papers (2021-07-13T18:33:04Z) - A Comprehensive Survey of Scene Graphs: Generation and Application [42.07469181785126]
Scene graph is a structured representation of a scene that can clearly express the objects, attributes, and relationships between objects in the scene.
No relatively systematic survey of scene graphs exists at present.
arXiv Detail & Related papers (2021-03-17T04:24:20Z) - Scene Graph Reasoning for Visual Question Answering [23.57543808056452]
We propose a novel method that approaches the task by performing context-driven, sequential reasoning based on the objects and their semantic and spatial relationships present in the scene.
A reinforcement agent then learns to autonomously navigate over the extracted scene graph to generate paths, which are then the basis for deriving answers.
arXiv Detail & Related papers (2020-07-02T13:02:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.