GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
- URL: http://arxiv.org/abs/2109.08475v1
- Date: Fri, 17 Sep 2021 11:37:33 GMT
- Title: GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
- Authors: Feilong Chen, Xiuyi Chen, Fandong Meng, Peng Li, Jie Zhou
- Abstract summary: Graph neural networks are recently applied to model the implicit relations between objects in an image or dialog.
We propose a novel relation-aware graph-over-graph network (GoG) for visual dialog.
GoG consists of three sequential graphs: 1) H-Graph, which aims to capture coreference relations among dialog history; 2) History-aware Q-Graph, which aims to fully understand the question through capturing dependency relations between words based on coreference resolution on the dialog history; and 3) Question-aware I-Graph, which aims to capture the relations between objects in an image based on fully question representation.
- Score: 25.57530524167637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual dialog, which aims to hold a meaningful conversation with humans about
a given image, is a challenging task that requires models to reason the complex
dependencies among visual content, dialog history, and current questions. Graph
neural networks are recently applied to model the implicit relations between
objects in an image or dialog. However, they neglect the importance of 1)
coreference relations among dialog history and dependency relations between
words for the question representation; and 2) the representation of the image
based on the fully represented question. Therefore, we propose a novel
relation-aware graph-over-graph network (GoG) for visual dialog. Specifically,
GoG consists of three sequential graphs: 1) H-Graph, which aims to capture
coreference relations among dialog history; 2) History-aware Q-Graph, which
aims to fully understand the question through capturing dependency relations
between words based on coreference resolution on the dialog history; and 3)
Question-aware I-Graph, which aims to capture the relations between objects in
an image based on fully question representation. As an additional feature
representation module, we add GoG to the existing visual dialogue model.
Experimental results show that our model outperforms the strong baseline in
both generative and discriminative settings by a significant margin.
Related papers
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z) - G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z) - Conversational Semantic Parsing using Dynamic Context Graphs [68.72121830563906]
We consider the task of conversational semantic parsing over general purpose knowledge graphs (KGs) with millions of entities, and thousands of relation-types.
We focus on models which are capable of interactively mapping user utterances into executable logical forms.
arXiv Detail & Related papers (2023-05-04T16:04:41Z) - Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z) - GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual
Question Answering [4.673063715963991]
Scene Graph encodes objects as nodes connected via pairwise relations as edges.
We propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question.
Our experiments on GQA dataset show that GraphVQA outperforms the state-of-the-art accuracy by a large margin.
arXiv Detail & Related papers (2021-04-20T23:54:41Z) - Dialogue Relation Extraction with Document-level Heterogeneous Graph
Attention Networks [21.409522845011907]
Dialogue relation extraction (DRE) aims to detect the relation between two entities mentioned in a multi-party dialogue.
We present a graph attention network-based method for DRE where a graph contains meaningfully connected speaker, entity, entity-type, and utterance nodes.
We empirically show that this graph-based approach quite effectively captures the relations between different entity pairs in a dialogue as it outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-10T18:51:48Z) - ORD: Object Relationship Discovery for Visual Dialogue Generation [60.471670447176656]
We propose an object relationship discovery (ORD) framework to preserve the object interactions for visual dialogue generation.
A hierarchical graph convolutional network (HierGCN) is proposed to retain the object nodes and neighbour relationships locally, and then refines the object-object connections globally.
Experiments have proved that the proposed method can significantly improve the quality of dialogue by utilising the contextual information of visual relationships.
arXiv Detail & Related papers (2020-06-15T12:25:40Z) - Iterative Context-Aware Graph Inference for Visual Dialog [126.016187323249]
We propose a novel Context-Aware Graph (CAG) neural network.
Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations.
arXiv Detail & Related papers (2020-04-05T13:09:37Z) - Modality-Balanced Models for Visual Dialogue [102.35406085738325]
The Visual Dialog task requires a model to exploit both image and conversational context information to generate the next response to the dialogue.
We show that previous joint-modality (history and image) models over-rely on and are more prone to memorizing the dialogue history.
We present methods for this integration of the two models, via ensemble and consensus dropout fusion with shared parameters.
arXiv Detail & Related papers (2020-01-17T14:57:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.