Iterative Context-Aware Graph Inference for Visual Dialog
- URL: http://arxiv.org/abs/2004.02194v1
- Date: Sun, 5 Apr 2020 13:09:37 GMT
- Title: Iterative Context-Aware Graph Inference for Visual Dialog
- Authors: Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang
- Abstract summary: We propose a novel Context-Aware Graph (CAG) neural network.
Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations.
- Score: 126.016187323249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual dialog is a challenging task that requires the comprehension of the
semantic dependencies among implicit visual and textual contexts. This task can
be viewed as relation inference in a graphical model with sparse contexts and
unknown graph structure (relation descriptor), and how to model the underlying
context-aware relation inference is critical. To this end, we propose a novel
Context-Aware Graph (CAG) neural network. Each node in the graph corresponds to
a joint semantic feature, including both object-based (visual) and
history-related (textual) context representations. The graph structure
(relations in dialog) is iteratively updated using an adaptive top-$K$ message
passing mechanism. Specifically, in every message passing step, each node
selects the $K$ most relevant nodes and receives messages only from them.
Then, after the update, we impose graph attention on all the nodes to get the
final graph embedding and infer the answer. In CAG, each node has dynamic
relations in the graph (a different set of $K$ related neighbor nodes), and only the
most relevant nodes contribute to the context-aware relational graph
inference. Experimental results on the VisDial v0.9 and v1.0 datasets show that CAG
outperforms comparative methods. Visualization results further validate the
interpretability of our method.
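The inference loop described in the abstract (top-$K$ neighbor selection, message passing restricted to those neighbors, then attention over all nodes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The dot-product relevance score, the fixed value of `k`, the averaging update rule, and the function names `top_k_message_passing` and `graph_attention_readout` are all assumptions made for illustration; the paper's nodes are joint visual and history features, whereas here they are plain vectors.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def top_k_message_passing(nodes, k, steps=3):
    """Update node features; each node listens only to its k most relevant nodes.

    nodes: (N, D) array of node features (hypothetical stand-ins for the
    joint visual/textual features mentioned in the abstract).
    Returns the updated nodes and the neighbor indices from the last step.
    """
    n, _ = nodes.shape
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of nodes")
    top_idx = None
    for _ in range(steps):
        # Pairwise relevance; dot-product scoring is an assumption.
        scores = nodes @ nodes.T
        np.fill_diagonal(scores, -np.inf)  # a node never selects itself
        # Indices of the k highest-scoring nodes for every row (unordered).
        top_idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
        top_scores = np.take_along_axis(scores, top_idx, axis=1)
        weights = softmax(top_scores, axis=1)  # (N, k)
        # Weighted sum of the selected neighbors' features: (N, D).
        messages = (weights[:, :, None] * nodes[top_idx]).sum(axis=1)
        # Simple averaging update; the real update rule is an assumption here.
        nodes = 0.5 * nodes + 0.5 * messages
    return nodes, top_idx


def graph_attention_readout(nodes, query):
    """Attend over all nodes with a query vector to get one graph embedding."""
    attention = softmax(nodes @ query)  # (N,)
    return attention @ nodes  # (D,)


rng = np.random.default_rng(0)
nodes = rng.normal(size=(6, 4))
query = rng.normal(size=4)
updated, neighbors = top_k_message_passing(nodes, k=2, steps=3)
embedding = graph_attention_readout(updated, query)
print(neighbors.shape, embedding.shape)
```

Because the neighbor set is recomputed from the current features at every step, the graph structure changes as the nodes are updated, which is the "dynamic relations" property the abstract refers to.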
Related papers
- Graph Neural Networks on Discriminative Graphs of Words [19.817473565906777]
In this work, we explore a new Discriminative Graph of Words Graph Neural Network (DGoW-GNN) approach to classify text.
We propose a new model for the graph-based classification of text, which combines a GNN and a sequence model.
We evaluate our approach on seven benchmark datasets and find that it is outperformed by several state-of-the-art baseline models.
arXiv Detail & Related papers (2024-10-27T15:14:06Z)
- G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering [61.93058781222079]
We develop a flexible question-answering framework targeting real-world textual graphs.
We introduce the first retrieval-augmented generation (RAG) approach for general textual graphs.
G-Retriever performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem.
arXiv Detail & Related papers (2024-02-12T13:13:04Z)
- ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG).
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z)
- Conversational Semantic Parsing using Dynamic Context Graphs [68.72121830563906]
We consider the task of conversational semantic parsing over general purpose knowledge graphs (KGs) with millions of entities, and thousands of relation-types.
We focus on models which are capable of interactively mapping user utterances into executable logical forms.
arXiv Detail & Related papers (2023-05-04T16:04:41Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering [4.673063715963991]
A scene graph encodes objects as nodes and pairwise relations as edges.
We propose GraphVQA, a language-guided graph neural network framework that translates and executes a natural language question.
Our experiments on the GQA dataset show that GraphVQA outperforms the state-of-the-art accuracy by a large margin.
arXiv Detail & Related papers (2021-04-20T23:54:41Z)
- Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
- Graph Structured Network for Image-Text Matching [127.68148793548116]
We present a novel Graph Structured Matching Network to learn fine-grained correspondence.
The GSMN explicitly models object, relation and attribute as a structured phrase.
Experiments show that GSMN outperforms state-of-the-art methods on benchmarks.
arXiv Detail & Related papers (2020-04-01T08:20:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.