ORD: Object Relationship Discovery for Visual Dialogue Generation
- URL: http://arxiv.org/abs/2006.08322v1
- Date: Mon, 15 Jun 2020 12:25:40 GMT
- Title: ORD: Object Relationship Discovery for Visual Dialogue Generation
- Authors: Ziwei Wang, Zi Huang, Yadan Luo, Huimin Lu
- Abstract summary: We propose an object relationship discovery (ORD) framework to preserve the object interactions for visual dialogue generation.
A hierarchical graph convolutional network (HierGCN) is proposed to retain the object nodes and neighbour relationships locally, and then refine the object-object connections globally.
Experiments show that the proposed method significantly improves dialogue quality by utilising the contextual information of visual relationships.
- Score: 60.471670447176656
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advancement of image captioning and visual question answering
at single-round level, the question of how to generate multi-round dialogue
about visual content has not yet been well explored. Existing visual dialogue
methods encode the image into a fixed feature vector directly, concatenated
with the question and history embeddings to predict the response. Some recent
methods tackle the co-reference resolution problem using co-attention mechanism
to cross-refer relevant elements from the image, history, and the target
question. However, it remains challenging to reason about visual relationships, since
the fine-grained object-level information is omitted before co-attentive
reasoning. In this paper, we propose an object relationship discovery (ORD)
framework to preserve the object interactions for visual dialogue generation.
Specifically, a hierarchical graph convolutional network (HierGCN) is proposed
to retain the object nodes and neighbour relationships locally, and then
refine the object-object connections globally to obtain the final graph
embeddings. A graph attention is further incorporated to dynamically attend to
this graph-structured representation at the response reasoning stage. Extensive
experiments have proved that the proposed method can significantly improve the
quality of dialogue by utilising the contextual information of visual
relationships. The model achieves superior performance over the
state-of-the-art methods on the Visual Dialog dataset, increasing MRR from
0.6222 to 0.6447, and recall@1 from 48.48% to 51.22%.
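The MRR and recall@1 figures quoted in the abstract are ranking metrics. The sketch below shows how they are commonly computed, assuming the usual Visual Dialog retrieval protocol in which the model scores a list of candidate answers for each question and the ground-truth answer's rank is recorded. The function names, the tie-breaking rule, and the toy scores are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def rank_of_truth(scores: Sequence[float], truth_index: int) -> int:
    """1-based rank of the ground-truth candidate under descending score order."""
    truth_score = scores[truth_index]
    # Assumption: ties are resolved in favour of the ground truth.
    return 1 + sum(1 for s in scores if s > truth_score)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all questions."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of questions whose ground truth is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical toy data: three questions with four candidate answers each.
toy_scores = [
    [0.10, 0.70, 0.15, 0.05],
    [0.40, 0.30, 0.20, 0.10],
    [0.25, 0.25, 0.10, 0.40],
]
toy_truth = [1, 1, 2]  # index of the ground-truth answer per question

ranks = [rank_of_truth(s, t) for s, t in zip(toy_scores, toy_truth)]
mrr = mean_reciprocal_rank(ranks)
r_at_1 = recall_at_k(ranks, 1)
```

On the toy data the ground-truth ranks are 1, 2 and 4, so MRR is (1 + 1/2 + 1/4) / 3 and recall@1 is 1/3. A higher MRR means the correct answer tends to sit nearer the top of the ranked list.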
Related papers
- BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation [21.052101309555464]
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in texts, images, or a blend of both.
Previous work relies on the text modality as an intermediary step for both the image input and output of the model rather than adopting an end-to-end approach.
We propose BI-MDRG that bridges the response generation path such that the image history information is utilized for enhanced relevance of text responses to the image content.
arXiv Detail & Related papers (2024-08-12T05:22:42Z)
- Multi-grained Hypergraph Interest Modeling for Conversational Recommendation [75.65483522949857]
We propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data.
In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.
We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS.
arXiv Detail & Related papers (2023-05-04T13:13:44Z)
- Unbiased Heterogeneous Scene Graph Generation with Relation-aware Message Passing Neural Network [9.779600950401315]
We propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context.
We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image.
arXiv Detail & Related papers (2022-12-01T11:25:36Z)
- Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing [17.63475613154152]
This paper presents a framework for jointly grounding objects that follow certain semantic relationship constraints in a scene graph.
A scene graph is an efficient and structured way to represent all the objects and their semantic relationships in the image.
arXiv Detail & Related papers (2022-11-03T16:46:46Z)
- Scenes and Surroundings: Scene Graph Generation using Relation Transformer [13.146732454123326]
This work proposes a novel local-context aware architecture named relation transformer.
Our hierarchical multi-head attention-based approach efficiently captures contextual dependencies between objects and predicts their relationships.
In comparison to state-of-the-art approaches, we have achieved an overall mean improvement of 4.85%.
arXiv Detail & Related papers (2021-07-12T14:22:20Z)
- Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues [73.04906599884868]
We propose a novel framework of Reasoning Paths in Dialogue Context (PDC).
The PDC model discovers information flows among dialogue turns through a semantic graph constructed based on lexical components in each question and answer.
Our model sequentially processes both visual and textual information through this reasoning path and the propagated features are used to generate the answer.
arXiv Detail & Related papers (2021-03-01T07:39:26Z)
- Dialogue Relation Extraction with Document-level Heterogeneous Graph Attention Networks [21.409522845011907]
Dialogue relation extraction (DRE) aims to detect the relation between two entities mentioned in a multi-party dialogue.
We present a graph attention network-based method for DRE where a graph contains meaningfully connected speaker, entity, entity-type, and utterance nodes.
We empirically show that this graph-based approach quite effectively captures the relations between different entity pairs in a dialogue as it outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-10T18:51:48Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
- Dynamic Language Binding in Relational Visual Reasoning [67.85579756590478]
We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains.
Our method outperforms other methods in sophisticated question-answering tasks wherein multiple object relations are involved.
arXiv Detail & Related papers (2020-04-30T06:26:20Z)
- Iterative Context-Aware Graph Inference for Visual Dialog [126.016187323249]
We propose a novel Context-Aware Graph (CAG) neural network.
Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations.
arXiv Detail & Related papers (2020-04-05T13:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.