KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue
- URL: http://arxiv.org/abs/2008.04858v2
- Date: Fri, 28 Aug 2020 07:34:49 GMT
- Title: KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue
- Authors: Xiaoze Jiang, Siyi Du, Zengchang Qin, Yajing Sun, Jing Yu
- Abstract summary: We propose a novel Knowledge-Bridge Graph Network (KBGN) model to bridge the cross-modal semantic relations between vision and text knowledge.
We show that our model outperforms existing models with state-of-the-art results.
- Score: 17.119682693725718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual dialogue is a challenging task that requires extracting implicit
information from both visual (image) and textual (dialogue history) contexts.
Classical approaches pay more attention to the integration of the current
question, vision knowledge and text knowledge, neglecting the heterogeneous
semantic gaps between the cross-modal information. Meanwhile, the
concatenation operation has become the de facto standard for cross-modal
information fusion, even though it has a limited ability in information retrieval. In
this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that
uses a graph to bridge the cross-modal semantic relations between vision and
text knowledge at a fine granularity, and that retrieves the required knowledge
via an adaptive information selection mode. Moreover, the reasoning clues for
visual dialogue can be clearly drawn from intra-modal entities and inter-modal
bridges. Experimental results on the VisDial v1.0 and VisDial-Q datasets
demonstrate that our model outperforms existing models with state-of-the-art
results.
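The abstract contrasts concatenation-based fusion with two ideas: cross-modal graph "bridges" between vision and text nodes, and adaptive selection of the knowledge a question needs. The plain-Python sketch below illustrates both ideas under stated assumptions. The dot-product edge scores, the residual node update, and the scalar sigmoid gate are hypothetical choices made for illustration; they are not the KBGN formulation from the paper, and all function names and toy vectors are invented.

```python
import math

# Hypothetical helpers on plain Python lists (no deep-learning framework assumed).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def bridge(source_nodes, target_nodes, question):
    """Cross-modal 'bridge' edges: every source node (e.g. an image region)
    attends over all target nodes (e.g. dialogue-history entities).
    Assumed edge score: node-node similarity plus question relevance."""
    updated = []
    for s in source_nodes:
        scores = [dot(s, t) + dot(question, t) for t in target_nodes]
        weights = softmax(scores)
        message = [sum(w * t[i] for w, t in zip(weights, target_nodes))
                   for i in range(len(s))]
        # Residual update: keep the node's own feature and add the bridged message.
        updated.append([a + b for a, b in zip(s, message)])
    return updated

def adaptive_select(question, vision_summary, text_summary):
    """Assumed scalar gate: the modality whose summary aligns better with the
    question receives more weight; the output keeps the original dimension."""
    gate = sigmoid(dot(question, vision_summary) - dot(question, text_summary))
    fused = [gate * v + (1.0 - gate) * t
             for v, t in zip(vision_summary, text_summary)]
    return gate, fused

# Toy example with 2-d features.
vision = [[1.0, 0.0], [0.0, 1.0]]   # two image-region nodes
text = [[1.0, 1.0], [0.5, 0.0]]     # two dialogue-history nodes
question = [1.0, 0.0]

vision_bridged = bridge(vision, text, question)
text_bridged = bridge(text, vision, question)
gate, fused = adaptive_select(question, mean(vision_bridged), mean(text_bridged))
print(f"gate on vision = {gate:.3f}, fused = {fused}")
```

Concatenating the two summaries would double the feature size and pass both modalities on regardless of the question. In this sketch the gate keeps the dimension fixed and lets the question shift weight toward one modality, which is one simple way to read "adaptive information selection".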
Related papers
- Bridging Local Details and Global Context in Text-Attributed Graphs [62.522550655068336]
GraphBridge is a framework that bridges local and global perspectives by leveraging contextual textual information.
Our method achieves state-of-the-art performance, while its graph-aware token reduction module significantly improves efficiency and addresses scalability issues.
Date: 2024-06-18T13:35:25Z
- Knowledge Graphs and Pre-trained Language Models enhanced Representation Learning for Conversational Recommender Systems [58.561904356651276]
We introduce the Knowledge-Enhanced Entity Representation Learning (KERL) framework to improve the semantic understanding of entities for conversational recommender systems.
KERL uses a knowledge graph and a pre-trained language model to improve the semantic understanding of entities.
KERL achieves state-of-the-art results in both recommendation and response generation tasks.
Date: 2023-12-18T06:41:23Z
- Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts [83.03471704115786]
We introduce improved Prompt Diffusion (iPromptDiff) in this study.
iPromptDiff integrates an end-to-end trained vision encoder that converts visual context into an embedding vector.
We show that a diffusion-based vision foundation model, when equipped with this visual context-modulated text guidance and a standard ControlNet structure, exhibits versatility and robustness across a variety of training tasks.
Date: 2023-12-03T14:15:52Z
- ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue [34.223466503256766]
We provide a new paradigm of constructing multimodal dialogues by splitting visual knowledge into finer granularity.
To boost the accuracy and diversity of augmented visual information, we retrieve it from the Internet or a large image dataset.
By leveraging text and vision knowledge, ReSee can produce informative responses with real-world visual concepts.
Date: 2023-05-23T02:08:56Z
- CADGE: Context-Aware Dialogue Generation Enhanced with Graph-Structured Knowledge Aggregation [25.56539617837482]
A novel context-aware graph-attention model (Context-aware GAT) is proposed.
It assimilates global features from relevant knowledge graphs through a context-enhanced knowledge aggregation mechanism.
Empirical results demonstrate that our framework outperforms conventional GNN-based language models.
Date: 2023-05-10T16:31:35Z
- Building Knowledge-Grounded Dialogue Systems with Graph-Based Semantic Modeling [43.0554223015728]
The knowledge-grounded dialogue task aims to generate responses that convey information from given knowledge documents.
We propose a novel graph structure, Grounded Graph, that models the semantic structure of both dialogue and knowledge.
We also propose a Grounded Graph Aware Transformer to enhance knowledge-grounded response generation.
Date: 2022-04-27T03:31:46Z
- Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog [12.034554338597067]
We propose a novel model by Reasoning with Multi-structure Commonsense Knowledge (RMK).
In our model, the external knowledge is represented with sentence-level facts and graph-level facts.
On top of these multi-structure representations, our model can capture relevant knowledge and incorporate it into the vision and semantic features.
Date: 2022-04-10T13:12:10Z
- Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues [73.04906599884868]
We propose a novel framework of Reasoning Paths in Dialogue Context (PDC).
The PDC model discovers information flows among dialogue turns through a semantic graph constructed from the lexical components of each question and answer.
Our model sequentially processes both visual and textual information through this reasoning path and the propagated features are used to generate the answer.
Date: 2021-03-01T07:39:26Z
- GraphDialog: Integrating Graph Knowledge into End-to-End Task-Oriented Dialogue Systems [9.560436630775762]
End-to-end task-oriented dialogue systems aim to generate system responses directly from plain text inputs.
Two challenges remain: one is how to effectively incorporate external knowledge bases (KBs) into the learning framework; the other is how to accurately capture the semantics of dialogue history.
We address these two challenges by exploiting the graph structural information in the knowledge base and in the dependency parsing tree of the dialogue.
Date: 2020-10-04T00:04:40Z
- ORD: Object Relationship Discovery for Visual Dialogue Generation [60.471670447176656]
We propose an object relationship discovery (ORD) framework to preserve the object interactions for visual dialogue generation.
A hierarchical graph convolutional network (HierGCN) is proposed to retain the object nodes and neighbour relationships locally and then refine the object-object connections globally.
Experiments show that the proposed method significantly improves dialogue quality by utilising the contextual information of visual relationships.
Date: 2020-06-15T12:25:40Z
- Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning [73.0598186896953]
We present two self-supervised tasks that learn over raw text under the guidance of knowledge graphs.
Building upon entity-level masked language models, our first contribution is an entity masking scheme.
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training.
Date: 2020-04-29T14:22:42Z
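The PDC entry in the list summarizes a semantic graph over dialogue turns built from shared lexical components and traversed as a reasoning path. A toy sketch of that idea follows, assuming plain word overlap as the linking criterion and a greedy "most recent linked turn" rule for the path. The stop-word list, function names, and example dialogue are hypothetical, and this is not the PDC authors' graph construction or path-learning procedure.

```python
# Hypothetical stop-word list; a real system would use a proper
# tokenizer and part-of-speech filter to pick lexical components.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "what", "of", "in", "on",
             "it", "do", "does", "there", "and", "yes", "no"}

def content_words(turn):
    """Lower-cased tokens with punctuation stripped and stop words removed."""
    tokens = {w.strip("?.,!").lower() for w in turn.split()}
    return tokens - STOPWORDS - {""}

def build_turn_graph(turns):
    """Directed edges from each turn to every earlier turn sharing a content word."""
    words = [content_words(t) for t in turns]
    graph = {i: [] for i in range(len(turns))}
    for i in range(len(turns)):
        for j in range(i):
            if words[i] & words[j]:
                graph[i].append(j)
    return graph

def reasoning_path(turns):
    """Greedy path: start at the current (last) turn and repeatedly jump to the
    most recent earlier turn it is linked to. Indices strictly decrease, so the
    loop terminates."""
    graph = build_turn_graph(turns)
    path = [len(turns) - 1]
    while graph[path[-1]]:
        path.append(max(graph[path[-1]]))
    return path

dialogue = [
    "A man riding a red bike",              # 0: caption
    "What color is the bike? Red",          # 1
    "Is the man wearing a helmet? Yes",     # 2
    "What color is his helmet?",            # 3: current question
]
print(build_turn_graph(dialogue))
print(reasoning_path(dialogue))
```

For this toy dialogue the path runs from the helmet question back to the turn that mentions the man and the helmet, then to the caption. In the PDC description, visual and textual features would then be processed sequentially along such a path to generate the answer.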