Dual Semantic Knowledge Composed Multimodal Dialog Systems
- URL: http://arxiv.org/abs/2305.09990v1
- Date: Wed, 17 May 2023 06:33:26 GMT
- Title: Dual Semantic Knowledge Composed Multimodal Dialog Systems
- Authors: Xiaolin Chen, Xuemeng Song, Yinwei Wei, Liqiang Nie, Tat-Seng Chua
- Abstract summary: We propose a novel multimodal task-oriented dialog system named MDS-S2.
It acquires context-related attribute and relation knowledge from the knowledge base.
We also devise a set of latent query variables to distill the semantic information from the composed response representation.
- Score: 114.52730430047589
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Textual response generation is an essential task for multimodal task-oriented
dialog systems. Although existing studies have achieved fruitful progress, they
still suffer from two critical limitations: 1) focusing on attribute
knowledge but ignoring relation knowledge, which can reveal the correlations
between different entities and hence promote response generation, and 2)
only conducting cross-entropy-loss-based output-level supervision while
lacking representation-level regularization. To address these limitations,
we devise a novel multimodal task-oriented dialog system (named MDS-S2).
Specifically, MDS-S2 first simultaneously acquires the context-related
attribute and relation knowledge from the knowledge base, whereby the
non-intuitive relation knowledge is extracted by the n-hop graph walk.
Thereafter, considering that the attribute knowledge and relation knowledge can
benefit the responding to different levels of questions, we design a
multi-level knowledge composition module in MDS-S2 to obtain the latent
composed response representation. Moreover, we devise a set of latent query
variables to distill the semantic information from the composed response
representation and the ground truth response representation, respectively, and
thus conduct the representation-level semantic regularization. Extensive
experiments on a public dataset have verified the superiority of our proposed
MDS-S2. We have released the code and parameters to facilitate the research
community.
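The n-hop graph walk described in the abstract can be illustrated as a breadth-first traversal over knowledge-base triples that collects multi-hop relation paths from context entities. The triple format, entity names, and hop limit below are illustrative assumptions, not the paper's actual implementation:

```python
from collections import deque

# Hypothetical knowledge base as (head, relation, tail) triples.
KB = [
    ("jacket_1", "has_material", "leather"),
    ("jacket_1", "similar_to", "jacket_2"),
    ("jacket_2", "has_color", "black"),
]

def n_hop_walk(kb, seed_entities, n):
    """Collect all relation paths reachable within n hops from the seeds."""
    # Build an adjacency map: head entity -> list of (relation, tail).
    adj = {}
    for head, rel, tail in kb:
        adj.setdefault(head, []).append((rel, tail))

    paths = []
    queue = deque((seed, []) for seed in seed_entities)
    while queue:
        entity, path = queue.popleft()
        if len(path) == n:  # hop limit reached
            continue
        for rel, tail in adj.get(entity, []):
            new_path = path + [(entity, rel, tail)]
            paths.append(new_path)
            queue.append((tail, new_path))
    return paths

# Starting from a context entity, a 2-hop walk surfaces the
# non-intuitive relation jacket_1 -> jacket_2 -> black.
paths = n_hop_walk(KB, ["jacket_1"], 2)
```

Paths longer than one hop are exactly the "non-intuitive" relation knowledge the abstract refers to: correlations that are not stated in any single triple but emerge by chaining triples through intermediate entities.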
Related papers
- UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems [44.893215129952395]
Large Language Models (LLMs) have shown exceptional capabilities in many natural language understanding and generation tasks.
We decompose the use of multiple sources in generating personalized response into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation.
We propose a novel Unified Multi-Source Retrieval-Augmented Generation system (UniMS-RAG)
arXiv Detail & Related papers (2024-01-24T06:50:20Z)
- Generative Multi-Modal Knowledge Retrieval with Large Language Models [75.70313858231833]
We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
arXiv Detail & Related papers (2024-01-16T08:44:29Z)
- Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering [32.21000330743921]
We propose a novel framework that endows the model with capabilities of answering more general questions.
Specifically, a well-defined detector is adopted to predict image-question related relation phrases.
The optimal answer is predicted by choosing the supporting fact with the highest score.
arXiv Detail & Related papers (2023-12-20T02:35:18Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference [82.28542500317445]
We present an end-to-end learning framework, termed Sequential Posterior Inference (SPI), capable of selecting knowledge and generating dialogues.
Unlike other methods, SPI does not require the inference network or assume a simple geometry of the posterior distribution.
arXiv Detail & Related papers (2023-06-01T21:23:13Z)
- Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model [63.461030694700014]
We propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD)
The proposed DKMD consists of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation.
Experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.
arXiv Detail & Related papers (2022-07-16T13:02:54Z)
- M2R2: Missing-Modality Robust emotion Recognition framework with iterative data augmentation [6.962213869946514]
We propose Missing-Modality Robust emotion Recognition (M2R2), which trains emotion recognition model with iterative data augmentation by learned common representation.
Party Attentive Network (PANet) is designed to classify emotions, which tracks all the speakers' states and context.
arXiv Detail & Related papers (2022-05-05T09:16:31Z)
- Knowledge Augmented BERT Mutual Network in Multi-turn Spoken Dialogues [6.4144180888492075]
We propose to equip a BERT-based joint model with a knowledge attention module to mutually leverage dialogue contexts between two SLU tasks.
A gating mechanism is further utilized to filter out irrelevant knowledge triples and to circumvent distracting comprehension.
Experimental results on two complicated multi-turn dialogue datasets demonstrate the benefit of mutually modeling the two SLU tasks with filtered knowledge and dialogue contexts.
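The gating mechanism summarized above can be sketched as a sigmoid gate that scores each knowledge triple for relevance and drops those below a threshold. The scoring values, triple contents, and threshold here are illustrative assumptions, not the paper's architecture:

```python
import math

def gate_triples(triples, relevance_scores, threshold=0.5):
    """Keep only triples whose sigmoid-gated relevance passes the threshold."""
    kept = []
    for triple, score in zip(triples, relevance_scores):
        gate = 1.0 / (1.0 + math.exp(-score))  # squash raw score into [0, 1]
        if gate >= threshold:
            kept.append(triple)
    return kept

# A relevant triple (positive raw score) passes the gate;
# a distracting one (negative raw score) is filtered out.
triples = [("play", "requires", "music_app"), ("weather", "in", "paris")]
kept = gate_triples(triples, [2.0, -1.5])
```

In a trained model the raw relevance scores would come from an attention module over the dialogue context rather than being supplied by hand; the gate then circumvents distracting comprehension by pruning irrelevant triples before response generation.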
arXiv Detail & Related papers (2022-02-23T04:03:35Z)
- Leveraging Semantic Parsing for Relation Linking over Knowledge Bases [80.99588366232075]
We present SLING, a relation linking framework which leverages semantic parsing using AMR and distant supervision.
SLING integrates multiple relation linking approaches that capture complementary signals such as linguistic cues, rich semantic representation, and information from the knowledge base.
Experiments on relation linking using three KBQA datasets (QALD-7, QALD-9, and LC-QuAD 1.0) demonstrate that the proposed approach achieves state-of-the-art performance on all benchmarks.
arXiv Detail & Related papers (2020-09-16T14:56:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.