Dual Knowledge-Enhanced Two-Stage Reasoner for Multimodal Dialog Systems
- URL: http://arxiv.org/abs/2509.07817v1
- Date: Tue, 09 Sep 2025 14:55:28 GMT
- Title: Dual Knowledge-Enhanced Two-Stage Reasoner for Multimodal Dialog Systems
- Authors: Xiaolin Chen, Xuemeng Song, Haokun Wen, Weili Guan, Xiangyu Zhao, Liqiang Nie
- Abstract summary: We aim to fully utilize dual knowledge (i.e., structured attribute and unstructured review knowledge) with large language models (LLMs) to promote textual response generation. We propose a novel dual knowledge-enhanced two-stage reasoner by adapting LLMs for multimodal dialog systems (named DK2R).
- Score: 81.87703298503374
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Textual response generation, which aims to generate proper textual responses based on the multimodal context, is pivotal for multimodal task-oriented dialog systems. While existing efforts have demonstrated remarkable progress, two limitations remain: 1) neglect of unstructured review knowledge and 2) underutilization of large language models (LLMs). Motivated by this, we aim to fully utilize dual knowledge (i.e., structured attribute and unstructured review knowledge) with LLMs to promote textual response generation in multimodal task-oriented dialog systems. However, this task is non-trivial due to two key challenges: 1) dynamic knowledge type selection and 2) intention-response decoupling. To address these challenges, we propose a novel dual knowledge-enhanced two-stage reasoner by adapting LLMs for multimodal dialog systems (named DK2R). Specifically, DK2R first extracts both structured attribute and unstructured review knowledge from an external knowledge base given the dialog context. Thereafter, DK2R uses an LLM to evaluate each knowledge type's utility by analyzing LLM-generated provisional probe responses. Moreover, DK2R separately summarizes the intention-oriented key clues via dedicated reasoning, which are further used as auxiliary signals to enhance LLM-based textual response generation. Extensive experiments conducted on a public dataset verify the superiority of DK2R. We have released the code and parameters.
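The two-stage flow described in the abstract (probe each knowledge type, let an LLM judge its utility, then generate the final response conditioned on the selected knowledge plus separately summarized intention clues) can be sketched as below. This is a minimal illustration under stated assumptions, not the released DK2R implementation: the `LLM` callable type, the prompt wording, the yes/no utility check, and every function and class name here are hypothetical, and `toy_llm` is a deterministic stand-in for a real model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical LLM interface (an assumption, not DK2R's API):
# any function that maps a prompt string to generated text.
LLM = Callable[[str], str]

KNOWLEDGE_TYPES = ("attribute", "review")


@dataclass
class DualKnowledge:
    attributes: Dict[str, str]  # structured attribute knowledge, e.g. {"material": "leather"}
    reviews: List[str]          # unstructured review knowledge (free-text reviews)

    def render(self, ktype: str) -> str:
        """Serialize one knowledge type into prompt text."""
        if ktype == "attribute":
            return "; ".join(f"{k}: {v}" for k, v in self.attributes.items())
        return " | ".join(self.reviews)


def probe_responses(llm: LLM, context: str, kb: DualKnowledge) -> Dict[str, str]:
    """Stage 1a: one provisional probe response per knowledge type."""
    return {
        kt: llm(f"Context: {context}\nKnowledge ({kt}): {kb.render(kt)}\nProvisional response:")
        for kt in KNOWLEDGE_TYPES
    }


def select_knowledge_types(llm: LLM, context: str, probes: Dict[str, str]) -> List[str]:
    """Stage 1b: the LLM judges each knowledge type's utility from its probe response."""
    selected = []
    for kt, probe in probes.items():
        verdict = llm(f"Context: {context}\nProbe ({kt}): {probe}\nIs this knowledge useful? yes/no:")
        if verdict.strip().lower().startswith("yes"):
            selected.append(kt)
    return selected


def generate_response(llm: LLM, context: str, kb: DualKnowledge, selected: List[str]) -> str:
    """Stage 2: summarize intention clues separately, then use them as an auxiliary signal."""
    knowledge = "\n".join(f"{kt}: {kb.render(kt)}" for kt in selected)
    clues = llm(f"Context: {context}\nSummarize the user's intention and key clues:")
    return llm(f"Context: {context}\nKnowledge:\n{knowledge}\nClues: {clues}\nResponse:")


def dk2r_style_pipeline(llm: LLM, context: str, kb: DualKnowledge) -> str:
    """Chain both stages: probe, select knowledge types, then generate."""
    probes = probe_responses(llm, context, kb)
    selected = select_knowledge_types(llm, context, probes)
    return generate_response(llm, context, kb, selected)


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM (illustration only)."""
    if "useful? yes/no" in prompt:
        # Pretend the attribute probe is judged useful and the review probe is not.
        return "yes" if "Probe (attribute)" in prompt else "no"
    if "Provisional response:" in prompt:
        return "draft answer"
    if "Summarize the user's intention" in prompt:
        return "user wants to know the material"
    return "[final] " + prompt  # echo the prompt so the conditioning is visible


if __name__ == "__main__":
    kb = DualKnowledge(attributes={"material": "leather"}, reviews=["Runs a bit small."])
    ctx = "User: Is this jacket made of leather?"
    print(dk2r_style_pipeline(toy_llm, ctx, kb))
```

Keeping intention summarization as a separate LLM call, rather than folding it into the final prompt, mirrors the abstract's point that key clues are derived "via dedicated reasoning" and then fed in as an auxiliary signal; with a real model, each call would be replaced by an actual generation request.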
Related papers
- Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation (2025-02-18T03:20:50Z)
  Large language models (LLMs) have demonstrated remarkable capabilities in various domains, particularly in System 1 tasks.
  Research on System2-to-System1 methods has recently surged, exploring System 2 reasoning knowledge via inference-time computation.
  In this paper, we focus on code generation, which is a representative System 2 task, and identify two primary challenges.
- Distilling Implicit Multimodal Knowledge into Large Language Models for Zero-Resource Dialogue Generation (2024-05-16T14:21:33Z)
  We propose the Visual Implicit Knowledge Distillation Framework (VIKDF) for enriched dialogue generation in zero-resource contexts.
  VIKDF comprises two main stages: knowledge distillation and knowledge integration.
  Our experiments show that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues.
- Generative Multi-Modal Knowledge Retrieval with Large Language Models (2024-01-16T08:44:29Z)
  We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
  Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
  We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text (2023-10-31T04:37:57Z)
  Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
  Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
- Dual Semantic Knowledge Composed Multimodal Dialog Systems (2023-05-17T06:33:26Z)
  We propose a novel multimodal task-oriented dialog system named MDS-S2.
  It acquires context-related attribute and relation knowledge from the knowledge base.
  We also devise a set of latent query variables to distill the semantic information from the composed response representation.
- Multimodal Dialog Systems with Dual Knowledge-enhanced Generative Pretrained Language Model (2022-07-16T13:02:54Z)
  We propose a novel dual knowledge-enhanced generative pretrained language model for multimodal task-oriented dialog systems (DKMD).
  The proposed DKMD consists of three key components: dual knowledge selection, dual knowledge-enhanced context learning, and knowledge-enhanced response generation.
  Experiments on a public dataset verify the superiority of the proposed DKMD over state-of-the-art competitors.