Related papers: ProKG-Dial: Progressive Multi-Turn Dialogue Construction with Domain Knowledge Graphs

URL: http://arxiv.org/abs/2508.01869v1
Date: Sun, 03 Aug 2025 17:52:42 GMT
Title: ProKG-Dial: Progressive Multi-Turn Dialogue Construction with Domain Knowledge Graphs
Authors: Yuanyuan Liang, Xiaoman Wang, Tingyu Xie, Lei Pan,
Abstract summary: Current large language models (LLMs) excel at general NLP tasks but often lack domain-specific precision in professional settings. We introduce ProKG-Dial, a framework for constructing knowledge-intensive multi-turn dialogue using domain-specific knowledge graphs (KGs). We validate ProKG-Dial on a medical knowledge graph by evaluating the generated dialogues in terms of diversity, semantic coherence, and entity coverage.
Score: 3.9190413787169414
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Current large language models (LLMs) excel at general NLP tasks but often lack domain-specific precision in professional settings. Building a high-quality domain-specific multi-turn dialogue dataset is essential for developing specialized conversational systems. However, existing methods such as manual annotation, simulated human-LLM interactions, and role-based LLM dialogues are resource-intensive or suffer from limitations in dialogue quality and domain coverage. To address these challenges, we introduce ProKG-Dial, a progressive framework for constructing knowledge-intensive multi-turn dialogue datasets using domain-specific knowledge graphs (KGs). ProKG-Dial leverages the structured nature of KGs to encode complex domain knowledge and relationships, providing a solid foundation for generating meaningful and coherent dialogues. Specifically, ProKG-Dial begins by applying community detection to partition the KG into semantically cohesive subgraphs. For each subgraph, the framework incrementally generates a series of questions and answers centered around a target entity, ensuring relevance and coverage. A rigorous filtering step is employed to maintain high dialogue quality. We validate ProKG-Dial on a medical knowledge graph by evaluating the generated dialogues in terms of diversity, semantic coherence, and entity coverage. Furthermore, we fine-tune a base LLM on the resulting dataset and benchmark it against several baselines. Both automatic metrics and human evaluations demonstrate that ProKG-Dial substantially improves dialogue quality and domain-specific performance, highlighting its effectiveness and practical utility.
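
Illustrative sketch (not from the paper): the abstract says questions and answers are generated incrementally around a target entity within a subgraph. One way to picture that ordering is a breadth-first walk over the subgraph's triples, starting at the target entity, with a fixed question template standing in for whatever generator the authors actually use. The triples, the template, and the function name `progressive_turns` are all hypothetical; the community-detection and filtering steps are not shown, and the subgraph is assumed to be given.

```python
from collections import deque

# Hypothetical toy subgraph as (head, relation, tail) triples.
# These entities and relations are invented for illustration only.
TRIPLES = [
    ("diabetes", "has_symptom", "fatigue"),
    ("diabetes", "treated_by", "metformin"),
    ("metformin", "has_side_effect", "nausea"),
    ("nausea", "relieved_by", "ginger"),
]


def progressive_turns(triples, target):
    """Return (question, answer) pairs ordered by hop distance from target.

    Assumption: a breadth-first walk plus a fixed template stands in for
    the paper's generation step, which the abstract does not specify.
    """
    # Index every triple under both of its endpoints.
    adjacency = {}
    for triple in triples:
        head, _, tail = triple
        adjacency.setdefault(head, []).append(triple)
        adjacency.setdefault(tail, []).append(triple)

    visited = {target}      # entities already queued
    used = set()            # triples already turned into a dialogue turn
    queue = deque([target])
    turns = []

    while queue:
        entity = queue.popleft()
        for triple in adjacency.get(entity, []):
            if triple in used:
                continue
            used.add(triple)
            head, relation, tail = triple
            question = f"What is the '{relation}' of {head}?"
            turns.append((question, tail))
            # Queue newly reached entities so later turns move outward.
            for neighbor in (head, tail):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
    return turns


if __name__ == "__main__":
    for question, answer in progressive_turns(TRIPLES, "diabetes"):
        print(question, "->", answer)
```

With this ordering, turns about the target entity itself come first and turns about more distant entities follow, so each triple in the subgraph is used exactly once.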

Related papers

CoDial: Interpretable Task-Oriented Dialogue Systems Through Dialogue Flow Alignment [24.936670177298584]
We introduce a novel framework, CoDial, that converts expert knowledge into executable conversation logic. CoDial can be easily implemented in existing guardrailing languages, such as Colang. It achieves state-of-the-art performance on the STAR dataset for inference-based models and is competitive with similar baselines on the well-known MultiWOZ dataset.
arXiv Detail & Related papers (2025-06-02T21:12:27Z)
Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts [19.73376945990922]
We introduce a bottom-up conversation synthesis approach, where QA pairs are generated first and then combined into a coherent dialogue. This structure allows the use of non-local models in stages that do not involve proprietary knowledge. Both human and automated evaluations demonstrate that our approach produces more realistic and higher-quality dialogues.
arXiv Detail & Related papers (2025-04-19T18:25:53Z)
Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs [0.8772713588571283]
Chatty-Gen is a novel multi-stage retrieval-augmented generation platform for dialogue benchmarks. Chatty-Gen decomposes the generation process into manageable stages and uses assertion rules for automatic validation. Our experiments with several real and large KGs demonstrate that Chatty-Gen significantly outperforms state-of-the-art systems.
arXiv Detail & Related papers (2025-01-17T02:48:29Z)
Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora. LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions. We introduce SMART-SLIC, a highly domain-specific LLM framework.
arXiv Detail & Related papers (2024-10-03T17:40:55Z)
DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever [83.33209603041013]
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval. Our approach introduces a multi-modal context generator to learn context features which are distilled into prompts within the pre-trained vision-language model CLIP. To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to multi-modal representation space.
arXiv Detail & Related papers (2024-01-02T07:40:12Z)
Fine-Grained Analysis of Team Collaborative Dialogue [1.363890704621148]
We describe initial work towards developing an explainable analytics tool in the software development domain using Slack chats. We present a novel hierarchical labeling scheme, descriptive metrics based on the frequency of occurrence of dialogue acts, and initial results using a transformer + CRF architecture to incorporate long-range context.
arXiv Detail & Related papers (2023-12-09T05:38:32Z)
SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation [55.82577086422923]
We provide a feasible definition of dialogue segmentation points with the help of document-grounded dialogues. We release a large-scale supervised dataset called SuperDialseg, containing 9,478 dialogues. We also provide a benchmark including 18 models across five categories for the dialogue segmentation task.
arXiv Detail & Related papers (2023-05-15T06:08:01Z)
Variational Reasoning over Incomplete Knowledge Graphs for Conversational Recommendation [48.70062671767362]
We propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to incorporate the large dialogue corpus that naturally accompanies CRSs to enhance the incomplete KGs. We also denote the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graphs.
arXiv Detail & Related papers (2022-12-22T17:02:21Z)
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment [58.46761798403072]
A model-based automatic dialogue evaluation metric (ADEM) is expected to perform well across multiple domains. Despite significant progress, an ADEM that works well in one domain does not necessarily generalize to another. We propose a Panel of Experts (PoE) network that consists of a shared transformer encoder and a collection of lightweight adapters.
arXiv Detail & Related papers (2022-12-18T02:26:50Z)
Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses. Previous studies on recognizing and classifying inappropriate content mostly focus on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
Improving Conversational Recommender Systems via Knowledge Graph based Semantic Fusion [77.21442487537139]
Conversational recommender systems (CRS) aim to recommend high-quality items to users through interactive conversations. First, the conversation data itself lacks sufficient contextual information for accurately understanding users' preferences. Second, there is a semantic gap between natural language expression and item-level user preference.
arXiv Detail & Related papers (2020-07-08T11:14:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.