Dialogue Benchmark Generation from Knowledge Graphs with Cost-Effective Retrieval-Augmented LLMs
- URL: http://arxiv.org/abs/2501.09928v1
- Date: Fri, 17 Jan 2025 02:48:29 GMT
- Authors: Reham Omar, Omij Mangukiya, Essam Mansour
- Abstract summary: Chatty-Gen is a novel multi-stage retrieval-augmented generation platform for dialogue benchmarks.
Chatty-Gen decomposes the generation process into manageable stages and uses assertion rules for automatic validation.
Our experiments with several real and large KGs demonstrate that Chatty-Gen significantly outperforms state-of-the-art systems.
- Abstract: Dialogue benchmarks are crucial in training and evaluating chatbots engaging in domain-specific conversations. Knowledge graphs (KGs) represent semantically rich and well-organized data spanning various domains, such as DBLP, DBpedia, and YAGO. Traditionally, dialogue benchmarks have been manually created from documents, neglecting the potential of KGs in automating this process. Some question-answering benchmarks are automatically generated using extensive preprocessing from KGs, but they do not support dialogue generation. This paper introduces Chatty-Gen, a novel multi-stage retrieval-augmented generation platform for automatically generating high-quality dialogue benchmarks tailored to a specific domain using a KG. Chatty-Gen decomposes the generation process into manageable stages and uses assertion rules for automatic validation between stages. Our approach enables control over intermediate results to prevent time-consuming restarts due to hallucinations. It also reduces reliance on costly and more powerful commercial LLMs. Chatty-Gen eliminates upfront processing of the entire KG using efficient query-based retrieval to find representative subgraphs based on the dialogue context. Our experiments with several real and large KGs demonstrate that Chatty-Gen significantly outperforms state-of-the-art systems and ensures consistent model and system performance across multiple LLMs of diverse capabilities, such as GPT-4o, Gemini 1.5, Llama 3, and Mistral.
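The staged design suggests a simple control loop: run a stage, check its output against assertion rules, and retry only that stage on failure. Below is a minimal Python sketch of this idea; the stage functions, rule set, and pipeline wiring are illustrative assumptions, not Chatty-Gen's actual implementation.
```python
# Minimal sketch of a multi-stage generation loop with assertion-rule
# validation between stages, in the spirit of Chatty-Gen. All stage and
# rule names here are illustrative assumptions, not the paper's actual API.

def run_stage(stage_fn, stage_input, rules, max_retries=3):
    """Run one stage and validate its output; on a failed assertion,
    retry only this stage instead of restarting the whole pipeline."""
    for _ in range(max_retries):
        output = stage_fn(stage_input)
        if all(rule(output) for rule in rules):
            return output
    raise RuntimeError(f"stage {stage_fn.__name__} failed validation")

# Example assertion rules for a question-generation stage (illustrative):
question_rules = [
    lambda q: q.strip().endswith("?"),   # output is a well-formed question
    lambda q: 0 < len(q.split()) <= 40,  # neither empty nor rambling
]

def generate_dialogue(seed_entity, stages):
    """stages: ordered (stage_fn, rules) pairs, e.g. subgraph retrieval ->
    question generation -> answer verification."""
    state = seed_entity
    for stage_fn, rules in stages:
        state = run_stage(stage_fn, state, rules)
    return state
```
Catching a bad intermediate result early and retrying just that stage is what avoids the time-consuming full restarts the abstract mentions.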
Related papers
- Mitigating the Negative Impact of Over-association for Conversational Query Production (arXiv, 2024-09-29)
Conversational query generation aims at producing search queries from dialogue histories, which are then used to retrieve relevant knowledge from a search engine.
Previous models suffer from data hunger and, at inference time, tend both to drop important concepts from dialogue histories and to generate irrelevant ones.
We propose effective instance-level weighting strategies for training to mitigate these issues from multiple perspectives; a toy sketch of the weighting idea follows below.
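A minimal PyTorch sketch of instance-level weighting: each example's generation loss is scaled by a per-example weight. How the weights are computed is left abstract, since the paper's strategies are not reproduced here.
```python
import torch.nn.functional as F

# Toy instance-level weighted generation loss: each example's loss is
# scaled by a weight (e.g., reflecting how well its target query is
# associated with the dialogue history). The weighting signal is abstract.

def weighted_generation_loss(logits, targets, weights, pad_id=0):
    # logits: (batch, seq, vocab); targets: (batch, seq); weights: (batch,)
    per_token = F.cross_entropy(
        logits.transpose(1, 2), targets,
        ignore_index=pad_id, reduction="none",
    )                                            # (batch, seq)
    lengths = (targets != pad_id).sum(dim=1).clamp(min=1)
    per_instance = per_token.sum(dim=1) / lengths
    return (weights * per_instance).mean()
```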
- Evaluating Very Long-Term Conversational Memory of LLM Agents (arXiv, 2024-02-27)
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues.
We equip each agent with the capability of sharing and reacting to images.
The generated conversations are verified and edited by human annotators for long-range consistency.
- DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever (arXiv, 2024-01-02)
We propose a parameter-efficient prompt-tuning method named DialCLIP for multi-modal dialog retrieval.
Our approach introduces a multi-modal context generator to learn context features, which are distilled into prompts within the pre-trained vision-language model CLIP; a rough sketch of this prompt-generation idea follows below.
To facilitate various types of retrieval, we also design multiple experts to learn mappings from CLIP outputs to the multi-modal representation space.
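A rough sketch of the context-to-prompt idea, assuming a frozen encoder whose token embeddings accept prepended soft prompts; dimensions and names are invented for illustration, and the expert heads are omitted.
```python
import torch
import torch.nn as nn

# Hypothetical context-to-prompt generator: a small trainable module maps
# dialog-context features to soft prompt vectors that are prepended to a
# frozen encoder's token embeddings. Not DialCLIP's actual architecture.

class ContextPromptGenerator(nn.Module):
    def __init__(self, ctx_dim=512, embed_dim=512, n_prompts=8):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(ctx_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, n_prompts * embed_dim),
        )
        self.n_prompts, self.embed_dim = n_prompts, embed_dim

    def forward(self, ctx_feat, token_embeds):
        # ctx_feat: (batch, ctx_dim); token_embeds: (batch, seq, embed_dim)
        prompts = self.proj(ctx_feat).view(-1, self.n_prompts, self.embed_dim)
        return torch.cat([prompts, token_embeds], dim=1)  # prepend prompts
```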
- Retrieval-Generation Alignment for End-to-End Task-Oriented Dialogue System (arXiv, 2023-10-13)
We propose the application of maximal marginal likelihood to train a perceptive retriever by utilizing signals from response generation for supervision; a toy version of this objective appears below.
We evaluate our approach on three task-oriented dialogue datasets using T5 and ChatGPT as the backbone models.
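The maximal-marginal-likelihood objective can be written compactly: marginalize the generator's likelihood of the gold response over the retriever's distribution on the top-k retrieved entries. A toy PyTorch version, with illustrative shapes:
```python
import torch

# Toy maximal-marginal-likelihood objective: the retriever is credited for
# any retrieved entry under which the generator assigns high probability
# to the gold response. Shapes are illustrative.

def mml_loss(retriever_logits, gen_log_probs):
    # retriever_logits: (batch, k) scores over k retrieved entries
    # gen_log_probs:    (batch, k) log p(response | context, entry_i)
    log_p_retrieve = torch.log_softmax(retriever_logits, dim=-1)
    # log sum_i p(entry_i | context) * p(response | context, entry_i)
    log_marginal = torch.logsumexp(log_p_retrieve + gen_log_probs, dim=-1)
    return -log_marginal.mean()
```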
- Evaluating Large Language Models for Document-grounded Response Generation in Information-Seeking Dialogues (arXiv, 2023-09-21)
We investigate the use of large language models (LLMs) like ChatGPT for document-grounded response generation in the context of information-seeking dialogues.
For evaluation, we use the MultiDoc2Dial corpus of task-oriented dialogues in four social service domains.
While both ChatGPT variants are more likely to include information not present in the relevant segments, possibly indicating hallucination, they are rated higher than both the shared-task winning system and human responses.
- Multi-grained Hypergraph Interest Modeling for Conversational Recommendation (arXiv, 2023-05-04)
We propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data.
In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.
We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop an interest-aware CRS; a minimal hypergraph-convolution sketch follows below.
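A minimal, generic hypergraph-convolution layer in this spirit (node-to-hyperedge averaging followed by hyperedge-to-node averaging); the paper's multi-grained, two-hypergraph design is not reproduced.
```python
import torch
import torch.nn as nn

# Minimal HGNN-style hypergraph convolution: node features are averaged
# into hyperedge features (e.g., one hyperedge per dialogue session), then
# scattered back to nodes. A generic stand-in, not the paper's layer.

class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim)

    def forward(self, x, incidence):
        # x: (n_nodes, in_dim); incidence: float {0,1} matrix of shape
        # (n_nodes, n_edges), H[v, e] = 1 if node v occurs in hyperedge e.
        d_e = incidence.sum(dim=0).clamp(min=1)            # edge degrees
        d_v = incidence.sum(dim=1).clamp(min=1)            # node degrees
        edge_feat = (incidence.t() @ x) / d_e.unsqueeze(1)
        node_feat = (incidence @ edge_feat) / d_v.unsqueeze(1)
        return torch.relu(self.theta(node_feat))
```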
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog (arXiv, 2022-06-22)
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
- Context Matters in Semantically Controlled Language Generation for Task-oriented Dialogue Systems (arXiv, 2021-11-28)
This work combines information about the dialogue history, encoded by a pre-trained model, with a meaning representation of the current system utterance to realize contextual language generation in task-oriented dialogues; an illustrative input format is sketched below.
We utilize the pre-trained multi-context ConveRT model for context representation in a model trained from scratch, and leverage the immediately preceding user utterance for context generation in a model adapted from the pre-trained GPT-2.
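As a concrete illustration of conditioning generation on both context and a meaning representation, here is a hypothetical GPT-2 prompt format; the separator markers and MR linearization are assumptions, not the paper's exact scheme.
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Hypothetical contextual-NLG input: the preceding user utterance is
# concatenated with a linearized meaning representation of the system act.
# The "<|mr|>" / "<|response|>" markers are plain-text assumptions.

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

user_utt = "I need a cheap restaurant in the centre."
meaning_rep = "inform(name=Golden House, food=chinese, pricerange=cheap)"
prompt = f"{user_utt} <|mr|> {meaning_rep} <|response|>"

ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=40, do_sample=False)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```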
- A Template-guided Hybrid Pointer Network for Knowledge-based Task-oriented Dialogue Systems (arXiv, 2021-06-10)
We propose a template-guided hybrid pointer network for the knowledge-based task-oriented dialogue system.
We design a memory pointer network model with a gating mechanism to fully exploit the semantic correlation between the retrieved answers and the ground-truth response; a toy gating sketch follows below.
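A toy version of such a gating mechanism: a sigmoid gate, computed from the decoder state and a memory summary, mixes a template-based distribution with a pointer distribution over retrieved knowledge. All names and shapes are illustrative.
```python
import torch
import torch.nn as nn

# Toy sigmoid gate for a hybrid pointer network; not the paper's
# architecture, just the mixing idea.

class HybridGate(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_dim, 1)

    def forward(self, dec_state, memory_summary, p_template, p_pointer):
        # dec_state, memory_summary: (batch, hidden_dim)
        # p_template, p_pointer:     (batch, vocab) distributions
        g = torch.sigmoid(
            self.gate(torch.cat([dec_state, memory_summary], dim=-1))
        )                                      # (batch, 1), broadcasts
        return g * p_template + (1.0 - g) * p_pointer
```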
- Learning an Unreferenced Metric for Online Dialogue Evaluation (arXiv, 2020-05-01)
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances; a minimal scoring sketch follows below.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
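A minimal sketch of an unreferenced scorer along these lines, using a frozen encoder and cosine similarity as a stand-in for a trained scoring head; the checkpoint choice is an assumption.
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Rough unreferenced scorer: embed context and candidate response with a
# frozen pre-trained encoder and score their compatibility. No gold
# response is needed at inference; cosine similarity is only a stand-in.

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    out = enc(**tok(text, return_tensors="pt", truncation=True))
    return out.last_hidden_state.mean(dim=1).squeeze(0)  # mean pooling

def score(context: str, response: str) -> float:
    return torch.cosine_similarity(embed(context), embed(response), dim=0).item()

print(score("How are you today?", "I'm doing well, thanks for asking!"))
```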
- Few-shot Natural Language Generation for Task-Oriented Dialog (arXiv, 2020-02-27)
We present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems.
We develop the SC-GPT model, which is pre-trained on a large annotated NLG corpus to acquire controllable generation ability.
Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.