Commonsense Generation and Evaluation for Dialogue Systems using Large Language Models
- URL: http://arxiv.org/abs/2506.19483v1
- Date: Tue, 24 Jun 2025 10:18:05 GMT
- Title: Commonsense Generation and Evaluation for Dialogue Systems using Large Language Models
- Authors: Marcos Estecha-Garitagoitia, Chen Zhang, Mario Rodríguez-Cantelar, Luis Fernando D'Haro
- Abstract summary: This paper explores the task of performing turn-level data augmentation for dialogue systems based on different types of commonsense relationships. The proposed methodology takes advantage of the extended knowledge and zero-shot capabilities of pretrained Large Language Models (LLMs) to follow instructions. Preliminary results suggest that our approach effectively harnesses LLMs' capabilities for commonsense reasoning and evaluation in dialogue systems.
- Score: 8.556799193001341
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper provides preliminary results on turn-level data augmentation for dialogue systems based on different types of commonsense relationships, and on the automatic evaluation of the generated synthetic turns. The proposed methodology takes advantage of the extended knowledge of pretrained Large Language Models (LLMs), their zero-shot ability to follow instructions and understand contextual information, and their commonsense reasoning capabilities. The approach draws inspiration from methodologies like Chain-of-Thought (CoT), applied more explicitly to the task of prompt-based generation for dialogue-based data augmentation conditioned on commonsense attributes, and to the automatic evaluation of the generated dialogues. To assess the effectiveness of the proposed approach, we first extracted 200 randomly selected partial dialogues from 5 different well-known dialogue datasets and generated alternative responses conditioned on different event commonsense attributes. This novel dataset allows us to measure the proficiency of LLMs in generating contextually relevant commonsense knowledge, covering up to 12 specific relations from the ATOMIC [10] database. Secondly, we propose an evaluation framework to automatically assess the quality of the generated dataset, inspired by the ACCENT [26] metric, which offers a nuanced approach to assessing event commonsense. However, our method does not follow ACCENT's complex event-relation tuple extraction process. Instead, we propose an instruction-based prompt for each commonsense attribute and use state-of-the-art LLMs to automatically detect the original attributes used when creating each augmented turn in the previous step. Preliminary results suggest that our approach effectively harnesses LLMs' capabilities for commonsense reasoning and evaluation in dialogue systems.
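The two steps described in the abstract (generation conditioned on a commonsense attribute, then detection of that attribute with a per-attribute prompt) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names, the four-relation subset of ATOMIC, and the yes/no scoring rule are all hypothetical, and `llm` stands in for any callable that maps a prompt string to a text reply.

```python
from typing import Callable, Dict, List

# Illustrative subset of ATOMIC relation names. The abstract mentions up to 12
# relations but does not list them, so this selection is an assumption.
RELATION_DESCRIPTIONS: Dict[str, str] = {
    "xIntent": "the reason why the speaker does the event",
    "xNeed": "what the speaker needs to do before the event",
    "xEffect": "the effect the event has on the speaker",
    "xReact": "how the speaker feels after the event",
}


def build_generation_prompt(context: List[str], relation: str) -> str:
    """Hypothetical prompt asking for a next turn conditioned on one relation."""
    history = "\n".join(context)
    return (
        f"Dialogue so far:\n{history}\n\n"
        f"Write the next turn so that it expresses the commonsense relation "
        f"{relation} ({RELATION_DESCRIPTIONS[relation]})."
    )


def build_detection_prompt(context: List[str], response: str, relation: str) -> str:
    """Hypothetical per-attribute prompt asking whether a relation is present."""
    history = "\n".join(context)
    return (
        f"Dialogue so far:\n{history}\nResponse: {response}\n\n"
        f"Does the response express the commonsense relation {relation} "
        f"({RELATION_DESCRIPTIONS[relation]})? Answer yes or no."
    )


def detection_accuracy(samples: List[dict], llm: Callable[[str], str]) -> float:
    """Fraction of samples whose original relation the evaluator confirms.

    Assumed scoring rule: a sample counts as correct when the reply to the
    detection prompt for its original relation starts with "yes".
    """
    if not samples:
        return 0.0
    hits = 0
    for sample in samples:
        prompt = build_detection_prompt(
            sample["context"], sample["response"], sample["relation"]
        )
        if llm(prompt).strip().lower().startswith("yes"):
            hits += 1
    return hits / len(samples)
```

In use, `llm` would wrap whichever model is being evaluated; the same callable could first be given the output of `build_generation_prompt` to produce the augmented turn, and then be scored with `detection_accuracy`.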
Related papers
- Bottom-Up Synthesis of Knowledge-Grounded Task-Oriented Dialogues with Iteratively Self-Refined Prompts [19.73376945990922]
We introduce a bottom-up conversation synthesis approach, where QA pairs are generated first and then combined into a coherent dialogue. This structure allows the use of non-local models in stages that do not involve proprietary knowledge. Both human and automated evaluations demonstrate that our approach produces more realistic and higher-quality dialogues.
arXiv Detail & Related papers (2025-04-19T18:25:53Z)
- Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey [64.08485471150486]
This survey examines evaluation methods for large language model (LLM)-based agents in multi-turn conversational settings. We systematically reviewed nearly 250 scholarly sources, capturing the state of the art from various venues of publication.
arXiv Detail & Related papers (2025-03-28T14:08:40Z)
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models [16.94819621353007]
SynTOD is a new synthetic data generation approach for developing end-to-end Task-Oriented Dialogue (TOD) systems.
It generates diverse, structured conversations through random walks and response simulation using large language models.
In our experiments, using graph-guided response simulations leads to significant improvements in intent classification, slot filling and response relevance.
arXiv Detail & Related papers (2024-04-23T06:23:34Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- Achieving Conversational Goals with Unsupervised Post-hoc Knowledge Injection [37.15893335147598]
A limitation of current neural dialog models is that they tend to suffer from a lack of specificity and informativeness in generated responses.
We propose a post-hoc knowledge-injection technique where we first retrieve a diverse set of relevant knowledge snippets conditioned on both the dialog history and an initial response from an existing dialog model.
We construct multiple candidate responses, individually injecting each retrieved snippet into the initial response using a gradient-based decoding method, and then select the final response with an unsupervised ranking step.
arXiv Detail & Related papers (2022-03-22T00:42:27Z)
- Commonsense-Focused Dialogues for Response Generation: An Empirical Study [39.49727190159279]
We present an empirical study of commonsense in dialogue response generation.
We first auto-extract commonsensical dialogues from existing dialogue datasets by leveraging ConceptNet.
We then collect a new dialogue dataset with 25K dialogues aimed at exhibiting social commonsense in an interactive setting.
arXiv Detail & Related papers (2021-09-14T04:32:09Z)
- Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters [52.725200145600624]
We propose KnowExpert to bypass the retrieval process by injecting prior knowledge into the pre-trained language models with lightweight adapters.
Experimental results show that KnowExpert performs comparably with the retrieval-based baselines.
arXiv Detail & Related papers (2021-05-13T12:33:23Z)
- Evaluating Groundedness in Dialogue Systems: The BEGIN Benchmark [29.722504033424382]
Knowledge-grounded dialogue agents are systems designed to conduct a conversation based on externally provided background information, such as a Wikipedia page.
We introduce the Benchmark for Evaluation of Grounded INteraction (BEGIN).
BEGIN consists of 8113 dialogue turns generated by language-model-based dialogue systems, accompanied by human annotations specifying the relationship between the system's response and the background information.
arXiv Detail & Related papers (2021-04-30T20:17:52Z)
- Dialogue Distillation: Open-Domain Dialogue Augmentation Using Unpaired Data [61.71319905364992]
We propose a novel data augmentation method for training open-domain dialogue models by utilizing unpaired data.
A data-level distillation process is first proposed to construct augmented dialogues where both post and response are retrieved from the unpaired data.
A ranking module is employed to filter out low-quality dialogues.
A model-level distillation process is employed to distill a teacher model trained on high-quality paired data to augmented dialogue pairs.
arXiv Detail & Related papers (2020-09-20T13:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.