Related papers: A Survey on Recent Advances in Conversational Data Generation

URL: http://arxiv.org/abs/2405.13003v1
Date: Sun, 12 May 2024 10:11:12 GMT
Title: A Survey on Recent Advances in Conversational Data Generation
Authors: Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, Faegheh Hasibi
Abstract summary: We offer a systematic and comprehensive review of multi-turn conversational data generation. We focus on three types of dialogue systems: open domain, task-oriented, and information-seeking. We examine the evaluation metrics and methods for assessing synthetic conversational data.
Score: 14.237954885530396
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in conversational systems have significantly enhanced human-machine interactions across various domains. However, training these systems is challenging due to the scarcity of specialized dialogue data. Traditionally, conversational datasets were created through crowdsourcing, but this method has proven costly, limited in scale, and labor-intensive. As a solution, the development of synthetic dialogue data has emerged, utilizing techniques to augment existing datasets or convert textual resources into conversational formats, providing a more efficient and scalable approach to dataset creation. In this survey, we offer a systematic and comprehensive review of multi-turn conversational data generation, focusing on three types of dialogue systems: open domain, task-oriented, and information-seeking. We categorize the existing research based on key components like seed data creation, utterance generation, and quality filtering methods, and introduce a general framework that outlines the main principles of conversation data generation systems. Additionally, we examine the evaluation metrics and methods for assessing synthetic conversational data, address current challenges in the field, and explore potential directions for future research. Our goal is to accelerate progress for researchers and practitioners by presenting an overview of state-of-the-art methods and highlighting opportunities to further research in this area.

Related papers

Summarizing Speech: A Comprehensive Survey [76.13011304983458]
Speech summarization has become an essential tool for efficiently managing and accessing the growing volume of spoken and audiovisual content. This survey examines existing datasets and evaluation protocols, which are crucial for assessing the quality of summarization approaches.
arXiv Detail & Related papers (2025-04-10T17:50:53Z)
ProCIS: A Benchmark for Proactive Retrieval in Conversations [21.23826888841565]
We introduce a large-scale dataset for proactive document retrieval that consists of over 2.8 million conversations. We conduct crowdsourcing experiments to obtain high-quality and relatively complete relevance judgments. We also collect annotations related to the parts of the conversation that are related to each document, enabling us to evaluate proactive retrieval systems.
arXiv Detail & Related papers (2024-05-10T13:11:07Z)
A Systematic Review of Data-to-Text NLG [2.4769539696439677]
Methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation. Despite advancements in text quality, the review emphasizes the importance of research in low-resourced languages.
arXiv Detail & Related papers (2024-02-13T14:51:45Z)
Data Augmentation for Conversational AI [17.48107304359591]
Data augmentation (DA) is an effective approach to alleviate the data scarcity problem in conversational systems. This tutorial provides a comprehensive and up-to-date overview of DA approaches in the context of conversational systems.
arXiv Detail & Related papers (2023-09-09T09:56:35Z)
AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets [56.052803235932686]
We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues. In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
arXiv Detail & Related papers (2023-06-16T05:27:14Z)
FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels. We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
Taxonomy of Abstractive Dialogue Summarization: Scenarios, Approaches and Future Directions [14.85592662663867]
This survey provides a comprehensive investigation on existing work for abstractive dialogue summarization from scenarios. It categorizes the task into two broad categories according to the type of input dialogues, i.e., open-domain and task-oriented. It presents a taxonomy of existing techniques in three directions, namely, injecting dialogue features, designing auxiliary training tasks and using additional data.
arXiv Detail & Related papers (2022-10-18T14:33:03Z)
Dialogue Term Extraction using Transfer Learning and Topological Data Analysis [0.8185867455104834]
We explore different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion. To examine the utility of each feature set, we train a seed model based on the widely used MultiWOZ dataset. Our method outperforms the previously proposed approach that relies solely on word embeddings.
arXiv Detail & Related papers (2022-08-22T17:04:04Z)
HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables. The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z)
Automatic Evaluation and Moderation of Open-domain Dialogue Systems [59.305712262126264]
A long-standing challenge for researchers is the lack of effective automatic evaluation metrics. This paper describes the data, baselines, and results obtained for Track 5 at the Dialogue System Technology Challenge 10 (DSTC10).
arXiv Detail & Related papers (2021-11-03T10:08:05Z)
ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads. We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters [52.725200145600624]
We propose KnowExpert to bypass the retrieval process by injecting prior knowledge into the pre-trained language models with lightweight adapters. Experimental results show that KnowExpert performs comparably with the retrieval-based baselines.
arXiv Detail & Related papers (2021-05-13T12:33:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.