A Survey on Recent Advances in Conversational Data Generation
- URL: http://arxiv.org/abs/2405.13003v1
- Date: Sun, 12 May 2024 10:11:12 GMT
- Title: A Survey on Recent Advances in Conversational Data Generation
- Authors: Heydar Soudani, Roxana Petcu, Evangelos Kanoulas, Faegheh Hasibi
- Abstract summary: We offer a systematic and comprehensive review of multi-turn conversational data generation.
We focus on three types of dialogue systems: open domain, task-oriented, and information-seeking.
We examine the evaluation metrics and methods for assessing synthetic conversational data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in conversational systems have significantly enhanced human-machine interactions across various domains. However, training these systems is challenging due to the scarcity of specialized dialogue data. Traditionally, conversational datasets were created through crowdsourcing, but this method has proven costly, limited in scale, and labor-intensive. As a solution, the development of synthetic dialogue data has emerged, utilizing techniques to augment existing datasets or convert textual resources into conversational formats, providing a more efficient and scalable approach to dataset creation. In this survey, we offer a systematic and comprehensive review of multi-turn conversational data generation, focusing on three types of dialogue systems: open domain, task-oriented, and information-seeking. We categorize the existing research based on key components like seed data creation, utterance generation, and quality filtering methods, and introduce a general framework that outlines the main principles of conversation data generation systems. Additionally, we examine the evaluation metrics and methods for assessing synthetic conversational data, address current challenges in the field, and explore potential directions for future research. Our goal is to accelerate progress for researchers and practitioners by presenting an overview of state-of-the-art methods and highlighting opportunities to further research in this area.
Related papers
- ProCIS: A Benchmark for Proactive Retrieval in Conversations [21.23826888841565]
We introduce a large-scale dataset for proactive document retrieval that consists of over 2.8 million conversations.
We conduct crowdsourcing experiments to obtain high-quality and relatively complete relevance judgments.
We also collect annotations related to the parts of the conversation that are related to each document, enabling us to evaluate proactive retrieval systems.
arXiv Detail & Related papers (2024-05-10T13:11:07Z)
- A Systematic Review of Data-to-Text NLG [2.4769539696439677]
Methods for producing high-quality text are explored, addressing the challenge of hallucinations in data-to-text generation.
Despite advancements in text quality, the review emphasizes the importance of research in low-resourced languages.
arXiv Detail & Related papers (2024-02-13T14:51:45Z)
- Data Augmentation for Conversational AI [17.48107304359591]
Data augmentation (DA) is an effective approach to alleviate the data scarcity problem in conversational systems.
This tutorial provides a comprehensive and up-to-date overview of DA approaches in the context of conversational systems.
arXiv Detail & Related papers (2023-09-09T09:56:35Z)
- AUGUST: an Automatic Generation Understudy for Synthesizing Conversational Recommendation Datasets [56.052803235932686]
We propose a novel automatic dataset synthesis approach that can generate both large-scale and high-quality recommendation dialogues.
In doing so, we exploit: (i) rich personalized user profiles from traditional recommendation datasets, (ii) rich external knowledge from knowledge graphs, and (iii) the conversation ability contained in human-to-human conversational recommendation datasets.
arXiv Detail & Related papers (2023-06-16T05:27:14Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- Taxonomy of Abstractive Dialogue Summarization: Scenarios, Approaches and Future Directions [14.85592662663867]
This survey provides a comprehensive investigation on existing work for abstractive dialogue summarization from scenarios.
It categorizes the task into two broad categories according to the type of input dialogues, i.e., open-domain and task-oriented.
It presents a taxonomy of existing techniques in three directions, namely, injecting dialogue features, designing auxiliary training tasks and using additional data.
arXiv Detail & Related papers (2022-10-18T14:33:03Z)
- Dialogue Term Extraction using Transfer Learning and Topological Data Analysis [0.8185867455104834]
We explore different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion.
To examine the utility of each feature set, we train a seed model based on the widely used MultiWOZ dataset.
Our method outperforms the previously proposed approach that relies solely on word embeddings.
arXiv Detail & Related papers (2022-08-22T17:04:04Z)
- HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data [87.67278915655712]
We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables.
The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions.
arXiv Detail & Related papers (2022-04-28T00:52:16Z)
- Automatic Evaluation and Moderation of Open-domain Dialogue Systems [59.305712262126264]
A long-standing challenge for researchers is the lack of effective automatic evaluation metrics.
This paper describes the data, baselines, and results obtained for Track 5 of the Dialogue System Technology Challenge 10 (DSTC10).
arXiv Detail & Related papers (2021-11-03T10:08:05Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters [52.725200145600624]
We propose KnowExpert to bypass the retrieval process by injecting prior knowledge into the pre-trained language models with lightweight adapters.
Experimental results show that KnowExpert performs comparably with the retrieval-based baselines.
arXiv Detail & Related papers (2021-05-13T12:33:23Z)