SPADE: Systematic Prompt Framework for Automated Dialogue Expansion in Machine-Generated Text Detection
- URL: http://arxiv.org/abs/2503.15044v1
- Date: Wed, 19 Mar 2025 09:32:52 GMT
- Title: SPADE: Systematic Prompt Framework for Automated Dialogue Expansion in Machine-Generated Text Detection
- Authors: Haoyi Li, Angela Yifei Yuan, Soyeon Caren Han, Christopher Leckie,
- Abstract summary: We propose five novel data augmentation frameworks for synthetic user dialogue generation through a structured prompting approach.<n>Our proposed method yields 14 new dialogue datasets, which we benchmark against seven MGT detection models.<n>Considering that real-world agents lack knowledge of future opponent utterances, we simulate online dialogue detection and examine the relationship between chat history length and detection accuracy.
- Score: 15.626772502710867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing capability of large language models (LLMs) to generate synthetic content has heightened concerns about their misuse, driving the development of Machine-Generated Text (MGT) detection models. However, these detectors face significant challenges due to the lack of systematically generated, high-quality datasets for training. To address this issue, we propose five novel data augmentation frameworks for synthetic user dialogue generation through a structured prompting approach, reducing the costs associated with traditional data collection methods. Our proposed method yields 14 new dialogue datasets, which we benchmark against seven MGT detection models. The results demonstrate improved generalization performance when utilizing a mixed dataset produced by our proposed augmentation framework. Furthermore, considering that real-world agents lack knowledge of future opponent utterances, we simulate online dialogue detection and examine the relationship between chat history length and detection accuracy. We also benchmark online detection performance with limited chat history on our frameworks. Our open-source datasets can be downloaded from https://github.com/AngieYYF/SPADE-customer-service-dialogue.
Related papers
- Speculative End-Turn Detector for Efficient Speech Chatbot Assistant [11.136112399898481]
We introduce the ETD dataset, the first public dataset for end-turn detection.
We also propose SpeculativeETD, a novel collaborative inference framework that balances efficiency and accuracy to improve real-time ETD in resource-constrained environments.
Experiments demonstrate that the proposed SpeculativeETD significantly improves ETD accuracy while keeping the required computations low.
arXiv Detail & Related papers (2025-03-30T13:34:23Z) - Detecting Document-level Paraphrased Machine Generated Content: Mimicking Human Writing Style and Involving Discourse Features [57.34477506004105]
Machine-generated content poses challenges such as academic plagiarism and the spread of misinformation.<n>We introduce novel methodologies and datasets to overcome these challenges.<n>We propose MhBART, an encoder-decoder model designed to emulate human writing style.<n>We also propose DTransformer, a model that integrates discourse analysis through PDTB preprocessing to encode structural features.
arXiv Detail & Related papers (2024-12-17T08:47:41Z) - DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications [18.378069426713]
Existing research is constrained by general or niche datasets that lack sufficient scale for training dialogue systems.
We introduce Dia Synth - a synthetic dialogue generation framework capable of generating high-quality, contextually rich dialogues.
We perform our experiments by generating synthetic data using different LLMs and few-shot examples from DialogSum and SAMSum.
arXiv Detail & Related papers (2024-09-25T07:03:31Z) - Improved Contextual Recognition In Automatic Speech Recognition Systems
By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution consists of using Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Networks (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z) - PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded
Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv Detail & Related papers (2023-09-19T08:27:09Z) - RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized
Dialogue Response Generation [30.245143345565758]
We propose a new retrieval-enhanced approach for personalized response generation.
We design a hierarchical transformer retriever trained on dialogue domain data to perform personalized retrieval and a context-aware prefix encoder that fuses the retrieved information to the decoder more effectively.
We quantitatively evaluate our model's performance under a suite of human and automatic metrics and find it to be superior compared to state-of-the-art baselines on English Reddit conversations.
arXiv Detail & Related papers (2023-06-12T16:10:21Z) - Using Textual Interface to Align External Knowledge for End-to-End
Task-Oriented Dialogue Systems [53.38517204698343]
We propose a novel paradigm that uses a textual interface to align external knowledge and eliminate redundant processes.
We demonstrate our paradigm in practice through MultiWOZ-Remake, including an interactive textual interface built for the MultiWOZ database.
arXiv Detail & Related papers (2023-05-23T05:48:21Z) - Multi-grained Hypergraph Interest Modeling for Conversational
Recommendation [75.65483522949857]
We propose a novel multi-grained hypergraph interest modeling approach to capture user interest beneath intricate historical data.
In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.
We further conduct multi-grained hypergraph convolution on the two kinds of hypergraphs, and utilize the enhanced representations to develop interest-aware CRS.
arXiv Detail & Related papers (2023-05-04T13:13:44Z) - DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for
Dialog Response Generation [80.45816053153722]
DialogVED introduces continuous latent variables into the enhanced encoder-decoder pre-training framework to increase the relevance and diversity of responses.
We conduct experiments on PersonaChat, DailyDialog, and DSTC7-AVSD benchmarks for response generation.
arXiv Detail & Related papers (2022-04-27T16:18:15Z) - Quick Starting Dialog Systems with Paraphrase Generation [0.0]
We propose a method to reduce the cost and effort of creating new conversational agents by artificially generating more data from existing examples.
Our proposed approach can kick-start a dialog system with little human effort, and brings its performance to a level satisfactory enough for allowing actual interactions with real end-users.
arXiv Detail & Related papers (2022-04-06T02:35:59Z) - Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
arXiv Detail & Related papers (2020-09-19T02:41:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.