DialogueForge: LLM Simulation of Human-Chatbot Dialogue
- URL: http://arxiv.org/abs/2507.15752v1
- Date: Mon, 21 Jul 2025 16:08:19 GMT
- Title: DialogueForge: LLM Simulation of Human-Chatbot Dialogue
- Authors: Ruizhe Zhu, Hao Zhu, Yaxuan Li, Syang Zhou, Shijing Cai, Malgorzata Lazuka, Elliott Ash
- Abstract summary: We propose DialogueForge as a framework for generating AI-simulated conversations in human-chatbot style. To initialize each generated conversation, DialogueForge uses seed prompts extracted from real human-chatbot interactions. We evaluate the quality of the simulated conversations and compare different models using the UniEval and GTEval evaluation protocols.
- Score: 7.038493120049631
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Collecting human-chatbot dialogues typically demands substantial manual effort and is time-consuming, which limits and poses challenges for research on conversational AI. In this work, we propose DialogueForge - a framework for generating AI-simulated conversations in human-chatbot style. To initialize each generated conversation, DialogueForge uses seed prompts extracted from real human-chatbot interactions. We test a variety of LLMs to simulate the human chatbot user, ranging from state-of-the-art proprietary models to small-scale open-source LLMs, and generate multi-turn dialogues tailored to specific tasks. In addition, we explore fine-tuning techniques to enhance the ability of smaller models to produce indistinguishable human-like dialogues. We evaluate the quality of the simulated conversations and compare different models using the UniEval and GTEval evaluation protocols. Our experiments show that large proprietary models (e.g., GPT-4o) generally outperform others in generating more realistic dialogues, while smaller open-source models (e.g., Llama, Mistral) offer promising performance with greater customization. We demonstrate that the performance of smaller models can be significantly improved by employing supervised fine-tuning techniques. Nevertheless, maintaining coherent and natural long-form human-like dialogues remains a common challenge across all models.
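Below is a minimal sketch, in Python, of the seeded turn-alternation loop the abstract describes: one LLM role-plays the human user starting from a seed prompt while a second LLM plays the chatbot. The `complete` helper and both system prompts are hypothetical placeholders under a generic chat-completion API, not DialogueForge's actual implementation or prompts.

```python
# Sketch of a seeded user-simulator loop, assuming a generic
# chat-completion API; `complete` is a hypothetical stand-in.

def complete(system_prompt: str, messages: list[dict]) -> str:
    """Hypothetical wrapper around an LLM chat-completion endpoint."""
    raise NotImplementedError("plug in your model client here")

def simulate_dialogue(seed_prompt: str, max_turns: int = 6) -> list[dict]:
    """Alternate a user-simulator LLM and a chatbot LLM for max_turns rounds."""
    user_system = (
        "You are role-playing a human chatbot user. Stay on the task implied "
        "by the opening message and ask natural follow-up questions."
    )
    bot_system = "You are a helpful assistant."
    # Seed the dialogue with a prompt extracted from a real interaction.
    history = [{"role": "user", "content": seed_prompt}]
    for _ in range(max_turns):
        # The chatbot answers the latest user turn.
        history.append(
            {"role": "assistant", "content": complete(bot_system, history)}
        )
        # The user simulator sees the dialogue with roles flipped, so the
        # chatbot's turns arrive as "user" messages it must respond to.
        flipped = [
            {"role": "user" if m["role"] == "assistant" else "assistant",
             "content": m["content"]}
            for m in history
        ]
        history.append(
            {"role": "user", "content": complete(user_system, flipped)}
        )
    return history
```

Transcripts produced this way could then be scored with UniEval-style quality dimensions (e.g., naturalness, coherence) alongside the GTEval protocol the abstract mentions.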
Related papers
- REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation [51.97224538045096]
We introduce REALTALK, a 21-day corpus of authentic messaging app dialogues. We compare EI attributes and persona consistency to understand the challenges posed by real-world dialogues. Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation.
arXiv Detail & Related papers (2025-02-18T20:29:01Z)
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
OmniFlatten is an end-to-end GPT-based model capable of effectively modeling the complex behaviors inherent in natural conversations with low latency. Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems.
arXiv Detail & Related papers (2024-10-23T11:58:58Z)
- X-TURING: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents [56.64615470513102]
The Turing test examines whether AIs exhibit human-like behaviour in natural language conversations. The traditional setting limits each participant to one message at a time and requires constant human participation. This paper proposes X-Turing, which enhances the original test with a burst dialogue pattern.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
- LLM Roleplay: Simulating Human-Chatbot Interaction [52.03241266241294]
We propose a goal-oriented, persona-based method to automatically generate diverse multi-turn dialogues simulating human-chatbot interaction.
Our method can simulate human-chatbot dialogues with a high indistinguishability rate.
arXiv Detail & Related papers (2024-07-04T14:49:46Z)
- BotChat: Evaluating LLMs' Capabilities of Having Multi-Turn Dialogues [72.65163468440434]
This report provides a preliminary evaluation of existing large language models for human-style multi-turn chatting.
We prompt large language models (LLMs) to generate a full multi-turn dialogue based on the ChatSEED, utterance by utterance.
We find GPT-4 can generate human-style multi-turn dialogues with impressive quality, significantly outperforming its counterparts.
arXiv Detail & Related papers (2023-10-20T16:53:51Z)
- PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator [39.40718009289621]
We propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations.
Specifically, we target human questions extracted from genuine human-machine conversations as a learning goal and provide a novel user simulator called 'Socratic'.
Our results show our response model, 'PlatoLM', achieves SoTA performance among LLaMA-based 7B models in MT-Bench.
arXiv Detail & Related papers (2023-08-21T06:51:56Z)
- PLACES: Prompting Language Models for Social Conversation Synthesis [103.94325597273316]
We use a small set of expert-written conversations as in-context examples to synthesize a social conversation dataset using prompting.
We perform several thorough evaluations of our synthetic conversations compared to human-collected conversations.
arXiv Detail & Related papers (2023-02-07T05:48:16Z)