Related papers: REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation

URL: http://arxiv.org/abs/2502.13270v1
Date: Tue, 18 Feb 2025 20:29:01 GMT
Title: REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation
Authors: Dong-Ho Lee, Adyasha Maharana, Jay Pujara, Xiang Ren, Francesco Barbieri,
Abstract summary: We introduce REALTALK, a 21-day corpus of authentic messaging app dialogues. We compare EI attributes and persona consistency to understand the challenges posed by real-world dialogues. Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation.
Score: 51.97224538045096
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Long-term, open-domain dialogue capabilities are essential for chatbots aiming to recall past interactions and demonstrate emotional intelligence (EI). Yet, most existing research relies on synthetic, LLM-generated data, leaving open questions about real-world conversational patterns. To address this gap, we introduce REALTALK, a 21-day corpus of authentic messaging app dialogues, providing a direct benchmark against genuine human interactions. We first conduct a dataset analysis, focusing on EI attributes and persona consistency to understand the unique challenges posed by real-world dialogues. By comparing with LLM-generated conversations, we highlight key differences, including diverse emotional expressions and variations in persona stability that synthetic dialogues often fail to capture. Building on these insights, we introduce two benchmark tasks: (1) persona simulation where a model continues a conversation on behalf of a specific user given prior dialogue context; and (2) memory probing where a model answers targeted questions requiring long-term memory of past interactions. Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation. Additionally, existing models face significant challenges in recalling and leveraging long-term context within real-world conversations.

Related papers

The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era [95.35748535806744]
We launch the first Human-like Spoken Dialogue Systems Challenge (HumDial) at ICASSP 2026. This paper summarizes the dataset, track configurations, and the final results.
arXiv Detail & Related papers (2026-01-09T06:32:30Z)
FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction [49.83226596963294]
Speech-based human-computer interaction enables real-time spoken dialogue systems. Modelling and benchmarking these models remains a fundamental challenge. We introduce FLEXI, the first benchmark for full-duplex human-LLM spoken interaction.
arXiv Detail & Related papers (2025-09-26T11:57:42Z)
DialogueForge: LLM Simulation of Human-Chatbot Dialogue [7.038493120049631]
We propose DialogueForge, a framework for generating AI-simulated conversations in human-chatbot style. For each generated conversation, DialogueForge uses seed prompts extracted from real human-chatbot interactions. We evaluate the quality of the simulated conversations and compare different models using the UniEval and GTEval evaluation protocols.
arXiv Detail & Related papers (2025-07-21T16:08:19Z)
Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on real-time conversations from user interactions. We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z)
PaRT: Enhancing Proactive Social Chatbots with Personalized Real-Time Retrieval [31.567993758708393]
PaRT is a novel framework enabling context-aware proactive dialogues for social chatbots through personalized real-time retrieval and generation. Our approach has been running stably in a real-world production environment for more than 30 days, achieving a 21.77% improvement in the average duration of dialogues.
arXiv Detail & Related papers (2025-04-29T10:51:58Z)
Interpersonal Memory Matters: A New Task for Proactive Dialogue Utilizing Conversational History [13.389395397698035]
We introduce a novel task named Memory-aware Proactive Dialogue (MapDia). For this task, we propose an automatic data construction method and create the first Chinese Memory-aware Proactive dataset (ChMapData). Furthermore, we introduce a joint framework based on Retrieval Augmented Generation (RAG), featuring three modules: Topic Summarization, Topic Retrieval, and Proactive Topic-shifting Detection and Generation.
arXiv Detail & Related papers (2025-03-07T05:19:17Z)
Self-Directed Turing Test for Large Language Models [56.64615470513102]
The Turing test examines whether AIs can exhibit human-like behaviour in natural language conversations. Traditional Turing tests adopt a rigid dialogue format where each participant sends only one message each time. This paper proposes the Self-Directed Turing Test, which extends the original test with a burst dialogue format.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
DialSim: A Real-Time Simulator for Evaluating Long-Term Multi-Party Dialogue Understanding of Conversation Systems [13.915753261117901]
We introduce DialSim, a real-time dialogue simulator. In this simulator, a conversation system is assigned the role of a character from popular TV shows. Key features of DialSim include assessing the system's ability to respond within a reasonable time limit.
arXiv Detail & Related papers (2024-06-19T01:37:10Z)
Dialogue Agents 101: A Beginner's Guide to Critical Ingredients for Designing Effective Conversational Systems [29.394466123216258]
This study provides a comprehensive overview of the primary characteristics of a dialogue agent, their corresponding open-domain datasets, and the methods used to benchmark these datasets. We propose UNIT, a UNified dIalogue dataseT constructed from conversations of existing datasets for different dialogue tasks capturing the nuances for each of them.
arXiv Detail & Related papers (2023-07-14T10:05:47Z)
Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues? [55.28340832822234]
Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections. We introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues.
arXiv Detail & Related papers (2023-07-13T20:02:50Z)
Back to the Future: Bidirectional Information Decoupling Network for Multi-turn Dialogue Modeling [80.51094098799736]
We propose Bidirectional Information Decoupling Network (BiDeN) as a universal dialogue encoder. BiDeN explicitly incorporates both the past and future contexts and can be generalized to a wide range of dialogue-related tasks. Experimental results on datasets of different downstream tasks demonstrate the universality and effectiveness of our BiDeN.
arXiv Detail & Related papers (2022-04-18T03:51:46Z)
Commonsense-Focused Dialogues for Response Generation: An Empirical Study [39.49727190159279]
We present an empirical study of commonsense in dialogue response generation. We first auto-extract commonsensical dialogues from existing dialogue datasets by leveraging ConceptNet. We then collect a new dialogue dataset with 25K dialogues aimed at exhibiting social commonsense in an interactive setting.
arXiv Detail & Related papers (2021-09-14T04:32:09Z)
Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching. Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information. We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue [113.45485470103762]
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
arXiv Detail & Related papers (2020-04-15T04:09:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.