From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents
- URL: http://arxiv.org/abs/2506.14285v1
- Date: Tue, 17 Jun 2025 07:56:32 GMT
- Title: From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents
- Authors: Seongbo Jang, Minjin Jeon, Jaehoon Lee, Seonghyeon Lee, Dongha Lee, Hwanjo Yu
- Abstract summary: The TimelyChat benchmark evaluates the capabilities of language models to predict appropriate time intervals and generate time-conditioned responses. We construct a large-scale training dataset by leveraging unlabeled event knowledge from a temporal commonsense knowledge graph. We then train Timer, a dialogue agent designed to proactively predict time intervals and generate timely responses that align with those intervals.
- Score: 26.437011114518917
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While research on dialogue response generation has primarily focused on generating coherent responses conditioning on textual context, the critical question of when to respond grounded on the temporal context remains underexplored. To bridge this gap, we propose a novel task called timely dialogue response generation and introduce the TimelyChat benchmark, which evaluates the capabilities of language models to predict appropriate time intervals and generate time-conditioned responses. Additionally, we construct a large-scale training dataset by leveraging unlabeled event knowledge from a temporal commonsense knowledge graph and employing a large language model (LLM) to synthesize 55K event-driven dialogues. We then train Timer, a dialogue agent designed to proactively predict time intervals and generate timely responses that align with those intervals. Experimental results show that Timer outperforms prompting-based LLMs and other fine-tuned baselines in both turn-level and dialogue-level evaluations. We publicly release our data, model, and code.
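As a way to picture the task, the agent loop has two steps per turn: first estimate how long to wait before speaking, then generate a response conditioned on that elapsed interval. The sketch below is a minimal, hypothetical illustration of that interface; the prompt wording and the `call_llm` helper are assumptions, not the released Timer implementation, which is fine-tuned on the 55K synthesized dialogues rather than prompted this way.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    text: str

def call_llm(prompt: str) -> str:
    """Hypothetical helper standing in for any instruction-following LLM."""
    raise NotImplementedError

def predict_interval(history: list[Turn]) -> str:
    """Step 1: ask how much time should pass before the agent's next message."""
    context = "\n".join(f"{t.speaker}: {t.text}" for t in history)
    prompt = (f"{context}\n"
              "How long should the agent wait before responding? "
              "Answer with a duration such as '30 minutes' or '2 hours'.")
    return call_llm(prompt).strip()

def generate_timely_response(history: list[Turn], interval: str) -> str:
    """Step 2: generate a response conditioned on the predicted time interval."""
    context = "\n".join(f"{t.speaker}: {t.text}" for t in history)
    prompt = f"{context}\n[{interval} later] Agent:"
    return call_llm(prompt).strip()

# Chaining the two steps yields an interval-conditioned, timely reply:
# reply = generate_timely_response(history, predict_interval(history))
```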
Related papers
- Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities [93.09944267871163]
Full-Duplex-Bench is a benchmark that systematically evaluates key interactive behaviors. By releasing our benchmark code, we aim to advance spoken dialogue modeling and the development of more natural and engaging spoken dialogue models (SDMs).
arXiv Detail & Related papers (2025-03-06T18:59:16Z)
- MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents [23.98067169669452]
MTPChat is a time-aware persona dialogue dataset that integrates linguistic, visual, and temporal elements within dialogue and persona memory. We propose two time-sensitive tasks: Temporal Next Response Prediction (TNRP) and Temporal Grounding Memory Prediction (TGMP). We also present an innovative framework featuring an adaptive temporal module to effectively integrate multimodal streams and capture temporal dependencies.
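To make the task format concrete, the sketch below encodes one Temporal Next Response Prediction instance as plain data: a dialogue context, a timestamped persona memory with a visual element, and candidate responses from which the temporally appropriate one must be chosen. The field names and the example itself are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class TNRPExample:
    """Illustrative Temporal Next Response Prediction instance (hypothetical schema)."""
    dialogue_context: list[str]            # preceding turns
    persona_memory: list[tuple[str, str]]  # (timestamp, memory text)
    image_caption: str                     # visual element, shown as text here
    candidates: list[str]                  # possible next responses
    label: int                             # index of the temporally correct response

example = TNRPExample(
    dialogue_context=["A: How was your trip?", "B: I just got back last night."],
    persona_memory=[("2024-06-01", "B booked a flight to Lisbon"),
                    ("2024-06-10", "B returned home")],
    image_caption="a suitcase standing by the front door",
    candidates=["I'm still packing for it!", "Exhausted, but Lisbon was great."],
    label=1,
)
```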
arXiv Detail & Related papers (2025-02-09T13:00:53Z)
- TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues [13.638344516302851]
Temporal reasoning in multi-session dialogues is a significant yet under-studied challenge. We introduce an approach to construct a new benchmark by augmenting dialogues from LoCoMo and creating multiple-choice QAs. We also present TReMu, a new framework aimed at enhancing the temporal reasoning capabilities of LLM-agents.
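A benchmark item of this kind can be represented simply as a multi-session dialogue plus a multiple-choice temporal question. The sketch below is a made-up example purely to illustrate the format; the actual benchmark's schema and content may differ.

```python
# Hypothetical multiple-choice temporal QA built on a two-session dialogue.
qa_item = {
    "sessions": [
        {"date": "2023-05-02", "turns": ["A: I start my new job next Monday."]},
        {"date": "2023-06-15", "turns": ["A: My first month at work flew by."]},
    ],
    "question": "In which month did speaker A start the new job?",
    "choices": ["April", "May", "June", "July"],
    "answer": "May",
}
```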
arXiv Detail & Related papers (2025-02-03T18:58:19Z)
- X-TURING: Towards an Enhanced and Efficient Turing Test for Long-Term Dialogue Agents [56.64615470513102]
The Turing test examines whether AIs exhibit human-like behaviour in natural language conversations. The traditional setting limits each participant to one message at a time and requires constant human participation. This paper proposes X-Turing, which enhances the original test with a burst dialogue pattern.
arXiv Detail & Related papers (2024-08-19T09:57:28Z)
- Evaluating Very Long-Term Conversational Memory of LLM Agents [95.84027826745609]
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues.
We equip each agent with the capability of sharing and reacting to images.
The generated conversations are verified and edited by human annotators for long-range consistency.
arXiv Detail & Related papers (2024-02-27T18:42:31Z)
- Mind the Gap Between Conversations for Improved Long-Term Dialogue Generation [21.109006148673846]
GapChat is a multi-session dialogue dataset in which the time between sessions varies.
While the dataset is constructed in real-time, progress on events in speakers' lives is simulated in order to create realistic dialogues occurring across a long timespan.
We show that time-aware models perform better in metrics that judge the relevance of the chosen topics and the information gained from the conversation.
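One simple way to make a model time-aware in this setting is to verbalize the gap between sessions and insert it into the context the model conditions on. The sketch below illustrates that idea with hypothetical helpers; it is not GapChat's actual data pipeline or training setup.

```python
from datetime import datetime

def verbalize_gap(prev_end: datetime, next_start: datetime) -> str:
    """Turn the elapsed time between two sessions into a short natural-language marker."""
    gap = next_start - prev_end
    hours = gap.total_seconds() / 3600
    return f"[{int(hours)} hours later]" if hours < 24 else f"[{gap.days} days later]"

def build_time_aware_context(prev_turns: list[str], gap_marker: str,
                             new_turns: list[str]) -> str:
    """Place the gap marker between sessions so the model can condition on elapsed time."""
    return "\n".join(prev_turns + [gap_marker] + new_turns)

context = build_time_aware_context(
    ["A: I'm starting marathon training tomorrow."],
    verbalize_gap(datetime(2023, 3, 1, 18, 0), datetime(2023, 3, 15, 9, 0)),
    ["B: How has the training been going?"],
)
```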
arXiv Detail & Related papers (2023-10-24T00:12:38Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
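The core of a generation re-scoring framework like this can be sketched as: sample several candidate responses, score each for faithfulness to the grounding knowledge and relevance to the dialogue history, and return the best-scoring one. The token-overlap scorers below are deliberately naive stand-ins for illustration; they are not PICK's actual scoring functions.

```python
def faithfulness(candidate: str, knowledge: str) -> float:
    """Naive stand-in: fraction of knowledge tokens echoed by the candidate."""
    k, c = set(knowledge.lower().split()), set(candidate.lower().split())
    return len(k & c) / max(len(k), 1)

def relevance(candidate: str, history: list[str]) -> float:
    """Naive stand-in: token overlap with the most recent turn."""
    last, c = set(history[-1].lower().split()), set(candidate.lower().split())
    return len(last & c) / max(len(last), 1)

def rescore(candidates: list[str], knowledge: str, history: list[str],
            alpha: float = 0.7) -> str:
    """Return the candidate with the best weighted combination of both scores."""
    return max(candidates,
               key=lambda c: alpha * faithfulness(c, knowledge)
                             + (1 - alpha) * relevance(c, history))
```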
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- TIMEDIAL: Temporal Commonsense Reasoning in Dialog [43.24596551545824]
We present the first study to investigate pre-trained language models for their temporal reasoning capabilities in dialogs.
We formulate TIMEDIAL as a multiple-choice cloze task with over 1.1K carefully curated dialogs.
Empirical results demonstrate that even the best performing models struggle on this task compared to humans.
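In a multiple-choice cloze setup like this, a language model is typically evaluated by how probable each candidate makes the completed dialogue. The sketch below shows that selection loop with a hypothetical `sequence_log_prob` helper and an invented example item; it is not taken from the benchmark itself.

```python
def sequence_log_prob(text: str) -> float:
    """Hypothetical stand-in for a language model's log-probability of a full sequence."""
    raise NotImplementedError

def answer_cloze(dialog_with_mask: str, options: list[str]) -> str:
    """Fill the <mask> with each option and keep the one the model finds most likely."""
    return max(options,
               key=lambda opt: sequence_log_prob(dialog_with_mask.replace("<mask>", opt)))

# Invented example in the spirit of the task, not a benchmark item.
dialog = ("A: How long have you been waiting for the bus?\n"
          "B: About <mask>, and I'm starting to get cold.")
options = ["twenty minutes", "twenty seconds", "two years", "half an hour"]
```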
arXiv Detail & Related papers (2021-06-08T17:59:21Z)
- Generating Dialogue Responses from a Semantic Latent Space [75.18449428414736]
We propose an alternative to end-to-end classification over the vocabulary.
We learn the pair relationship between the prompts and responses as a regression task on a latent space.
Human evaluation showed that learning the task in a continuous latent space yields responses that are both relevant and informative.
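The contrast with token-level classification can be pictured as: encode the prompt, regress to a point in a shared latent space, and choose the response whose embedding lies closest to that point. The encoder and regressor below are hypothetical placeholders for illustration, not the paper's architecture.

```python
import math

def embed(text: str) -> list[float]:
    """Hypothetical sentence encoder mapping text into a shared latent space."""
    raise NotImplementedError

def regress_to_response_space(prompt_vec: list[float]) -> list[float]:
    """Hypothetical learned regressor from prompt embeddings to response embeddings."""
    raise NotImplementedError

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-9)

def respond(prompt: str, candidate_responses: list[str]) -> str:
    """Return the candidate whose embedding is nearest the regressed target point."""
    target = regress_to_response_space(embed(prompt))
    return max(candidate_responses, key=lambda r: cosine(embed(r), target))
```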
arXiv Detail & Related papers (2020-10-04T19:06:16Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks: next session prediction, utterance restoration, incoherence detection, and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
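Multi-task training of this kind typically just adds the auxiliary losses to the main response-selection loss, each with its own weight. The sketch below shows that combination with stub loss functions and an assumed equal weighting; the paper's exact losses and weights are not reproduced here.

```python
# Stub loss terms; in the real model each is computed from a shared PLM encoder.
def loss_response_selection(batch) -> float: return 0.0        # main matching loss
def loss_next_session_prediction(batch) -> float: return 0.0
def loss_utterance_restoration(batch) -> float: return 0.0
def loss_incoherence_detection(batch) -> float: return 0.0
def loss_consistency_discrimination(batch) -> float: return 0.0

AUX_LOSSES = [loss_next_session_prediction, loss_utterance_restoration,
              loss_incoherence_detection, loss_consistency_discrimination]

def total_loss(batch, aux_weight: float = 1.0) -> float:
    """Main loss plus equally weighted auxiliary losses (weighting is an assumption)."""
    return loss_response_selection(batch) + aux_weight * sum(fn(batch) for fn in AUX_LOSSES)
```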
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
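An unreferenced metric in this spirit compares a latent representation of the dialogue context with one of the candidate response and maps their agreement to a score, so no gold reference is needed at inference time. The plain cosine-similarity instantiation below, with a hypothetical encoder, is only a simplified illustration; the paper's metric is learned rather than a raw similarity.

```python
import math

def encode_utterance(text: str) -> list[float]:
    """Hypothetical pretrained-LM encoder producing a fixed-size utterance vector."""
    raise NotImplementedError

def unreferenced_score(context: str, response: str) -> float:
    """Score a response against its context alone; no reference response is consulted."""
    c, r = encode_utterance(context), encode_utterance(response)
    dot = sum(x * y for x, y in zip(c, r))
    norm = math.sqrt(sum(x * x for x in c)) * math.sqrt(sum(y * y for y in r))
    return dot / (norm + 1e-9)
```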
arXiv Detail & Related papers (2020-05-01T20:01:39Z)