Conversations Gone Awry, But Then? Evaluating Conversational Forecasting Models
- URL: http://arxiv.org/abs/2507.19470v1
- Date: Fri, 25 Jul 2025 17:55:13 GMT
- Title: Conversations Gone Awry, But Then? Evaluating Conversational Forecasting Models
- Authors: Son Quoc Tran, Tushaar Gangavarapu, Nicholas Chernogor, Jonathan P. Chang, Cristian Danescu-Niculescu-Mizil
- Abstract summary: Recent work on developing models that can anticipate the direction of a conversation has focused on the Conversations Gone Awry (CGA) task. We revisit this task and introduce the first uniform evaluation framework, creating a benchmark that enables comparisons between different architectures. Our framework also introduces a novel metric that captures a model's ability to revise its forecast as the conversation progresses.
- Score: 5.582408085157498
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We often rely on our intuition to anticipate the direction of a conversation. Endowing automated systems with similar foresight can enable them to assist human-human interactions. Recent work on developing models with this predictive capacity has focused on the Conversations Gone Awry (CGA) task: forecasting whether an ongoing conversation will derail. In this work, we revisit this task and introduce the first uniform evaluation framework, creating a benchmark that enables direct and reliable comparisons between different architectures. This allows us to present an up-to-date overview of the current progress in CGA models, in light of recent advancements in language modeling. Our framework also introduces a novel metric that captures a model's ability to revise its forecast as the conversation progresses.
Related papers
- Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities [93.09944267871163]
FullDuplexBench is a benchmark that systematically evaluates key interactive behaviors. By releasing our benchmark code, we aim to advance spoken dialogue modeling and the development of more natural and engaging SDMs.
arXiv Detail & Related papers (2025-03-06T18:59:16Z) - Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics [54.03209351287654]
We propose a novel evaluation protocol that can assess a spoken dialog system's turn-taking capabilities. We present the first comprehensive user study that evaluates existing spoken dialogue systems on their ability to perform turn-taking events. We will open source our evaluation platform to promote the development of advanced conversational AI systems.
arXiv Detail & Related papers (2025-03-03T04:46:04Z) - How Did We Get Here? Summarizing Conversation Dynamics [4.644319899528183]
We introduce the task of summarizing the dynamics of conversations by constructing a dataset of human-written summaries.
We evaluate whether such summaries can capture the trajectory of conversations via an established downstream task.
We show that they help both humans and automated systems with this forecasting task.
arXiv Detail & Related papers (2024-04-29T18:00:03Z) - MindDial: Belief Dynamics Tracking with Theory-of-Mind Modeling for Situated Neural Dialogue Generation [62.44907105496227]
MindDial is a novel conversational framework that can generate situated free-form responses with theory-of-mind modeling.
We introduce an explicit mind module that can track the speaker's belief and the speaker's prediction of the listener's belief.
Our framework is applied to both prompting and fine-tuning-based models, and is evaluated across scenarios involving both common ground alignment and negotiation.
arXiv Detail & Related papers (2023-06-27T07:24:32Z) - Conversation Derailment Forecasting with Graph Convolutional Networks [6.251188655534379]
We propose a novel model based on a graph convolutional neural network that considers dialogue user dynamics and the influence of public perception on conversation utterances.
Our model effectively captures conversation dynamics and outperforms the state-of-the-art models on the CGA and CMV benchmark datasets by 1.5% and 1.7%, respectively.
arXiv Detail & Related papers (2023-06-22T15:40:59Z) - Improving a sequence-to-sequence nlp model using a reinforcement learning policy algorithm [0.0]
Current neural network models of dialogue generation show great promise for generating answers for chatty agents.
But they are short-sighted in that they predict utterances one at a time while disregarding their impact on future outcomes.
This work marks a preliminary step toward developing a neural conversational model based on the long-term success of dialogues.
arXiv Detail & Related papers (2022-12-28T22:46:57Z) - Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z) - GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z) - You Mostly Walk Alone: Analyzing Feature Attribution in Trajectory Prediction [52.442129609979794]
Recent deep learning approaches for trajectory prediction show promising performance.
It remains unclear which features such black-box models actually learn to use for making predictions.
This paper proposes a procedure that quantifies the contributions of different cues to model performance.
arXiv Detail & Related papers (2021-10-11T14:24:15Z) - A Neural Conversation Generation Model via Equivalent Shared Memory Investigation [39.922967513749654]
We propose a novel reading and memory framework called Deep Reading Memory Network (DRMN).
DRMN is capable of remembering useful information of similar conversations for improving utterance generation.
We apply our model to two large-scale conversation datasets of justice and e-commerce fields.
arXiv Detail & Related papers (2021-08-20T13:20:14Z) - The Adapter-Bot: All-In-One Controllable Conversational Model [66.48164003532484]
We propose a dialogue model that uses a fixed backbone model such as DialoGPT and triggers on-demand dialogue skills via different adapters.
Depending on the skills, the model is able to process multiple knowledge types, such as text, tables, and empathetic responses.
We evaluate our model using automatic evaluation by comparing it with existing state-of-the-art conversational models.
arXiv Detail & Related papers (2020-08-28T10:59:31Z)