Related papers: Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models

URL: http://arxiv.org/abs/2404.08156v1
Date: Thu, 11 Apr 2024 23:09:18 GMT
Title: Multimodal Contextual Dialogue Breakdown Detection for Conversational AI Models
Authors: Md Messal Monem Miah, Ulie Schnaithmann, Arushi Raghuvanshi, Youngseo Son
Abstract summary: We introduce a Multimodal Contextual Dialogue Breakdown (MultConDB) model. It significantly outperforms the best previously known models, achieving an F1 of 69.27.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detecting dialogue breakdown in real time is critical for conversational AI systems, because it enables taking corrective action to successfully complete a task. In spoken dialogue systems, this breakdown can be caused by a variety of unexpected situations, including high levels of background noise that cause STT mistranscriptions, or unexpected user flows. In particular, industry settings like healthcare require high precision and high flexibility to navigate differently based on the conversation history and dialogue states. This makes it both more challenging and more critical to accurately detect dialogue breakdown. We found that accurately detecting breakdown requires processing audio inputs along with downstream NLP model inferences on transcribed text in real time. In this paper, we introduce a Multimodal Contextual Dialogue Breakdown (MultConDB) model. It significantly outperforms the best previously known models, achieving an F1 of 69.27.
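The abstract describes combining audio-side signals with downstream NLP inferences and reports its result as F1. The sketch below is a toy illustration under stated assumptions, not MultConDB itself: the feature names (stt_confidence, noise_level, nlu_confidence, unexpected_flow), the hand-picked weights, and the logistic late-fusion rule are all hypothetical. It only shows how audio, text, and context signals might be fused into one per-turn breakdown score, and how F1 on the breakdown class is computed from binary predictions.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnSignals:
    # Hypothetical per-turn inputs; the paper's actual features are not given here.
    stt_confidence: float  # audio side: transcription confidence in [0, 1]
    noise_level: float     # audio side: normalized background-noise estimate in [0, 1]
    nlu_confidence: float  # text side: downstream NLP model confidence in [0, 1]
    unexpected_flow: bool  # context side: user left the expected dialogue path


def breakdown_probability(s: TurnSignals) -> float:
    """Toy late fusion: a logistic function over fixed, hand-picked weights."""
    z = (
        -2.0
        + 2.5 * (1.0 - s.stt_confidence)  # low STT confidence raises risk
        + 1.5 * s.noise_level             # background noise raises risk
        + 2.0 * (1.0 - s.nlu_confidence)  # uncertain NLP inference raises risk
        + 1.0 * float(s.unexpected_flow)  # off-script user flow raises risk
    )
    return 1.0 / (1.0 + math.exp(-z))


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """F1 on the positive (breakdown = 1) class: 2PR / (P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention: no true positives means F1 of 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    turns = [
        TurnSignals(1.0, 0.0, 1.0, False),  # clean turn
        TurnSignals(0.2, 0.9, 0.3, True),   # noisy, off-script turn
    ]
    preds = [int(breakdown_probability(t) >= 0.5) for t in turns]
    print(preds, f1_score([0, 1], preds))
```

A trained model would learn these weights rather than fix them by hand. The abstract's point that healthcare settings need high precision would typically translate into raising the decision threshold above 0.5, trading some recall for fewer false breakdown alarms.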

Related papers

Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on real-time conversations from user interactions. We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z)
Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning [30.13634341221476]
Large language models (LLMs) are rapidly changing various domains. This paper addresses the challenge of detecting and mitigating dialogue breakdowns within LLM-driven systems. We propose an approach that combines specialized fine-tuning with advanced prompting strategies.
arXiv Detail & Related papers (2025-04-26T07:51:05Z)
Real-Time Textless Dialogue Generation [23.456302461693053]
We propose a real-time, textless spoken dialogue generation model (RTTL-DG). Our system enables fluid turn-taking and generates responses with minimal delay by processing streaming spoken conversation directly. Our model incorporates backchannels, fillers, laughter, and other paralinguistic signals, which are often absent in cascaded dialogue systems.
arXiv Detail & Related papers (2025-01-08T23:21:43Z)
WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain. These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech. Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z)
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation [53.7173034249361]
OmniFlatten, an end-to-end GPT-based model, can effectively model the complex behaviors inherent in natural conversations with low latency. Our approach offers a simple modeling technique and a promising research direction for developing efficient and natural end-to-end full-duplex spoken dialogue systems.
arXiv Detail & Related papers (2024-10-23T11:58:58Z)
Are cascade dialogue state tracking models speaking out of turn in spoken dialogues? [1.786898113631979]
This paper presents a comprehensive analysis of the errors of state-of-the-art systems in complex settings such as Dialogue State Tracking. Based on spoken MultiWoz, we identify that errors on the values of non-categorical slots must be addressed in order to bridge the gap between spoken and chat-based dialogue systems.
arXiv Detail & Related papers (2023-11-03T08:45:22Z)
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation [27.926862030684926]
We introduce a novel conversational ASR system, extending the Conformer encoder-decoder model with cross-modal conversational representation. Our approach combines pre-trained speech and text models through a specialized encoder and a modal-level mask input. By introducing both cross-modal and conversational representations into the decoder, our model retains context over longer sentences without information loss.
arXiv Detail & Related papers (2023-10-22T11:57:33Z)
Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective. We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way. We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z)
Stabilized In-Context Learning with Pre-trained Language Models for Few Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks. For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial. We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
Back to the Future: Bidirectional Information Decoupling Network for Multi-turn Dialogue Modeling [80.51094098799736]
We propose Bidirectional Information Decoupling Network (BiDeN) as a universal dialogue encoder. BiDeN explicitly incorporates both the past and future contexts and can be generalized to a wide range of dialogue-related tasks. Experimental results on datasets of different downstream tasks demonstrate the universality and effectiveness of our BiDeN.
arXiv Detail & Related papers (2022-04-18T03:51:46Z)
Response Generation with Context-Aware Prompt Learning [19.340498579331555]
We present a novel approach for pre-trained dialogue modeling that casts the dialogue generation problem as a prompt-learning task. Instead of fine-tuning on limited dialogue data, our approach, DialogPrompt, learns continuous prompt embeddings optimized for dialogue contexts. Our approach significantly outperforms the fine-tuning baseline and the generic prompt-learning methods.
arXiv Detail & Related papers (2021-11-04T05:40:13Z)
Smoothing Dialogue States for Open Conversational Machine Reading [70.83783364292438]
We propose an effective gating strategy that smooths the two dialogue states in a single decoder, bridging decision making and question generation. Experiments on the OR-ShARC dataset show the effectiveness of our method, which achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-08-28T08:04:28Z)
Hierarchical Summarization for Longform Spoken Dialog [1.995792341399967]
Despite the pervasiveness of spoken dialog, automated speech understanding and quality information extraction remain markedly poor. Compared to understanding text, auditory communication poses many additional challenges, such as speaker disfluencies, informal prose styles, and lack of structure. We propose a two-stage ASR and text summarization pipeline, along with a set of semantic segmentation and merging algorithms, to resolve these speech modeling challenges.
arXiv Detail & Related papers (2021-08-21T23:31:31Z)
TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue [113.45485470103762]
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling. To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
arXiv Detail & Related papers (2020-04-15T04:09:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.