Oh My Mistake!: Toward Realistic Dialogue State Tracking including
Turnback Utterances
- URL: http://arxiv.org/abs/2108.12637v1
- Date: Sat, 28 Aug 2021 12:10:50 GMT
- Title: Oh My Mistake!: Toward Realistic Dialogue State Tracking including
Turnback Utterances
- Authors: Takyoung Kim, Yukyung Lee, Hoonsang Yoon, Pilsung Kang, Misuk Kim
- Abstract summary: We study whether current benchmark datasets are sufficiently diverse to handle casual conversations in which one changes their mind.
We found that injecting template-based turnback utterances significantly degrades the DST model performance.
We also observed that the performance rebounds when a turnback is appropriately included in the training dataset.
- Score: 1.6099403809839035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary purpose of dialogue state tracking (DST), a critical component of
an end-to-end conversational system, is to build a model that responds well to
real-world situations. Although we often change our minds during ordinary
conversations, current benchmark datasets do not adequately reflect such
occurrences and instead consist of over-simplified conversations, in which no
one changes their mind during a conversation. The main question inspiring the
present study is: "Are current benchmark datasets sufficiently diverse to
handle casual conversations in which one changes their mind?" We found that
the answer is "No", because simply injecting template-based turnback
utterances significantly degrades DST model performance. The test joint
goal accuracy on MultiWOZ decreased by more than 5 percentage points when the
simplest form of turnback utterance was injected. Moreover, the performance degradation worsens
when facing more complicated turnback situations. However, we also observed
that the performance rebounds when a turnback is appropriately included in the
training dataset, implying that the problem is not with the DST models but
rather with the construction of the benchmark dataset.
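The injection procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a MultiWOZ-style annotation where each turn carries a user utterance and the cumulative belief state (slot-to-value map) after that turn, and the turnback template and function names are invented for the example.

```python
import copy

def inject_turnback(dialogue, slot, new_value,
                    template="Oh, my mistake! Change the {slot} to {value}, please."):
    """Append a template-based turnback turn that revises one slot.

    `dialogue` is a list of turns, each a dict with a user `utterance` and the
    cumulative `belief_state` after that turn. The injected turn overwrites
    `slot` with `new_value`, and the gold belief state is updated accordingly
    so the label stays consistent with the new utterance.
    """
    turnback = {
        "utterance": template.format(slot=slot.replace("-", " "), value=new_value),
        "belief_state": copy.deepcopy(dialogue[-1]["belief_state"]),
    }
    turnback["belief_state"][slot] = new_value
    return dialogue + [turnback]

def joint_goal_accuracy(predictions, golds):
    """Fraction of turns whose predicted belief state matches the gold exactly."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)

# Toy example: the user asks for a cheap restaurant, then changes their mind.
dialogue = [
    {"utterance": "I want a cheap restaurant in the centre.",
     "belief_state": {"restaurant-pricerange": "cheap",
                      "restaurant-area": "centre"}},
]
dialogue = inject_turnback(dialogue, "restaurant-pricerange", "expensive")
print(dialogue[-1]["utterance"])
print(dialogue[-1]["belief_state"])
```

A model trained only on monotone dialogues tends to keep the stale value ("cheap") after the turnback turn, which is exactly what the joint-goal-accuracy drop measures.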
Related papers
- Making Retrieval-Augmented Language Models Robust to Irrelevant Context [55.564789967211844]
An important desideratum of RALMs, is that retrieved information helps model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
arXiv Detail & Related papers (2023-10-02T18:52:35Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- Stabilized In-Context Learning with Pre-trained Language Models for Few-Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
- Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement [54.55643652781891]
Conversational Question Answering (ConvQA) models aim at answering a question with its relevant paragraph and the question-answer pairs that occurred previously in the multi-turn conversation.
We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model.
We validate our model, Answer Selection-based realistic Conversational Question Answering, on two standard ConvQA datasets.
arXiv Detail & Related papers (2023-02-10T09:42:07Z)
- TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization [27.185068253347257]
We build a large-scale (11M) pretraining dataset called RCS based on the multi-person discussions in the Reddit community.
We then present TANet, a thread-aware Transformer-based network.
Unlike the existing pre-trained models that treat a conversation as a sequence of sentences, we argue that the inherent contextual dependency plays an essential role in understanding the entire conversation.
arXiv Detail & Related papers (2022-04-09T16:08:46Z)
- In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST)
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
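The prompt-assembly step of such an in-context-learning DST setup can be sketched as follows. This is a hedged illustration of the general recipe, not the paper's implementation; the prompt format, slot notation, and function name are assumptions made for the example.

```python
def build_icl_prompt(exemplars, test_dialogue):
    """Assemble a few-shot DST prompt: a handful of annotated exemplars
    followed by the test dialogue, asking a frozen LM to decode the
    dialogue state directly (no parameter updates)."""
    parts = [
        f"Dialogue: {ex['dialogue']}\nState: {ex['state']}"
        for ex in exemplars
    ]
    parts.append(f"Dialogue: {test_dialogue}\nState:")
    return "\n\n".join(parts)

exemplars = [
    {"dialogue": "I need a cheap hotel in the north.",
     "state": "hotel-pricerange=cheap; hotel-area=north"},
    {"dialogue": "Find me an italian restaurant, please.",
     "state": "restaurant-food=italian"},
]
prompt = build_icl_prompt(exemplars, "Book a taxi to the station at 5pm.")
print(prompt)
```

The resulting string is fed to the LM, whose completion after the final "State:" is parsed as the predicted dialogue state; adapting to a new domain only requires swapping the exemplars.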
arXiv Detail & Related papers (2022-03-16T11:58:24Z)
- Improving Longer-range Dialogue State Tracking [22.606650177804966]
Dialogue state tracking (DST) is a pivotal component in task-oriented dialogue systems.
In this paper, we aim to improve the overall performance of DST with a special focus on handling longer dialogues.
arXiv Detail & Related papers (2021-02-27T02:44:28Z)
- CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers [92.5628632009802]
We propose controllable counterfactuals (CoCo) to bridge the gap and evaluate dialogue state tracking (DST) models on novel scenarios.
CoCo generates novel conversation scenarios in two steps: (i) counterfactual goal generation at turn-level by dropping and adding slots followed by replacing slot values, and (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow.
Human evaluations show that CoCo-generated conversations reflect the underlying user goal with more than 95% accuracy and are as human-like as the original conversations.
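Step (i) of a CoCo-style pipeline, counterfactual goal generation, can be sketched as below. This is a simplified illustration under assumed slot names, not the authors' code; the function signature and seeding are invented for reproducibility of the example.

```python
import random

def counterfactual_goal(goal, drop=(), add=None, replacements=None, seed=0):
    """Derive a counterfactual user goal from an original one by
    (a) dropping slots, (b) adding slots, and (c) replacing slot values
    with alternatives sampled from candidate lists."""
    rng = random.Random(seed)
    new_goal = {slot: value for slot, value in goal.items() if slot not in drop}
    if add:
        new_goal.update(add)
    if replacements:
        for slot, candidates in replacements.items():
            if slot in new_goal:
                new_goal[slot] = rng.choice(candidates)
    return new_goal

goal = {"restaurant-food": "italian", "restaurant-area": "centre"}
cf = counterfactual_goal(
    goal,
    drop=("restaurant-area",),
    add={"restaurant-book-people": "4"},
    replacements={"restaurant-food": ["chinese", "indian"]},
)
print(cf)
```

Step (ii) would then condition a generator on this counterfactual goal to produce a fluent conversation consistent with the original dialogue flow.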
arXiv Detail & Related papers (2020-10-24T09:39:35Z)
- Dual Learning for Dialogue State Tracking [44.679185483585364]
Dialogue state tracking (DST) aims to estimate the dialogue state at each turn.
Due to the dependency on complicated dialogue history contexts, DST data annotation is more expensive than single-sentence language understanding.
We propose a novel dual-learning framework to make full use of unlabeled data.
arXiv Detail & Related papers (2020-09-22T10:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all generated summaries) and is not responsible for any consequences of its use.