Oh My Mistake!: Toward Realistic Dialogue State Tracking including
Turnback Utterances
- URL: http://arxiv.org/abs/2108.12637v1
- Date: Sat, 28 Aug 2021 12:10:50 GMT
- Title: Oh My Mistake!: Toward Realistic Dialogue State Tracking including
Turnback Utterances
- Authors: Takyoung Kim, Yukyung Lee, Hoonsang Yoon, Pilsung Kang, Misuk Kim
- Abstract summary: We study whether current benchmark datasets are sufficiently diverse to handle casual conversations in which one changes their mind.
We found that injecting template-based turnback utterances significantly degrades the DST model performance.
We also observed that the performance rebounds when a turnback is appropriately included in the training dataset.
- Score: 1.6099403809839035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary purpose of dialogue state tracking (DST), a critical component of
an end-to-end conversational system, is to build a model that responds well to
real-world situations. Although we often change our minds during ordinary
conversations, current benchmark datasets do not adequately reflect such
occurrences and instead consist of over-simplified conversations, in which no
one changes their mind during a conversation. The main question inspiring the
present study is: "Are current benchmark datasets sufficiently diverse to
handle casual conversations in which one changes their mind?" We found that
the answer is "No", because simply injecting template-based turnback
utterances significantly degrades DST model performance. The test joint
goal accuracy on MultiWOZ decreased by more than 5 percentage points when the
simplest form of turnback utterance was injected. Moreover, the performance degradation worsens
when facing more complicated turnback situations. However, we also observed
that the performance rebounds when a turnback is appropriately included in the
training dataset, implying that the problem is not with the DST models but
rather with the construction of the benchmark dataset.
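The injection procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a MultiWOZ-style annotation where each turn carries a user utterance and the cumulative belief state (slot-to-value map) after that turn, and the turnback template and function names are invented for the example.

```python
import copy

def inject_turnback(dialogue, slot, new_value,
                    template="Oh, my mistake! Change the {slot} to {value}, please."):
    """Append a template-based turnback turn that revises one slot.

    `dialogue` is a list of turns, each a dict with a user `utterance` and the
    cumulative `belief_state` after that turn. The injected turn overwrites
    `slot` with `new_value`, and the gold belief state is updated accordingly
    so the label stays consistent with the new utterance.
    """
    turnback = {
        "utterance": template.format(slot=slot.replace("-", " "), value=new_value),
        "belief_state": copy.deepcopy(dialogue[-1]["belief_state"]),
    }
    turnback["belief_state"][slot] = new_value
    return dialogue + [turnback]

def joint_goal_accuracy(predictions, golds):
    """Fraction of turns whose predicted belief state matches the gold exactly."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)

# Toy example: the user asks for a cheap restaurant, then changes their mind.
dialogue = [
    {"utterance": "I want a cheap restaurant in the centre.",
     "belief_state": {"restaurant-pricerange": "cheap",
                      "restaurant-area": "centre"}},
]
dialogue = inject_turnback(dialogue, "restaurant-pricerange", "expensive")
print(dialogue[-1]["utterance"])
print(dialogue[-1]["belief_state"])
```

A model trained only on monotone dialogues tends to keep the stale value ("cheap") after the turnback turn, which is exactly what the joint-goal-accuracy drop measures.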
Related papers
- Making Retrieval-Augmented Language Models Robust to Irrelevant Context [55.564789967211844]
An important desideratum of RALMs, is that retrieved information helps model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
arXiv Detail & Related papers (2023-10-02T18:52:35Z)
- SpokenWOZ: A Large-Scale Speech-Text Benchmark for Spoken Task-Oriented Dialogue Agents [72.42049370297849]
SpokenWOZ is a large-scale speech-text dataset for spoken TOD.
Cross-turn slot and reasoning slot detection are new challenges for SpokenWOZ.
arXiv Detail & Related papers (2023-05-22T13:47:51Z)
- Stabilized In-Context Learning with Pre-trained Language Models for Few-Shot Dialogue State Tracking [57.92608483099916]
Large pre-trained language models (PLMs) have shown impressive unaided performance across many NLP tasks.
For more complex tasks such as dialogue state tracking (DST), designing prompts that reliably convey the desired intent is nontrivial.
We introduce a saliency model to limit dialogue text length, allowing us to include more exemplars per query.
arXiv Detail & Related papers (2023-02-12T15:05:10Z)
- Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement [54.55643652781891]
Conversational Question Answering (ConvQA) models aim at answering a question with its relevant paragraph and the question-answer pairs that occurred previously in the multi-turn conversation.
We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model.
We validate our model, Answer Selection-based realistic Conversational Question Answering, on two standard ConvQA datasets.
arXiv Detail & Related papers (2023-02-10T09:42:07Z)
- TANet: Thread-Aware Pretraining for Abstractive Conversational Summarization [27.185068253347257]
We build a large-scale (11M) pretraining dataset called RCS based on the multi-person discussions in the Reddit community.
We then present TANet, a thread-aware Transformer-based network.
Unlike the existing pre-trained models that treat a conversation as a sequence of sentences, we argue that the inherent contextual dependency plays an essential role in understanding the entire conversation.
arXiv Detail & Related papers (2022-04-09T16:08:46Z)
- In-Context Learning for Few-Shot Dialogue State Tracking [55.91832381893181]
We propose an in-context (IC) learning framework for few-shot dialogue state tracking (DST)
A large pre-trained language model (LM) takes a test instance and a few annotated examples as input, and directly decodes the dialogue states without any parameter updates.
This makes the LM more flexible and scalable compared to prior few-shot DST work when adapting to new domains and scenarios.
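The prompt-assembly step of such an in-context-learning DST setup can be sketched as follows. This is a hedged illustration of the general recipe, not the paper's implementation; the prompt format, slot notation, and function name are assumptions made for the example.

```python
def build_icl_prompt(exemplars, test_dialogue):
    """Assemble a few-shot DST prompt: a handful of annotated exemplars
    followed by the test dialogue, asking a frozen LM to decode the
    dialogue state directly (no parameter updates)."""
    parts = [
        f"Dialogue: {ex['dialogue']}\nState: {ex['state']}"
        for ex in exemplars
    ]
    parts.append(f"Dialogue: {test_dialogue}\nState:")
    return "\n\n".join(parts)

exemplars = [
    {"dialogue": "I need a cheap hotel in the north.",
     "state": "hotel-pricerange=cheap; hotel-area=north"},
    {"dialogue": "Find me an italian restaurant, please.",
     "state": "restaurant-food=italian"},
]
prompt = build_icl_prompt(exemplars, "Book a taxi to the station at 5pm.")
print(prompt)
```

The resulting string is fed to the LM, whose completion after the final "State:" is parsed as the predicted dialogue state; adapting to a new domain only requires swapping the exemplars.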
arXiv Detail & Related papers (2022-03-16T11:58:24Z)
- Improving Longer-range Dialogue State Tracking [22.606650177804966]
Dialogue state tracking (DST) is a pivotal component in task-oriented dialogue systems.
In this paper, we aim to improve the overall performance of DST with a special focus on handling longer dialogues.
arXiv Detail & Related papers (2021-02-27T02:44:28Z)
- CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers [92.5628632009802]
We propose controllable counterfactuals (CoCo) to bridge the gap and evaluate dialogue state tracking (DST) models on novel scenarios.
CoCo generates novel conversation scenarios in two steps: (i) counterfactual goal generation at turn-level by dropping and adding slots followed by replacing slot values, and (ii) counterfactual conversation generation that is conditioned on (i) and consistent with the dialogue flow.
Human evaluations show that CoCo-generated conversations reflect the underlying user goal with more than 95% accuracy and are as human-like as the original conversations.
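Step (i) of a CoCo-style pipeline, counterfactual goal generation, can be sketched as below. This is a simplified illustration under assumed slot names, not the authors' code; the function signature and seeding are invented for reproducibility of the example.

```python
import random

def counterfactual_goal(goal, drop=(), add=None, replacements=None, seed=0):
    """Derive a counterfactual user goal from an original one by
    (a) dropping slots, (b) adding slots, and (c) replacing slot values
    with alternatives sampled from candidate lists."""
    rng = random.Random(seed)
    new_goal = {slot: value for slot, value in goal.items() if slot not in drop}
    if add:
        new_goal.update(add)
    if replacements:
        for slot, candidates in replacements.items():
            if slot in new_goal:
                new_goal[slot] = rng.choice(candidates)
    return new_goal

goal = {"restaurant-food": "italian", "restaurant-area": "centre"}
cf = counterfactual_goal(
    goal,
    drop=("restaurant-area",),
    add={"restaurant-book-people": "4"},
    replacements={"restaurant-food": ["chinese", "indian"]},
)
print(cf)
```

Step (ii) would then condition a generator on this counterfactual goal to produce a fluent conversation consistent with the original dialogue flow.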
arXiv Detail & Related papers (2020-10-24T09:39:35Z)
- Dual Learning for Dialogue State Tracking [44.679185483585364]
Dialogue state tracking (DST) aims to estimate the dialogue state at each turn.
Due to the dependency on complicated dialogue history contexts, DST data annotation is more expensive than single-sentence language understanding.
We propose a novel dual-learning framework to make full use of unlabeled data.
arXiv Detail & Related papers (2020-09-22T10:15:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all generated summaries) and is not responsible for any consequences of its use.