Are Neural Open-Domain Dialog Systems Robust to Speech Recognition
Errors in the Dialog History? An Empirical Study
- URL: http://arxiv.org/abs/2008.07683v1
- Date: Tue, 18 Aug 2020 00:36:57 GMT
- Title: Are Neural Open-Domain Dialog Systems Robust to Speech Recognition
Errors in the Dialog History? An Empirical Study
- Authors: Karthik Gopalakrishnan, Behnam Hedayatnia, Longshaokan Wang, Yang Liu,
Dilek Hakkani-Tur
- Abstract summary: We study the effects of various types of synthetic and actual ASR hypotheses in the dialog history on TransferTransfo.
To the best of our knowledge, this is the first study to evaluate the effects of synthetic and actual ASR hypotheses on a state-of-the-art neural open-domain dialog system.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large end-to-end neural open-domain chatbots are becoming increasingly
popular. However, research on building such chatbots has typically assumed that
the user input is written rather than spoken, and it is not clear whether these chatbots
would seamlessly integrate with automatic speech recognition (ASR) models to
serve the speech modality. We aim to bring attention to this important question
by empirically studying the effects of various types of synthetic and actual
ASR hypotheses in the dialog history on TransferTransfo, a state-of-the-art
Generative Pre-trained Transformer (GPT) based neural open-domain dialog system
from the NeurIPS ConvAI2 challenge. We observe that TransferTransfo trained on
written data is very sensitive to such hypotheses introduced to the dialog
history during inference time. As a baseline mitigation strategy, we introduce
synthetic ASR hypotheses to the dialog history during training and observe
marginal improvements, demonstrating the need for further research into
techniques to make end-to-end open-domain chatbots fully speech-robust. To the
best of our knowledge, this is the first study to evaluate the effects of
synthetic and actual ASR hypotheses on a state-of-the-art neural open-domain
dialog system and we hope it promotes speech-robustness as an evaluation
criterion in open-domain dialog.
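The baseline mitigation described above, injecting synthetic ASR hypotheses into the dialog history during training, can be illustrated with a toy noise model. This is not the paper's actual error simulator; the corruption operations, rates, and example utterances below are hypothetical (real ASR confusions are phonetically driven, not random).

```python
import random

def corrupt_utterance(utterance, error_rate=0.15, seed=None):
    """Toy ASR-error simulator: randomly delete, repeat, or mangle
    words in a written utterance at roughly `error_rate`."""
    rng = random.Random(seed)
    noisy = []
    for word in utterance.split():
        if rng.random() < error_rate:
            op = rng.choice(["delete", "repeat", "substitute"])
            if op == "delete":
                continue                      # word lost by the recognizer
            if op == "repeat":
                noisy.extend([word, word])    # duplicated word
                continue
            noisy.append(word[::-1])          # crude stand-in for a misrecognition
        else:
            noisy.append(word)
    return " ".join(noisy)

# Corrupt only the dialog history, leaving the gold response clean,
# so the model learns to condition on noisy context.
history = ["do you like science fiction movies",
           "yes i watch them every weekend"]
noisy_history = [corrupt_utterance(u, error_rate=0.3, seed=0) for u in history]
```

In this setup the training targets stay untouched; only the context the model conditions on is perturbed, mirroring the train/inference mismatch the paper studies.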
Related papers
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z)
- A Static and Dynamic Attention Framework for Multi Turn Dialogue Generation [37.79563028123686]
In open-domain multi-turn dialogue generation, it is essential to model the contextual semantics of the dialogue history.
Previous research has verified the effectiveness of the hierarchical recurrent encoder-decoder framework for open-domain multi-turn dialogue generation.
We propose a static and dynamic attention-based approach to model the dialogue history and then generate open domain multi turn dialogue responses.
arXiv Detail & Related papers (2024-10-28T06:05:34Z)
- PK-Chat: Pointer Network Guided Knowledge Driven Generative Dialogue Model [79.64376762489164]
PK-Chat is a pointer-network-guided generative dialogue model that incorporates a unified pretrained language model and a pointer network over knowledge graphs.
The words PK-Chat generates in a dialogue are drawn either from a predicted word list or copied directly from entities in the external knowledge graph.
Based on the PK-Chat, a dialogue system is built for academic scenarios in the case of geosciences.
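The summary above does not give PK-Chat's exact decoding rule, but pointer-network decoders typically mix a generator distribution over the word list with a copy distribution over knowledge-graph entities via a learned gate. The sketch below (plain Python, with hypothetical probabilities and a fixed gate in place of a learned one) shows that generic mixture, not the paper's actual formulation.

```python
def mix_distributions(p_vocab, p_copy, gate):
    """Pointer-generator style mixture: with probability `gate` emit
    from the word-list distribution, otherwise copy from the
    knowledge-graph distribution (both aligned to the same index space)."""
    return [gate * v + (1.0 - gate) * c for v, c in zip(p_vocab, p_copy)]

# Hypothetical 4-symbol space: two ordinary words, two KG entities.
p_vocab = [0.7, 0.3, 0.0, 0.0]   # generator softmax over the word list
p_copy  = [0.0, 0.0, 0.1, 0.9]   # pointer attention over KG entities
p_final = mix_distributions(p_vocab, p_copy, gate=0.6)
# p_final remains a valid probability distribution (sums to 1)
```

Because both inputs are distributions and the gate is a convex weight, the mixture is guaranteed to be a distribution as well.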
arXiv Detail & Related papers (2023-04-02T18:23:13Z)
- Channel-aware Decoupling Network for Multi-turn Dialogue Comprehension [81.47133615169203]
We propose compositional learning for holistic interaction across utterances beyond the sequential contextualization from PrLMs.
We employ domain-adaptive training strategies to help the model adapt to the dialogue domains.
Experimental results show that our method substantially boosts the strong PrLM baselines in four public benchmark datasets.
arXiv Detail & Related papers (2023-01-10T13:18:25Z)
- Emotion Recognition in Conversation using Probabilistic Soft Logic [17.62924003652853]
Emotion recognition in conversation (ERC) is a sub-field of emotion recognition that focuses on conversations containing two or more utterances.
We implement our approach in a framework called Probabilistic Soft Logic (PSL), a declarative templating language.
PSL provides functionality for the incorporation of results from neural models into PSL models.
We compare our method with state-of-the-art purely neural ERC systems, and see almost a 20% improvement.
arXiv Detail & Related papers (2022-07-14T23:59:06Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z) - Automatic Evaluation and Moderation of Open-domain Dialogue Systems [59.305712262126264]
A long-standing challenge for researchers is the lack of effective automatic evaluation metrics.
This paper describes the data, baselines and results obtained for Track 5 at the Dialogue System Technology Challenge 10 (DSTC10).
arXiv Detail & Related papers (2021-11-03T10:08:05Z)
- Enhancing Self-Disclosure In Neural Dialog Models By Candidate Re-ranking [0.7059472280274008]
Social penetration theory (SPT) proposes that communication between two people moves from shallow to deeper levels as the relationship progresses primarily through self-disclosure.
In this paper, a Self-Disclosure Enhancement Architecture (SDEA) is introduced that uses a Self-Disclosure Topic Model (SDTM) to re-rank response candidates, enhancing self-disclosure in the model's single-turn responses.
arXiv Detail & Related papers (2021-09-10T20:06:27Z)
- Ranking Enhanced Dialogue Generation [77.8321855074999]
How to effectively utilize the dialogue history is a crucial problem in multi-turn dialogue generation.
Previous works usually employ various neural network architectures to model the history.
This paper proposes a Ranking Enhanced Dialogue generation framework.
arXiv Detail & Related papers (2020-08-13T01:49:56Z)
- Probing Neural Dialog Models for Conversational Understanding [21.76744391202041]
We analyze the internal representations learned by neural open-domain dialog systems.
Our results suggest that standard open-domain dialog systems struggle with answering questions.
We also find that the dyadic, turn-taking nature of dialog is not fully leveraged by these models.
arXiv Detail & Related papers (2020-06-07T17:32:00Z)
- Neural Generation of Dialogue Response Timings [13.611050992168506]
We propose neural models that simulate the distributions of spoken response offsets.
The models are designed to be integrated into the pipeline of an incremental spoken dialogue system.
We show that human listeners consider certain response timings to be more natural based on the dialogue context.
arXiv Detail & Related papers (2020-05-18T23:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.