Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History
- URL: http://arxiv.org/abs/2305.00926v1
- Date: Mon, 1 May 2023 16:26:18 GMT
- Title: Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History
- Authors: Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe
- Abstract summary: We propose a novel model architecture that learns dialog context to jointly predict the intent, dialog act, speaker role, and emotion for the spoken utterance.
Our experiments show that our joint model achieves similar results to task-specific classifiers.
- Score: 30.20353302347147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most human interactions occur in the form of spoken conversations where the
semantic meaning of a given utterance depends on the context. Each utterance in
spoken conversation can be represented by many semantic and speaker attributes,
and there has been an interest in building Spoken Language Understanding (SLU)
systems for automatically predicting these attributes. Recent work has shown
that incorporating dialogue history can help advance SLU performance. However,
separate models are used for each SLU task, leading to an increase in inference
time and computation cost. Motivated by this, we aim to ask: can we jointly
model all the SLU tasks while incorporating context to facilitate low-latency
and lightweight inference? To answer this, we propose a novel model
architecture that learns dialog context to jointly predict the intent, dialog
act, speaker role, and emotion for the spoken utterance. Note that our joint
prediction is based on an autoregressive model and we need to decide the
prediction order of dialog attributes, which is not trivial. To mitigate the
issue, we also propose an order agnostic training method. Our experiments show
that our joint model achieves similar results to task-specific classifiers and
can effectively integrate dialog context to further improve the SLU
performance.
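
The abstract does not spell out how the joint autoregressive prediction or the order-agnostic training is implemented. The following is only a minimal sketch of one plausible reading, assuming that each attribute label is emitted after a tag token naming the attribute and that the attribute order is shuffled per training example. The names `ATTRIBUTES`, `build_target`, and `parse_prediction`, the tag-token format, and the example labels are all hypothetical and do not come from the paper.

```python
import random

# The four attributes named in the abstract; the identifiers are illustrative.
ATTRIBUTES = ("intent", "dialog_act", "speaker_role", "emotion")


def build_target(labels, order=None, rng=None):
    """Serialize attribute labels into one target token sequence.

    Assumption: when no order is given, a random permutation is drawn,
    so a decoder trained on such targets never sees one fixed order.
    """
    if order is None:
        order = list(ATTRIBUTES)
        (rng or random).shuffle(order)
    tokens = []
    for name in order:
        tokens.append(f"<{name}>")   # tag token announcing the attribute
        tokens.append(labels[name])  # label token for that attribute
    tokens.append("<eos>")
    return tokens


def parse_prediction(tokens):
    """Recover an attribute-to-label mapping from a decoded sequence.

    Because every label follows its tag token, the mapping can be read
    back regardless of the order in which attributes were emitted.
    """
    result = {}
    it = iter(tokens)
    for tok in it:
        if tok == "<eos>":
            break
        name = tok[1:-1]
        if tok.startswith("<") and tok.endswith(">") and name in ATTRIBUTES:
            result[name] = next(it, None)
    return result


# Hypothetical labels for a single utterance.
example = {
    "intent": "check_balance",
    "dialog_act": "question",
    "speaker_role": "caller",
    "emotion": "neutral",
}
target = build_target(example, rng=random.Random(0))
recovered = parse_prediction(target)
```

Under these assumptions a single decoder pass yields all four attributes, which is the low-latency motivation stated in the abstract; the actual model and training procedure may differ.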
Related papers
- Increasing faithfulness in human-human dialog summarization with Spoken Language Understanding tasks [0.0]
We propose an exploration of how incorporating task-related information can enhance the summarization process.
Results show that integrating models with task-related information improves summary accuracy, even with varying word error rates.
arXiv Detail & Related papers (2024-09-16T08:15:35Z)
- Integrating Paralinguistics in Speech-Empowered Large Language Models for Natural Conversation [46.93969003104427]
This paper introduces an extensive speech-text LLM framework, the Unified Spoken Dialog Model (USDM).
USDM is designed to generate coherent spoken responses with naturally occurring prosodic features relevant to the given input speech.
Our approach effectively generates natural-sounding spoken responses, surpassing previous and cascaded baselines.
arXiv Detail & Related papers (2024-02-08T14:35:09Z)
- Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model [8.180382743037082]
This paper explores the potential of constructing an AI spoken dialogue system that "thinks how to respond" and "thinks how to speak" simultaneously.
arXiv Detail & Related papers (2023-09-20T01:48:27Z)
- Bridging Speech and Textual Pre-trained Models with Unsupervised ASR [70.61449720963235]
This work proposes a simple yet efficient unsupervised paradigm that connects speech and textual pre-trained models.
We show that unsupervised automatic speech recognition (ASR) can improve the representations from speech self-supervised models.
Notably, on spoken question answering, we reach the state-of-the-art result over the challenging NMSQA benchmark.
arXiv Detail & Related papers (2022-11-06T04:50:37Z)
- SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding [68.94808536012371]
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.
arXiv Detail & Related papers (2022-09-14T13:42:50Z)
- Adapting Task-Oriented Dialogue Models for Email Conversations [4.45709593827781]
In this paper, we provide an effective transfer learning framework (EMToD) that allows the latest development in dialogue models to be adapted for long-form conversations.
We show that the proposed EMToD framework improves intent detection performance over pre-trained language models by 45% and over pre-trained dialogue models by 30% for task-oriented email conversations.
arXiv Detail & Related papers (2022-08-19T16:41:34Z)
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
- End-to-end Spoken Conversational Question Answering: Task, Dataset and Model [92.18621726802726]
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts.
We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows.
Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
arXiv Detail & Related papers (2022-04-29T17:56:59Z)
- Integrating Dialog History into End-to-End Spoken Language Understanding Systems [37.08876551722831]
We investigate the importance of dialog history and how it can be effectively integrated into end-to-end spoken language understanding systems.
While processing a spoken utterance, our proposed RNN transducer (RNN-T) based SLU model has access to its dialog history in the form of decoded transcripts and SLU labels of previous turns.
We evaluate our approach on a recently released spoken dialog data set, the HarperValleyBank corpus.
arXiv Detail & Related papers (2021-08-18T22:24:11Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
- SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding [61.02342238771685]
Spoken language understanding requires a model to analyze input acoustic signal to understand its linguistic content and make predictions.
Various pre-training methods have been proposed to learn rich representations from large-scale unannotated speech and text.
We propose a novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules.
arXiv Detail & Related papers (2020-10-05T19:29:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.