Dialog act guided contextual adapter for personalized speech recognition
- URL: http://arxiv.org/abs/2303.17799v1
- Date: Fri, 31 Mar 2023 05:13:44 GMT
- Title: Dialog act guided contextual adapter for personalized speech recognition
- Authors: Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai
Wei, Grant P. Strimel, Ross McGowan
- Abstract summary: Personalization in multi-turn dialogs has been a long standing challenge for end-to-end automatic speech recognition (E2E ASR) models.
Recent work on contextual adapters has tackled rare word recognition using user catalogs.
We propose a dialog act guided contextual adapter network.
- Score: 9.672512327395435
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalization in multi-turn dialogs has been a long standing challenge for
end-to-end automatic speech recognition (E2E ASR) models. Recent work on
contextual adapters has tackled rare word recognition using user catalogs. This
adaptation, however, does not incorporate an important cue, the dialog act,
which is available in a multi-turn dialog scenario. In this work, we propose a
dialog act guided contextual adapter network. Specifically, it leverages dialog
acts to select the most relevant user catalogs and creates queries based on
both -- the audio as well as the semantic relationship between the carrier
phrase and user catalogs to better guide the contextual biasing. On industrial
voice assistant datasets, our model outperforms both the baselines - dialog act
encoder-only model, and the contextual adaptation, leading to the most
improvement over the no-context model: 58% average relative word error rate
reduction (WERR) in the multi-turn dialog scenario, in comparison to the
prior-art contextual adapter, which has achieved 39% WERR over the no-context
model.
Related papers
- Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation [55.043492250775294]
We introduce a novel Face-to-Face spoken dialogue model.
It processes audio-visual speech from user input and generates audio-visual speech as the response.
We also introduce MultiDialog, the first large-scale multimodal spoken dialogue corpus.
arXiv Detail & Related papers (2024-06-12T04:48:36Z) - Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue [13.774377524019723]
We propose an efficient Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework.
It models the agent and user individually with two adapters built upon large language models.
Our framework performs superior to traditional fine-tuning and harbors the tremendous potential for improving dialogue consistency.
arXiv Detail & Related papers (2024-02-10T14:52:52Z) - Multi-User MultiWOZ: Task-Oriented Dialogues among Multiple Users [51.34484827552774]
We release the Multi-User MultiWOZ dataset: task-oriented dialogues among two users and one agent.
These dialogues reflect interesting dynamics of collaborative decision-making in task-oriented scenarios.
We propose a novel task of multi-user contextual query rewriting: to rewrite a task-oriented chat between two users as a concise task-oriented query.
arXiv Detail & Related papers (2023-10-31T14:12:07Z) - Contextual Data Augmentation for Task-Oriented Dialog Systems [8.085645180329417]
We develop a novel dialog augmentation model that generates a user turn, conditioning on full dialog context.
With a new prompt design for language model, and output re-ranking, the dialogs generated from our model can be directly used to train downstream dialog systems.
arXiv Detail & Related papers (2023-10-16T13:22:34Z) - Adapting Task-Oriented Dialogue Models for Email Conversations [4.45709593827781]
In this paper, we provide an effective transfer learning framework (EMToD) that allows the latest development in dialogue models to be adapted for long-form conversations.
We show that the proposed EMToD framework improves intent detection performance over pre-trained language models by 45% and over pre-trained dialogue models by 30% for task-oriented email conversations.
arXiv Detail & Related papers (2022-08-19T16:41:34Z) - Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
How to build and use dialogue data efficiently, and how to deploy models in different domains at scale can be critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
arXiv Detail & Related papers (2022-08-16T08:21:12Z) - What Helps Transformers Recognize Conversational Structure? Importance
of Context, Punctuation, and Labels in Dialog Act Recognition [41.1669799542627]
We apply two pre-trained transformer models to structure a conversational transcript as a sequence of dialog acts.
We find that the inclusion of a broader conversational context helps disambiguate many dialog act classes.
A detailed analysis reveals specific segmentation patterns observed in its absence.
arXiv Detail & Related papers (2021-07-05T21:56:00Z) - Dialogue History Matters! Personalized Response Selectionin Multi-turn
Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., personalized dialogue Corpus Ubuntu (P- Ubuntu) and personalized Weibo dataset (P-Weibo)
arXiv Detail & Related papers (2021-03-17T09:42:11Z) - Dialogue-Based Relation Extraction [53.2896545819799]
We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE.
We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks.
Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings.
arXiv Detail & Related papers (2020-04-17T03:51:57Z) - Conversation Learner -- A Machine Teaching Tool for Building Dialog
Managers for Task-Oriented Dialog Systems [57.082447660944965]
Conversation Learner is a machine teaching tool for building dialog managers.
It enables dialog authors to create a dialog flow using familiar tools, converting the dialog flow into a parametric model.
It allows dialog authors to improve the dialog manager over time by leveraging user-system dialog logs as training data.
arXiv Detail & Related papers (2020-04-09T00:10:54Z) - Interview: A Large-Scale Open-Source Corpus of Media Dialog [11.28504775964698]
We introduce 'Interview': a large-scale (105K conversations) media dialog dataset collected from news interview transcripts.
Compared to existing large-scale proxies for conversational data, language models trained on our dataset exhibit better zero-shot out-of-domain performance.
'Interview' contains speaker role annotations for each turn, facilitating the development of engaging, responsive dialog systems.
arXiv Detail & Related papers (2020-04-07T02:44:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.