User-Initiated Repetition-Based Recovery in Multi-Utterance Dialogue
Systems
- URL: http://arxiv.org/abs/2108.01208v1
- Date: Mon, 2 Aug 2021 23:32:13 GMT
- Title: User-Initiated Repetition-Based Recovery in Multi-Utterance Dialogue
Systems
- Authors: Hoang Long Nguyen, Vincent Renkens, Joris Pelemans, Srividya Pranavi
Potharaju, Anil Kumar Nalamalapu, Murat Akbacak
- Abstract summary: We present a system that allows a user to correct speech recognition errors in a virtual assistant by repeating misunderstood words.
When a user repeats part of the phrase, the system rewrites the original query to incorporate the correction.
We show that rewriting the original query is an effective way to handle repetition-based recovery.
- Score: 3.20350998499235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recognition errors are common in human communication. Similar errors often
lead to unwanted behaviour in dialogue systems or virtual assistants. In human
communication, we can recover from them by repeating misrecognized words or
phrases; however, in human-machine communication this recovery mechanism is not
available. In this paper, we attempt to bridge this gap and present a system
that allows a user to correct speech recognition errors in a virtual assistant
by repeating misunderstood words. When a user repeats part of the phrase, the
system rewrites the original query to incorporate the correction. This rewrite
allows the virtual assistant to understand the original query successfully. We
present an end-to-end 2-step attention pointer network that can generate the
rewritten query by merging together the incorrectly understood utterance
with the correction follow-up. We evaluate the model on data collected for this
task and compare the proposed model to a rule-based baseline and a standard
pointer network. We show that rewriting the original query is an effective way
to handle repetition-based recovery and that the proposed model outperforms the
rule-based baseline, reducing Word Error Rate by 19% relative at 2% False Alarm
Rate on annotated data.
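The rewriting idea described in the abstract can be illustrated with a small sketch. This is not the paper's pointer network, and it is not the rule-based baseline the authors evaluate (the abstract does not describe that baseline); it is a hypothetical string-similarity heuristic. The function name `rewrite_query` and the `min_similarity` threshold are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def rewrite_query(original: str, correction: str, min_similarity: float = 0.5) -> str:
    """Replace the span of `original` that best matches `correction`.

    Hypothetical heuristic for illustration only. Returns `original`
    unchanged when no span is similar enough, i.e. the follow-up is
    treated as a new request rather than a repetition.
    """
    orig_tokens = original.split()
    corr_tokens = correction.split()
    n = len(corr_tokens)
    if n == 0 or n > len(orig_tokens):
        return original

    # Slide a window of the correction's length over the original query
    # and score each window by character-level similarity.
    best_score, best_start = 0.0, None
    for start in range(len(orig_tokens) - n + 1):
        window = " ".join(orig_tokens[start:start + n])
        score = SequenceMatcher(None, window, correction).ratio()
        if score > best_score:
            best_score, best_start = score, start

    if best_start is None or best_score < min_similarity:
        return original

    # Merge: keep the context around the window, swap in the correction.
    rewritten = orig_tokens[:best_start] + corr_tokens + orig_tokens[best_start + n:]
    return " ".join(rewritten)


print(rewrite_query("call jon smith", "john"))     # call john smith
print(rewrite_query("play some jazz", "weather"))  # play some jazz
```

In this sketch the threshold decides whether a follow-up counts as a repetition at all. Lowering it rewrites more follow-ups and risks rewriting ones that were never meant as corrections, which is loosely the kind of trade-off the abstract's False Alarm Rate operating point refers to.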
Related papers
- Speaker Tagging Correction With Non-Autoregressive Language Models [0.0]
We propose a speaker tagging correction system based on a non-autoregressive language model.
We show that the employed error correction approach leads to reductions in word diarization error rate (WDER) on two datasets.
arXiv Detail & Related papers (2024-08-30T11:02:17Z)
- Improving Speech Recognition Error Prediction for Modern and Off-the-shelf Speech Recognizers [15.74988399856102]
We extend a prior phonetic confusion based model for predicting speech recognition errors in two ways.
We introduce a sampling-based paradigm that better simulates the behavior of a posterior-based acoustic model.
We evaluate the error predictors in two ways: first by predicting the errors made by a Switchboard ASR system on unseen data, and then using that same predictor to estimate the behavior of an unrelated cloud-based ASR system.
arXiv Detail & Related papers (2024-08-21T00:48:03Z)
- Adapting Dual-encoder Vision-language Models for Paraphrased Retrieval [55.90407811819347]
We consider the task of paraphrased text-to-image retrieval where a model aims to return similar results given a pair of paraphrased queries.
We train a dual-encoder model starting from a language model pretrained on a large text corpus.
Compared to public dual-encoder models such as CLIP and OpenCLIP, the model trained with our best adaptation strategy achieves a significantly higher ranking similarity for paraphrased queries.
arXiv Detail & Related papers (2024-05-06T06:30:17Z)
- Self-consistent context aware conformer transducer for speech recognition [0.06008132390640294]
We introduce a novel neural network module that adeptly handles recursive data flow in neural network architectures.
Our method notably improves the accuracy of recognizing rare words without adversely affecting the word error rate for common vocabulary.
Our findings reveal that the combination of both approaches can improve the accuracy of detecting rare words by as much as 4.5 times.
arXiv Detail & Related papers (2024-02-09T18:12:11Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings [76.87664008338317]
Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition.
We propose a novel algorithm for candidate retrieval based on misspelled n-gram mappings.
Experiments on Spoken Wikipedia show 21.4% word error rate improvement compared to a baseline ASR system.
arXiv Detail & Related papers (2023-06-04T10:00:12Z)
- End-to-End Page-Level Assessment of Handwritten Text Recognition [69.55992406968495]
HTR systems increasingly face the end-to-end page-level transcription of a document.
Standard metrics do not take into account the inconsistencies that might appear.
We propose a two-fold evaluation, where the transcription accuracy and the reading order (RO) goodness are considered separately.
arXiv Detail & Related papers (2023-01-14T15:43:07Z)
- Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose RFEC, an efficient factual error correction system based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
- Personalized Query Rewriting in Conversational AI Agents [7.086654234990377]
We propose a query rewriting approach by leveraging users' historically successful interactions as a form of memory.
We present a neural retrieval model and a pointer-generator network with hierarchical attention and show that they perform significantly better at the query rewriting task with the aforementioned user memories than without.
arXiv Detail & Related papers (2020-11-09T20:45:39Z)
- Wake Word Detection with Alignment-Free Lattice-Free MMI [66.12175350462263]
Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.
We present novel methods to train a hybrid DNN/HMM wake word detection system from partially labeled training data.
We evaluate our methods on two real data sets, showing 50%--90% reduction in false rejection rates at pre-specified false alarm rates over the best previously published figures.
arXiv Detail & Related papers (2020-05-17T19:22:25Z)