Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking
- URL: http://arxiv.org/abs/2409.06263v1
- Date: Tue, 10 Sep 2024 07:06:40 GMT
- Title: Keyword-Aware ASR Error Augmentation for Robust Dialogue State Tracking
- Authors: Jihyun Lee, Solee Im, Wonjun Lee, Gary Geunbae Lee
- Abstract summary: We introduce a simple yet effective data augmentation method to improve the robustness of a Dialogue State Tracking (DST) model.
Our method generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
- Score: 17.96115263146684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dialogue State Tracking (DST) is a key part of task-oriented dialogue systems, identifying important information in conversations. However, its accuracy drops significantly in spoken dialogue environments due to named entity errors from Automatic Speech Recognition (ASR) systems. We introduce a simple yet effective data augmentation method that targets those entities to improve the robustness of the DST model. Our novel method can control the placement of errors using keyword-highlighted prompts while introducing phonetically similar errors. As a result, our method generates sufficient error patterns on keywords, leading to improved accuracy in noisy and low-accuracy ASR environments.
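The augmentation idea described in the abstract can be illustrated with a minimal sketch: given highlighted keywords in an utterance, swap each one for a phonetically similar error. The confusion table, keyword set, and error rate below are illustrative assumptions, not the paper's actual prompt-based generation.

```python
# Sketch of keyword-targeted phonetic error injection (hypothetical values;
# the paper generates errors with keyword-highlighted LLM prompts instead).
import random

PHONETIC_CONFUSIONS = {
    "cambridge": ["cambride", "camebridge"],
    "expensive": ["expansive"],
    "thai": ["tie", "tai"],
}

def inject_keyword_errors(utterance, keywords, error_rate=1.0, rng=None):
    """Replace highlighted keywords with phonetically similar ASR-style errors."""
    rng = rng or random.Random(0)
    out = []
    for tok in utterance.lower().split():
        if tok in keywords and tok in PHONETIC_CONFUSIONS and rng.random() < error_rate:
            out.append(rng.choice(PHONETIC_CONFUSIONS[tok]))  # corrupt the keyword
        else:
            out.append(tok)  # non-keywords are left intact
    return " ".join(out)
```

Because only keywords are corrupted, the rest of the utterance stays clean, mimicking the paper's controlled error placement.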
Related papers
- Beyond Ontology in Dialogue State Tracking for Goal-Oriented Chatbot [3.2288892242158984]
We propose a novel approach to enhance Dialogue State Tracking (DST) performance.
Our method enables Large Language Model (LLM) to infer dialogue states through carefully designed prompts.
Our approach achieved state-of-the-art with a JGA of 42.57%, and performed well in open-domain real-world conversations.
arXiv Detail & Related papers (2024-10-30T07:36:23Z)
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
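The confidence-based filtering described above can be sketched in a few lines: only hypotheses below a confidence threshold are routed to the correction model, so likely-accurate transcripts are never touched. The threshold value and `correct_fn` callable are assumptions for illustration, not the paper's configuration.

```python
# Sketch of confidence-based filtering for post-hoc ASR correction
# (threshold and correction function are illustrative assumptions).
def filter_and_correct(hypotheses, correct_fn, threshold=0.9):
    """Apply a correction function only to low-confidence ASR hypotheses."""
    corrected = []
    for text, confidence in hypotheses:
        if confidence < threshold:
            corrected.append(correct_fn(text))  # low confidence: send to LLM
        else:
            corrected.append(text)  # likely accurate: leave untouched
    return corrected
```

This gating avoids the failure mode the paper highlights: an unconditional corrector can introduce new errors into transcripts that were already right.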
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline [12.197453599489963]
We propose the development of an Error Explainable Benchmark (EEB) dataset.
This dataset, while considering both speech- and text-level, enables a granular understanding of the model's shortcomings.
Our proposition provides a structured pathway for a more real-world-centric evaluation, allowing for the detection and rectification of nuanced system weaknesses.
arXiv Detail & Related papers (2024-01-26T03:42:45Z)
- ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based on Error Detection and Context-Aware Error Correction [30.486396813844195]
We present a novel ASR postprocessing method that focuses on improving the recognition of rare words through error detection and context-aware error correction.
Experimental results across five datasets demonstrate that our proposed method achieves significantly lower word error rates (WERs) than previous approaches.
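The detect-then-correct pattern this entry describes can be sketched as a two-stage pass: flag suspect tokens, then snap each one to the closest word in a context vocabulary. The detector predicate, vocabulary, and plain edit distance below are simplifying assumptions; the paper uses learned models for both stages.

```python
# Sketch of two-stage ASR postprocessing: error detection, then
# context-aware correction (detector and vocabulary are assumptions).
def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def detect_and_correct(tokens, context_vocab, is_suspect):
    """Replace detected errors with the closest word from the context vocabulary."""
    corrected = []
    for tok in tokens:
        if is_suspect(tok):
            # stage 2: pick the nearest in-context word (sorted for determinism)
            corrected.append(min(sorted(context_vocab), key=lambda c: edit_distance(tok, c)))
        else:
            corrected.append(tok)  # stage 1 said this token is fine
    return corrected
```

Restricting correction to detected spans keeps the rest of the transcript untouched, which is what makes the approach practical for rare words.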
arXiv Detail & Related papers (2023-10-08T11:40:30Z) - Boosting Chinese ASR Error Correction with Dynamic Error Scaling
Mechanism [27.09416337926635]
Current mainstream models often struggle with effectively utilizing word-level features and phonetic information.
This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text.
arXiv Detail & Related papers (2023-08-07T09:19:59Z) - Prompt Learning for Few-Shot Dialogue State Tracking [75.50701890035154]
This paper focuses on how to learn a dialogue state tracking (DST) model efficiently with limited labeled data.
We design a prompt learning framework for few-shot DST, which consists of two main components: value-based prompt and inverse prompt mechanism.
Experiments show that our model can generate unseen slots and outperforms existing state-of-the-art few-shot methods.
arXiv Detail & Related papers (2022-01-15T07:37:33Z)
- Data Augmentation for Training Dialog Models Robust to Speech Recognition Errors [5.53506103787497]
Speech-based virtual assistants, such as Amazon Alexa, Google Assistant, and Apple Siri, typically convert users' audio signals to text data through automatic speech recognition (ASR).
The ASR output is error-prone; however, the downstream dialog models are often trained on error-free text data, making them sensitive to ASR errors during inference time.
We leverage an ASR error simulator to inject noise into the error-free text data, and subsequently train the dialog models with the augmented data.
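A toy version of the error simulator described above can be sketched as word-level noise injected at a target rate. The deletion/insertion scheme and rate below are illustrative assumptions; the paper's simulator models realistic ASR confusions rather than random corruption.

```python
# Sketch of injecting ASR-style noise into clean training text
# (error types and rate are simplified assumptions).
import random

def simulate_asr_noise(sentence, wer=0.15, rng=None):
    """Randomly delete or duplicate words to mimic ASR errors at a target rate."""
    rng = rng or random.Random(42)
    noisy = []
    for word in sentence.split():
        r = rng.random()
        if r < wer / 2:
            continue                   # deletion error: drop the word
        elif r < wer:
            noisy.extend([word, word])  # insertion error: stuttered duplicate
        else:
            noisy.append(word)          # word survives intact
    return " ".join(noisy)
```

Training dialog models on a mix of clean and noised copies of each utterance is the augmentation strategy the entry describes.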
arXiv Detail & Related papers (2020-06-10T03:18:15Z)
- A Simple Language Model for Task-Oriented Dialogue [61.84084939472287]
SimpleTOD is a simple approach to task-oriented dialogue that uses a single, causal language model trained on all sub-tasks recast as a single sequence prediction problem.
This allows SimpleTOD to fully leverage transfer learning from pre-trained, open domain, causal language models such as GPT-2.
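The single-sequence recast that SimpleTOD relies on can be sketched by concatenating context, belief state, action, and response into one training string. The delimiter tokens and field formats below are assumptions for illustration, not SimpleTOD's exact serialization.

```python
# Sketch of recasting all task-oriented dialogue sub-tasks as one
# sequence-prediction target (delimiters are hypothetical).
def build_training_sequence(context, belief_state, action, response):
    """Serialize one dialogue turn into a single left-to-right training string."""
    return (
        "<|context|> " + " ".join(context)
        + " <|belief|> " + ", ".join(f"{d} {s} {v}" for d, s, v in belief_state)
        + " <|action|> " + " ".join(action)
        + " <|response|> " + response
    )
```

A causal language model such as GPT-2 can then be fine-tuned on these strings, with belief state, action, and response all emitted by ordinary next-token prediction.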
arXiv Detail & Related papers (2020-05-02T11:09:27Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.