Hybrid phonetic-neural model for correction in speech recognition systems
- URL: http://arxiv.org/abs/2102.06744v1
- Date: Fri, 12 Feb 2021 19:57:16 GMT
- Title: Hybrid phonetic-neural model for correction in speech recognition systems
- Authors: Rafael Viana-Cámara, Mario Campos-Soberanis, Diego Campos-Sobrino
- Abstract summary: We explore using a deep neural network to refine the results of a phonetic correction algorithm applied to a telesales audio database.
The results show the viability of deep learning models together with post-processing correction strategies to reduce errors made by closed ASRs in specific language domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic speech recognition (ASR) is a relevant area in multiple settings
because it provides a natural communication mechanism between applications and
users. ASRs often fail in environments that use language specific to particular
application domains. Some strategies have been explored to reduce errors in
closed ASRs through post-processing, particularly automatic spell checking, and
deep learning approaches. In this article, we explore using a deep neural
network to refine the results of a phonetic correction algorithm applied to a
telesales audio database. The results exhibit a reduction in the word error
rate (WER), both in the original transcription and in the phonetic correction,
which shows the viability of deep learning models together with post-processing
correction strategies to reduce errors made by closed ASRs in specific language
domains.
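The word error rate (WER) reported in the abstract is the standard edit-distance metric over words. As a minimal illustration (not the authors' implementation), it can be computed with dynamic programming:

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

A post-processing corrector is evaluated by comparing the WER of the corrected output against the WER of the raw ASR transcription on the same references.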
Related papers
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
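The confidence-based filtering idea above can be sketched as a simple gate: only hypotheses whose ASR confidence falls below a threshold are routed to the LLM corrector, so likely-accurate transcripts are left untouched. This is a hypothetical sketch; `llm_correct` stands in for whatever correction model is used.

```python
def correct_with_filter(hypotheses, confidences, llm_correct, threshold=0.9):
    """Apply LLM correction only to low-confidence ASR hypotheses."""
    out = []
    for text, conf in zip(hypotheses, confidences):
        # High-confidence transcripts pass through unchanged, which avoids
        # introducing errors into transcripts that were likely correct.
        out.append(text if conf >= threshold else llm_correct(text))
    return out
```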
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
- Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
- Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution consists of using Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Networks (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, an LLM's generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Boosting Chinese ASR Error Correction with Dynamic Error Scaling Mechanism [27.09416337926635]
Current mainstream models often struggle with effectively utilizing word-level features and phonetic information.
This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text.
arXiv Detail & Related papers (2023-08-07T09:19:59Z)
- Unsupervised domain adaptation for speech recognition with unsupervised error correction [20.465220855548292]
We propose an unsupervised error correction method for unsupervised ASR domain adaptation.
Our approach requires only unlabeled data from the target domain, to which a pseudo-labeling technique is applied to generate correction training samples.
Experimental results show that our method obtains a significant word error rate (WER) reduction over non-adapted ASR systems.
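The pseudo-labeling step described above can be sketched as pairing raw ASR output (the noisy input) with a pseudo-corrected transcript (the training target). `asr_transcribe` and `pseudo_label` are hypothetical callables standing in for the ASR system and the pseudo-labeling procedure.

```python
def make_correction_pairs(audio_batch, asr_transcribe, pseudo_label):
    """Generate (noisy, target) training pairs without ground-truth labels."""
    pairs = []
    for audio in audio_batch:
        noisy = asr_transcribe(audio)   # uncorrected target-domain output
        target = pseudo_label(noisy)    # pseudo-corrected transcript
        if noisy != target:             # keep only pairs with actual edits
            pairs.append((noisy, target))
    return pairs
```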
arXiv Detail & Related papers (2022-09-24T16:05:23Z)
- Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z)
- Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation [0.0]
Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing.
This paper proposes a scheme to improve existing LM-based ASR error detection systems.
arXiv Detail & Related papers (2021-08-04T02:11:37Z)
- Evolutionary optimization of contexts for phonetic correction in speech recognition systems [0.0]
It is common for general purpose ASR systems to fail in applications that use a domain-specific language.
Various strategies have been used to reduce the error, such as providing a context that modifies the language model.
This article explores the use of an evolutionary process to generate an optimized context for a specific application domain.
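An evolutionary search for a good context can be sketched as a small genetic loop that keeps the contexts scoring the lowest WER and mutates them. This is a toy sketch under stated assumptions: `score` (the WER obtained when the ASR is given a context) and the vocabulary are hypothetical stand-ins, not the paper's actual fitness function.

```python
import random

def evolve_context(vocab, score, pop_size=8, generations=20, ctx_len=5, seed=0):
    """Evolve a word-list context that minimizes a score (e.g. WER)."""
    rng = random.Random(seed)
    pop = [rng.sample(vocab, ctx_len) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)                  # lower score (WER) is better
        survivors = pop[: pop_size // 2]     # elitist selection
        children = []
        for parent in survivors:
            child = parent.copy()
            # Mutation: replace one random slot with a random vocabulary word.
            child[rng.randrange(ctx_len)] = rng.choice(vocab)
            children.append(child)
        pop = survivors + children
    return min(pop, key=score)
```

Because the best individual is always kept among the survivors, the final context never scores worse than the best initial one.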
arXiv Detail & Related papers (2021-02-23T04:14:51Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches that perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced by 14% relative with joint models trained on small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.