Evolutionary optimization of contexts for phonetic correction in speech recognition systems
- URL: http://arxiv.org/abs/2102.11480v1
- Date: Tue, 23 Feb 2021 04:14:51 GMT
- Title: Evolutionary optimization of contexts for phonetic correction in speech recognition systems
- Authors: Rafael Viana-Cámara, Diego Campos-Sobrino, Mario Campos-Soberanis
- Abstract summary: General-purpose ASR systems commonly fail in applications that use domain-specific language.
Various strategies have been used to reduce these errors, such as providing a context that modifies the language model.
This article explores the use of an evolutionary process to generate an optimized context for a specific application domain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Speech Recognition (ASR) is an area of growing academic and
commercial interest due to the high demand for applications that use it to
provide a natural means of communication. General-purpose ASR
systems commonly fail in applications that use domain-specific language. Various
strategies have been used to reduce these errors, such as providing a context that
modifies the language model, and post-processing correction methods. This
article explores the use of an evolutionary process to generate an optimized
context for a specific application domain, as well as different correction
techniques based on phonetic distance metrics. The results show the viability
of a genetic algorithm as a tool for context optimization, which, combined with
post-processing correction based on phonetic representations, can reduce
errors in the recognized speech.
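The abstract does not say how the genetic algorithm encodes a context or scores it. The following is a minimal sketch under stated assumptions, not the authors' method: each individual is assumed to be a binary mask over a pool of candidate domain phrases, and fitness is assumed to be a caller-supplied function. In a real setup it might be something like the negative word error rate obtained by decoding a development set with that context; here a toy stand-in is used. The names `evolve_context`, `toy_fitness`, `CANDIDATES` and `RELEVANT` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Set, Tuple


def evolve_context(
    candidates: Sequence[str],
    fitness: Callable[[Set[str]], float],
    pop_size: int = 40,
    generations: int = 100,
    mutation_rate: Optional[float] = None,
    seed: int = 0,
) -> Tuple[Set[str], float]:
    """Search for a high-fitness subset of `candidates` with a simple GA.

    Assumed encoding: each individual is a list of booleans, where bit i
    says whether candidates[i] is included in the context.
    """
    rng = random.Random(seed)
    n = len(candidates)
    rate = mutation_rate if mutation_rate is not None else 1.0 / n

    def decode(mask: List[bool]) -> Set[str]:
        return {c for c, bit in zip(candidates, mask) if bit}

    def evaluate(pop: List[List[bool]]) -> List[Tuple[List[bool], float]]:
        return [(m, fitness(decode(m))) for m in pop]

    def tournament(scored: List[Tuple[List[bool], float]], k: int = 3) -> List[bool]:
        # Sample k individuals and return the mask of the fittest one.
        return max(rng.sample(scored, k), key=lambda p: p[1])[0]

    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    scored = evaluate(population)
    best_mask, best_fit = max(scored, key=lambda p: p[1])

    for _ in range(generations):
        children = [list(best_mask)]  # elitism: carry over the best mask so far
        while len(children) < pop_size:
            a, b = tournament(scored), tournament(scored)
            # Uniform crossover followed by independent bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [(not g) if rng.random() < rate else g for g in child]
            children.append(child)
        scored = evaluate(children)
        gen_mask, gen_fit = max(scored, key=lambda p: p[1])
        if gen_fit > best_fit:
            best_mask, best_fit = list(gen_mask), gen_fit
    return decode(best_mask), best_fit


# Hypothetical candidate phrases: four "in-domain", the rest noise.
CANDIDATES = ["balance", "invoice", "refund", "premium plan", "upgrade",
              "weather", "football", "recipe", "movie", "holiday", "guitar", "garden"]
RELEVANT = {"balance", "invoice", "refund", "premium plan"}


def toy_fitness(context: Set[str]) -> float:
    # Hypothetical stand-in for "-WER of the ASR decoded with this context":
    # reward relevant phrases and penalise irrelevant ones.
    return len(context & RELEVANT) - 0.5 * len(context - RELEVANT)


best, score = evolve_context(CANDIDATES, toy_fitness, seed=42)
print(sorted(best), score)
```

Tournament selection, uniform crossover, bit-flip mutation and elitism are generic GA choices picked for the sketch. Since every real fitness evaluation would involve an ASR decoding pass, caching scores for masks that reappear would matter far more in practice than it does with this toy function.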
Related papers
- Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling [48.78361527873024]
We propose a novel approach to handwriting recognition that integrates the strengths of two distinct methodologies.
We introduce a sparsification technique that accelerates the convergence of the algorithm and enhances the overall system's performance.
(2024-09-09T15:12:28Z)
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
(2024-07-31T08:00:41Z)
- Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
The approach uses a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word, and a corrector module that applies the corrections returned by the tagger.
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
(2024-06-11T09:52:33Z)
- Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems [27.483603895258437]
We introduce a novel approach to contextual biasing that adds a contextual spelling correction model on top of the end-to-end ASR system.
We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree of the model.
Experiments show that the proposed method achieves up to 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods.
(2022-03-02T06:00:48Z)
- Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
(2021-10-08T05:07:35Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
(2021-09-27T15:04:00Z)
- A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems [42.05399301143457]
We introduce a light-weight contextual spelling correction model to correct context-related recognition errors.
Experiments show that the model improves baseline ASR model performance with about 50% relative word error rate reduction.
The model also shows excellent performance for out-of-vocabulary terms not seen during training.
(2021-08-17T08:14:37Z)
- Seed Words Based Data Selection for Language Model Adaptation [11.59717828860318]
We present an approach for automatically selecting sentences from a text corpus that match a user-supplied glossary of terms, both semantically and morphologically.
The vocabulary of the baseline model is expanded and tailored, reducing the resulting OOV rate.
Results using different metrics (OOV rate, WER, precision and recall) show the effectiveness of the proposed techniques.
(2021-07-20T12:08:27Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN extends the generative adversarial network (GAN) to the time domain, with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
(2021-03-30T08:09:49Z)
- Hybrid phonetic-neural model for correction in speech recognition systems [0.0]
We explore using a deep neural network to refine the results of a phonetic correction algorithm applied to a telesales audio database.
The results show the viability of deep learning models together with post-processing correction strategies to reduce errors made by closed ASRs in specific language domains.
(2021-02-12T19:57:16Z)
- Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [64.9317368575585]
This paper proposes a gated recurrent fusion (GRF) method with joint training framework for robust end-to-end ASR.
The GRF algorithm is used to dynamically combine the noisy and enhanced features.
The proposed method achieves a relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and transformer method.
(2020-11-09T08:52:05Z)
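The main paper's abstract also refers to post-processing correction techniques based on phonetic distance metrics without defining them here. The following is a minimal sketch under explicit assumptions, not the authors' algorithm. Crude respelling rules stand in for a real phonetic representation, and normalized Levenshtein distance stands in for the metric. One- and two-word windows of the transcript are then matched against a hypothetical domain vocabulary. `phonetic_key`, `correct`, the rule list, and the 0.25 threshold are all illustrative choices.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical, English-oriented respelling rules used as a crude phonetic key.
# A real system would use a grapheme-to-phoneme converter for its language.
_RULES = [("ph", "f"), ("ck", "k"), ("qu", "kw"), ("x", "ks"),
          ("c(?=[eiy])", "s"), ("c", "k"), ("z", "s"), ("y", "i")]


def phonetic_key(text: str) -> str:
    s = re.sub(r"[^a-z]", "", text.lower())  # drop spaces, digits, punctuation
    for pattern, repl in _RULES:
        s = re.sub(pattern, repl, s)
    return re.sub(r"(.)\1+", r"\1", s)  # collapse repeated letters


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def correct(transcript: str, domain_terms: Sequence[str],
            threshold: float = 0.25, max_ngram: int = 2) -> str:
    words = transcript.split()
    term_keys = [(t, phonetic_key(t)) for t in domain_terms]
    out: List[str] = []
    i = 0
    while i < len(words):
        best: Optional[Tuple[float, int, str]] = None  # (distance, n_words, term)
        # Longer windows first, so ties favour merging split words.
        for n in range(min(max_ngram, len(words) - i), 0, -1):
            key = phonetic_key(" ".join(words[i:i + n]))
            if not key:
                continue
            for term, tkey in term_keys:
                d = levenshtein(key, tkey) / max(len(key), len(tkey))
                if d <= threshold and (best is None or d < best[0]):
                    best = (d, n, term)
        if best is not None:
            out.append(best[2])
            i += best[1]
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(correct("i want to cancel my premeum plan and get a re fund",
              ["premium plan", "refund", "invoice"]))
```

Trying longer windows first lets a split recognition such as "re fund" collapse into a single domain term. The threshold trades missed corrections against false replacements of words the recognizer got right.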