Self-consistent context aware conformer transducer for speech
recognition
- URL: http://arxiv.org/abs/2402.06592v1
- Date: Fri, 9 Feb 2024 18:12:11 GMT
- Title: Self-consistent context aware conformer transducer for speech
recognition
- Authors: Konstantin Kolokolov, Pavel Pekichev, Karthik Raghunathan
- Abstract summary: We propose a novel neural network architecture based on conformer transducer that adds contextual information flow to the ASR systems.
Our method improves the accuracy of recognizing uncommon words while not harming the word error rate of regular words.
- Score: 0.0675253870287079
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel neural network architecture based on conformer transducer
that adds contextual information flow to the ASR systems. Our method improves
the accuracy of recognizing uncommon words while not harming the word error
rate of regular words. We explore the uncommon words accuracy improvement when
we use the new model and/or shallow fusion with context language model. We
found that combination of both provides cumulative gain in uncommon words
recognition accuracy.
Related papers
- Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation [27.057810339120664]
We propose two techniques to improve context-aware ASR models.
On LibriSpeech, our techniques together reduce the rare word error rate by 60% and 25% relatively compared to no biasing and shallow fusion.
On SPGISpeech and a real-world dataset ConEC, our techniques also yield good improvements over the baselines.
arXiv Detail & Related papers (2024-07-14T19:32:33Z) - Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose an self-supervised continual learning approach to recognize new words.
We use a memory-enhanced Automatic Speech Recognition model from previous work.
We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z) - Improved Contextual Recognition In Automatic Speech Recognition Systems
By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution consists of using Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Networks (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z) - ed-cec: improving rare word recognition using asr postprocessing based
on error detection and context-aware error correction [30.486396813844195]
We present a novel ASR postprocessing method that focuses on improving the recognition of rare words through error detection and context-aware error correction.
Experimental results across five datasets demonstrate that our proposed method achieves significantly lower word error rates (WERs) than previous approaches.
arXiv Detail & Related papers (2023-10-08T11:40:30Z) - HyPoradise: An Open Baseline for Generative Speech Recognition with
Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with reasonable prompt and its generative capability can even correct those tokens that are missing in N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z) - Robust Acoustic and Semantic Contextual Biasing in Neural Transducers
for Speech Recognition [14.744220870243932]
We propose to use lightweight character representations to encode fine-grained pronunciation features to improve contextual biasing.
We further integrate pretrained neural language model (NLM) based encoders to encode the utterance's semantic context.
Experiments using a Conformer Transducer model on the Librispeech dataset show a 4.62% - 9.26% relative WER improvement on different biasing list sizes.
arXiv Detail & Related papers (2023-05-09T08:51:44Z) - Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
arXiv Detail & Related papers (2022-03-29T10:05:39Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - Instant One-Shot Word-Learning for Context-Specific Neural
Sequence-to-Sequence Speech Recognition [62.997667081978825]
We present an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
In this paper we demonstrate that through this mechanism our system is able to recognize more than 85% of newly added words that it previously failed to recognize.
arXiv Detail & Related papers (2021-07-05T21:08:34Z) - MASKER: Masked Keyword Regularization for Reliable Text Classification [73.90326322794803]
We propose a fine-tuning method, coined masked keyword regularization (MASKER), that facilitates context-based prediction.
MASKER regularizes the model to reconstruct the keywords from the rest of the words and make low-confidence predictions without enough context.
We demonstrate that MASKER improves OOD detection and cross-domain generalization without degrading classification accuracy.
arXiv Detail & Related papers (2020-12-17T04:54:16Z) - Fast and Robust Unsupervised Contextual Biasing for Speech Recognition [16.557586847398778]
We propose an alternative approach that does not entail explicit contextual language model.
We derive the bias score for every word in the system vocabulary from the training corpus.
We show significant improvement in recognition accuracy when the relevant context is available.
arXiv Detail & Related papers (2020-05-04T17:29:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.