Related papers: Self-consistent context aware conformer transducer for speech recognition

Self-consistent context aware conformer transducer for speech recognition

URL: http://arxiv.org/abs/2402.06592v1
Date: Fri, 9 Feb 2024 18:12:11 GMT
Title: Self-consistent context aware conformer transducer for speech recognition
Authors: Konstantin Kolokolov, Pavel Pekichev, Karthik Raghunathan
Abstract summary: We propose a novel neural network architecture based on conformer transducer that adds contextual information flow to the ASR systems. Our method improves the accuracy of recognizing uncommon words while not harming the word error rate of regular words.
Score: 0.0675253870287079
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose a novel neural network architecture based on conformer transducer that adds contextual information flow to the ASR systems. Our method improves the accuracy of recognizing uncommon words while not harming the word error rate of regular words. We explore the uncommon words accuracy improvement when we use the new model and/or shallow fusion with context language model. We found that combination of both provides cumulative gain in uncommon words recognition accuracy.

Related papers

Continuously Learning New Words in Automatic Speech Recognition [56.972851337263755]
We propose an self-supervised continual learning approach to recognize new words. We use a memory-enhanced Automatic Speech Recognition model from previous work. We show that with this approach, we obtain increasing performance on the new words when they occur more frequently.
arXiv Detail & Related papers (2024-01-09T10:39:17Z)
Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing. Our solution consists of using Hidden Markov Models and Gaussian Mixture Models (HMM-GMM) along with Deep Neural Networks (DNN) models for better accuracy. We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
arXiv Detail & Related papers (2023-10-14T23:16:05Z)
Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic being trained is used to generate annotations for unlabeled text. As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
The neural dynamics of auditory word recognition and integration [21.582292050622456]
We present a computational model of word recognition which formalizes this perceptual process in Bayesian decision theory. We fit this model to explain scalp EEG signals recorded as subjects passively listened to a fictional story. The model reveals distinct neural processing of words depending on whether or not they can be quickly recognized.
arXiv Detail & Related papers (2023-05-22T18:06:32Z)
Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method. They have shown promising results on speech command recognition tasks. In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
arXiv Detail & Related papers (2022-12-01T12:36:26Z)
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels. Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
Word-level confidence estimation for RNN transducers [7.12355127219356]
We present a lightweight neural confidence model tailored for Automatic Speech Recognition (ASR) system with Recurrent Network Transducers (RNN-T) Compared to other existing approaches, our model utilizes: (a) the time information associated with recognized words, which reduces the computational complexity, and (b) a simple and elegant trick for mapping between sub-word and word sequences.
arXiv Detail & Related papers (2021-09-28T18:38:00Z)
Position-Invariant Truecasing with a Word-and-Character Hierarchical Recurrent Neural Network [10.425277173548212]
We propose a fast, accurate and compact two-level hierarchical word-and-character-based recurrent neural network model. We also address the problem of truecasing while ignoring token positions in the sentence.
arXiv Detail & Related papers (2021-08-26T17:54:35Z)
A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings [50.524054820564395]
We propose a new unsupervised model for mapping a variable-duration speech segment to a fixed-dimensional representation. The resulting acoustic word embeddings can form the basis of search, discovery, and indexing systems for low- and zero-resource languages.
arXiv Detail & Related papers (2020-12-03T19:24:42Z)
Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing. Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement. We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.