Neural-FST Class Language Model for End-to-End Speech Recognition
- URL: http://arxiv.org/abs/2201.11867v2
- Date: Mon, 31 Jan 2022 18:05:13 GMT
- Title: Neural-FST Class Language Model for End-to-End Speech Recognition
- Authors: Antoine Bruguier, Duc Le, Rohit Prabhavalkar, Dangna Li, Zhe Liu, Bo
Wang, Eun Chang, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer
- Abstract summary: We propose a Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition.
We show that NFCLM significantly outperforms NNLM by 15.8% relative in terms of Word Error Rate.
- Score: 30.670375747577694
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the Neural-FST Class Language Model (NFCLM) for end-to-end speech
recognition, a novel method that combines neural network language models
(NNLMs) and finite state transducers (FSTs) in a mathematically consistent
framework. Our method utilizes a background NNLM which models generic
background text together with a collection of domain-specific entities modeled
as individual FSTs. Each output token is generated by a mixture of these
components; the mixture weights are estimated with a separately trained neural
decider. We show that NFCLM significantly outperforms NNLM by 15.8% relative in
terms of Word Error Rate. NFCLM achieves performance similar to traditional
NNLM and FST shallow fusion while being less prone to overbiasing and 12 times
more compact, making it more suitable for on-device usage.
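To make the mixture formulation concrete, here is a minimal Python sketch of how the next-token distribution could be assembled from a background NNLM and entity FSTs; the function name, the two-component setup, and the toy vocabulary are illustrative assumptions, not the paper's implementation:
```python
import numpy as np

def nfclm_next_token_probs(p_nnlm, p_fsts, mixture_weights):
    """Combine a background NNLM distribution with domain-specific FST
    distributions using mixture weights estimated by a neural decider.

    p_nnlm          : (V,) next-token probabilities from the background NNLM
    p_fsts          : list of (V,) next-token probabilities, one per entity FST
    mixture_weights : (1 + len(p_fsts),) decider weights summing to 1,
                      with index 0 reserved for the background component
    """
    combined = mixture_weights[0] * p_nnlm
    for w, p_fst in zip(mixture_weights[1:], p_fsts):
        combined = combined + w * p_fst
    return combined

# Toy usage: 4-token vocabulary, one entity FST biased toward tokens 2-3.
p_nnlm = np.array([0.5, 0.2, 0.2, 0.1])
p_fst = np.array([0.0, 0.0, 0.9, 0.1])
weights = np.array([0.7, 0.3])  # from the neural decider
print(nfclm_next_token_probs(p_nnlm, [p_fst], weights))
```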
Related papers
- Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models [50.164379437671904]
We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition hypotheses.
In experiments using a lecture speech corpus, by combining eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction over the ASR 1-best baseline.
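As a rough sketch of how such an ensemble could be applied, here is an N-best approximation in Python with an equal-weight average of the models' log-probabilities (true lattice rescoring and context carry-over are more involved; all names here are illustrative):
```python
import numpy as np

def ensemble_rescore(asr_scores, nlm_log_probs, lm_weight=0.5):
    """Pick the best hypothesis after ensemble LM rescoring.

    asr_scores    : (N,) first-pass ASR log-scores for N hypotheses
    nlm_log_probs : (M, N) log-probabilities from M complementary NLMs
    """
    ensemble = nlm_log_probs.mean(axis=0)  # equal-weight ensemble
    return int(np.argmax(asr_scores + lm_weight * ensemble))

# Toy example: 2 models, 3 hypotheses.
asr = np.array([-10.0, -10.5, -11.0])
lms = np.array([[-20.0, -18.0, -25.0],
                [-21.0, -17.5, -24.0]])
print(ensemble_rescore(asr, lms))  # -> 1
```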
arXiv Detail & Related papers (2023-12-20T04:52:24Z)
- Generative Spoken Language Model based on continuous word-sized audio tokens [52.081868603603844]
We introduce a Generative Spoken Language Model based on word-sized continuous-valued audio embeddings.
The resulting model is the first generative language model based on word-sized continuous embeddings.
arXiv Detail & Related papers (2023-10-08T16:46:14Z)
- External Language Model Integration for Factorized Neural Transducers [7.5969913968845155]
We propose an adaptation method for factorized neural transducers (FNT) with external language models.
We show average gains of 18% WERR (relative word error rate reduction) with lexical adaptation across various scenarios and additive gains of up to 60% WERR in one entity-rich scenario.
arXiv Detail & Related papers (2023-05-26T23:30:21Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
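The predictive side of such a Bayesian treatment can be pictured as Monte Carlo averaging over weight samples; a minimal sketch of that general idea (not the paper's specific variational training procedure):
```python
import numpy as np

def mc_predictive(per_sample_probs):
    """Monte Carlo Bayesian predictive: p(w|h) ~ (1/S) * sum_s p(w|h, theta_s),
    where each row holds the next-word distribution under one weight
    sample theta_s drawn from the approximate posterior."""
    return np.mean(per_sample_probs, axis=0)

# Three posterior samples over a 4-word vocabulary.
samples = np.array([[0.6, 0.2, 0.1, 0.1],
                    [0.5, 0.3, 0.1, 0.1],
                    [0.4, 0.3, 0.2, 0.1]])
print(mc_predictive(samples))  # averaged distribution, still sums to 1
```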
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization [13.929356163132558]
We propose a new hybrid approach that combines the benefits of rule-based and neural systems.
First, a non-deterministic WFST outputs all normalization candidates, and then a neural language model picks the best one.
It achieves results comparable to or better than existing state-of-the-art text normalization (TN) models.
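A skeletal version of this generate-then-rescore pipeline, with stand-in callables for the WFST and the neural LM (a real system would build the candidate generator with a WFST toolkit such as pynini):
```python
def normalize(token, candidates_fn, lm_logprob_fn):
    """Generate candidates with a (non-deterministic) WFST, then let a
    neural LM pick the most probable normalization."""
    candidates = candidates_fn(token)          # WFST: all possible readings
    return max(candidates, key=lm_logprob_fn)  # LM: best-scoring reading

# Toy stand-ins for the two components.
def toy_wfst(tok):
    return ["three forty five", "three hundred forty five"] if tok == "345" else [tok]

def toy_lm_score(candidate):
    return -len(candidate.split())  # placeholder score, not a real LM

print(normalize("345", toy_wfst, toy_lm_score))
```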
arXiv Detail & Related papers (2022-03-29T21:34:35Z)
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce the Dependency-based Mixture Language Models.
Specifically, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
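In spirit, the mixing step can be sketched as attention weights over previous positions combining per-position dependency distributions into one next-token distribution; shapes and names below are assumptions for illustration:
```python
import numpy as np

def mixture_next_token(dep_dists, attn_weights):
    """Mix dependency-modeling distributions with self-attention weights.

    dep_dists    : (T, V) next-token distribution per previous position
    attn_weights : (T,) self-attention weights over positions, summing to 1
    Returns the (V,) mixed next-token distribution.
    """
    return attn_weights @ dep_dists

# Two previous tokens, vocabulary of 3.
dists = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
print(mixture_next_token(dists, np.array([0.6, 0.4])))  # -> [0.46 0.16 0.38]
```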
arXiv Detail & Related papers (2022-03-19T06:28:30Z)
- "What's The Context?": Long Context NLM Adaptation for ASR Rescoring in Conversational Agents [13.586996848831543]
We investigate various techniques to incorporate turn-based context history into both recurrent (LSTM) and Transformer-XL based NLMs.
For recurrent NLMs, we explore a context carry-over mechanism and feature-based augmentation.
We adapt our contextual NLM towards user provided on-the-fly speech patterns by leveraging encodings from a large pre-trained masked language model.
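For the recurrent case, context carry-over amounts to reusing the LSTM state across turns; a minimal PyTorch sketch with invented sizes and names:
```python
import torch
import torch.nn as nn

class TurnContextLSTMLM(nn.Module):
    """Minimal LSTM LM whose hidden state is carried across dialogue turns."""
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)          # (B, T, dim)
        h, state = self.lstm(x, state)  # `state` accumulates turn history
        return self.out(h), state

# Context carry-over: feed turn 2 with the state left by turn 1.
lm = TurnContextLSTMLM()
_, state = lm(torch.randint(0, 1000, (1, 5)))              # turn 1
logits, state = lm(torch.randint(0, 1000, (1, 7)), state)  # turn 2
```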
arXiv Detail & Related papers (2021-04-21T00:15:21Z)
- NSL: Hybrid Interpretable Learning From Noisy Raw Data [66.15862011405882]
This paper introduces a hybrid neural-symbolic learning framework, called NSL, that learns interpretable rules from labelled unstructured data.
NSL combines pre-trained neural networks for feature extraction with FastLAS, a state-of-the-art ILP system for rule learning under the answer set semantics.
We demonstrate that NSL can learn robust rules from MNIST data and achieve accuracy comparable or superior to neural network and random forest baselines.
arXiv Detail & Related papers (2020-12-09T13:02:44Z)
- Federated Marginal Personalization for ASR Rescoring [13.086007347727206]
Federated marginal personalization (FMP) is a novel method for continuously updating personalized neural network language models (NNLMs) on private devices using federated learning (FL).
FMP regularly estimates global and personalized marginal distributions of words, and adjusts the probabilities from NNLMs by an adaptation factor that is specific to each word.
Experiments on two speech evaluation datasets show modest word error rate (WER) reductions.
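One plausible reading of that per-word adjustment, sketched in Python as marginal-ratio reweighting followed by renormalization (the paper's exact adaptation factor may differ):
```python
import numpy as np

def fmp_adjust(p_nnlm, personal_marginal, global_marginal, eps=1e-10):
    """Reweight NNLM probabilities by a per-word adaptation factor
    (personalized marginal / global marginal) and renormalize."""
    factor = (personal_marginal + eps) / (global_marginal + eps)
    adjusted = p_nnlm * factor
    return adjusted / adjusted.sum()

# A user who says word 2 far more often than the global population.
p = np.array([0.5, 0.3, 0.1, 0.1])
personal = np.array([0.2, 0.2, 0.5, 0.1])
glob = np.array([0.25, 0.25, 0.25, 0.25])
print(fmp_adjust(p, personal, glob))  # word 2's probability increases
```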
arXiv Detail & Related papers (2020-12-01T23:54:41Z)
- Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer [28.697119605752643]
The Recurrent Neural Network Transducer (RNN-T) has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.
Previous work has proposed various fusion methods to incorporate external NNLMs into end-to-end ASR to address this weakness.
We propose extensions to these techniques that allow RNN-T to exploit external NNLMs during both training and inference time.
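At inference time such fusion is typically a log-linear interpolation inside beam search; a minimal sketch of that scoring step (the paper's training-time extensions are not shown, and the weight lm_weight is an assumed tuning parameter):
```python
import numpy as np

def fused_token_scores(log_p_rnnt, log_p_nnlm, lm_weight=0.3):
    """Shallow fusion at one beam-search step:
    score(y) = log P_rnnt(y | x, y_prev) + lm_weight * log P_nnlm(y | y_prev)."""
    return log_p_rnnt + lm_weight * log_p_nnlm

# Pick the best next token over a toy 3-token vocabulary.
rnnt = np.log(np.array([0.5, 0.3, 0.2]))
nnlm = np.log(np.array([0.1, 0.7, 0.2]))
print(int(np.argmax(fused_token_scores(rnnt, nnlm))))  # -> 1
```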
arXiv Detail & Related papers (2020-10-26T20:10:12Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in noisy real-world environments.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, surpassing the accuracy of other biologically plausible neuromorphic approaches to sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.