Lattice Rescoring Based on Large Ensemble of Complementary Neural
Language Models
- URL: http://arxiv.org/abs/2312.12764v1
- Date: Wed, 20 Dec 2023 04:52:24 GMT
- Title: Lattice Rescoring Based on Large Ensemble of Complementary Neural
Language Models
- Authors: Atsunori Ogawa, Naohiro Tawara, Marc Delcroix, Shoko Araki
- Abstract summary: We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition hypotheses.
In experiments using a lecture speech corpus, by combining the eight NLMs and using context carry-over, we obtained a 24.4% relative word error rate reduction from the ASR 1-best baseline.
- Score: 50.164379437671904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the effectiveness of using a large ensemble of advanced neural
language models (NLMs) for lattice rescoring on automatic speech recognition
(ASR) hypotheses. Previous studies have reported the effectiveness of combining
a small number of NLMs. In contrast, in this study, we combine up to eight
NLMs, i.e., forward/backward long short-term memory/Transformer-LMs that are
trained with two different random initialization seeds. We combine these NLMs
through iterative lattice generation. Since these NLMs work complementarily
with each other, by combining them one by one at each rescoring iteration,
language scores attached to given lattice arcs can be gradually refined.
Consequently, errors of the ASR hypotheses can be gradually reduced. We also
investigate the effectiveness of carrying over contextual information (previous
rescoring results) across a lattice sequence of a long speech such as a lecture
speech. In experiments using a lecture speech corpus, by combining the eight
NLMs and using context carry-over, we obtained a 24.4% relative word error rate
reduction from the ASR 1-best baseline. For further comparison, we performed
simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using
the large ensemble of NLMs, which confirmed the advantage of lattice rescoring
with iterative NLM combination.
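The iterative combination described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the Arc/Lattice structures, the NLMScorer interface, and the running-average score combination are hypothetical stand-ins for a real WFST lattice, real forward/backward LSTM-/Transformer-LM scorers, and the paper's actual interpolation scheme; for brevity, each arc is scored against only the carried-over context rather than its full path history.

```python
# Minimal sketch (hypothetical, not the paper's implementation) of iterative
# lattice rescoring with an ensemble of complementary NLMs plus context
# carry-over across a lattice sequence.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Arc:
    word: str
    acoustic_score: float   # log-domain acoustic score from the ASR decoder
    lm_score: float         # language score, refined iteration by iteration
    next_node: int


@dataclass
class Lattice:
    # outgoing arcs per node; node 0 is the start, the final node has none
    arcs_from: Dict[int, List[Arc]] = field(default_factory=dict)

    def best_path(self, lm_weight: float = 0.7) -> List[str]:
        # Greedy path extraction; a real rescorer would run dynamic
        # programming (e.g., a shortest-path search) over the full lattice.
        node, words = 0, []
        while self.arcs_from.get(node):
            best = max(self.arcs_from[node],
                       key=lambda a: a.acoustic_score + lm_weight * a.lm_score)
            words.append(best.word)
            node = best.next_node
        return words


# An NLM scorer maps (context words, next word) -> log-probability.
NLMScorer = Callable[[List[str], str], float]


def rescore_iteratively(lattice: Lattice,
                        nlm_ensemble: List[NLMScorer],
                        carried_context: Optional[List[str]] = None) -> List[str]:
    """Combine the NLMs one by one: at iteration k the k-th NLM's arc scores
    are folded into the language scores accumulated so far (here as a simple
    running average), so the lattice's language scores are gradually refined.
    `carried_context` carries over the previous lattice's best hypothesis."""
    context = list(carried_context or [])
    for k, nlm in enumerate(nlm_ensemble, start=1):
        for arcs in lattice.arcs_from.values():
            for arc in arcs:
                new_score = nlm(context, arc.word)
                arc.lm_score = ((k - 1) * arc.lm_score + new_score) / k
    return lattice.best_path()


# Toy usage: two toy scorers stand in for the eight forward/backward
# LSTM/Transformer LMs; the best path of one lattice seeds the next one.
if __name__ == "__main__":
    import math

    def toy_nlm(bias: float) -> NLMScorer:
        vocab = {"speech": 0.4, "recognition": 0.3, "speed": 0.2, "wreck": 0.1}
        return lambda ctx, w: math.log(vocab.get(w, 0.05)) + bias

    lattice = Lattice(arcs_from={
        0: [Arc("speech", -1.0, 0.0, 1), Arc("speed", -0.9, 0.0, 1)],
        1: [Arc("recognition", -1.2, 0.0, 2), Arc("wreck", -1.4, 0.0, 2)],
    })
    ensemble = [toy_nlm(0.0), toy_nlm(0.05)]
    previous_best = rescore_iteratively(lattice, ensemble,
                                        carried_context=["automatic"])
    print(previous_best)  # carried over as context for the next lattice
```

As a design note, the running average corresponds to an equal-weight interpolation of the NLMs combined so far, and passing the previous lattice's best path as `carried_context` mirrors the context carry-over idea only at a coarse level; the paper's actual system regenerates lattices at each rescoring iteration and uses forward and backward LMs, which this sketch does not attempt to reproduce.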
Related papers
- Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions [28.211967723403987]
We find that optimizing speech prefixes leads to better ASR performance and propose applying RNNT loss to perform speech prefix-tuning.
Our recognition results, averaged over 10 Indic languages, show that the proposed prefix-tuning with RNNT loss results in a 12% relative improvement in WER over the baseline with a fine-tuned LLM.
arXiv Detail & Related papers (2024-06-20T19:50:49Z)
- Nearest Neighbor Speculative Decoding for LLM Generation and Attribution [87.3259169631789]
Nearest Neighbor Speculative Decoding (NEST) is capable of incorporating real-world text spans of arbitrary length into the LM generations and providing attribution to their sources.
NEST significantly enhances the generation quality and attribution rate of the base LM across a variety of knowledge-intensive tasks.
In addition, NEST substantially improves the generation speed, achieving a 1.8x speedup in inference time when applied to Llama-2-Chat 70B.
arXiv Detail & Related papers (2024-05-29T17:55:03Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers [52.88268942796418]
Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer.
We show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view.
arXiv Detail & Related papers (2023-09-25T13:35:28Z)
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z)
- Neural-FST Class Language Model for End-to-End Speech Recognition [30.670375747577694]
We propose a Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition.
We show that NFCLM significantly outperforms NNLM by 15.8% relative in terms of Word Error Rate.
arXiv Detail & Related papers (2022-01-28T00:20:57Z)
- "What's The Context?": Long Context NLM Adaptation for ASR Rescoring in Conversational Agents [13.586996848831543]
We investigate various techniques to incorporate turn-based context history into both recurrent (LSTM) and Transformer-XL based NLMs.
For recurrent NLMs, we explore a context carry-over mechanism and feature-based augmentation.
We adapt our contextual NLM towards user-provided on-the-fly speech patterns by leveraging encodings from a large pre-trained masked language model.
arXiv Detail & Related papers (2021-04-21T00:15:21Z)
- LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring [55.16665077221941]
We propose a novel rescoring approach, which processes the entire lattice in a single call to the model.
The key feature of our rescoring policy is a novel non-autoregressive Lattice Transformer Language Model (LT-LM).
arXiv Detail & Related papers (2021-04-06T14:06:07Z)
- On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech [0.0]
We have significantly improved the online performance of a conversational speech transcription system by transferring knowledge from an RNNLM to the single-pass BNLM with text-generation-based data augmentation.
We show that by using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be improved even further.
arXiv Detail & Related papers (2020-06-09T09:01:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.