Contextual Density Ratio for Language Model Biasing of Sequence to
Sequence ASR Systems
- URL: http://arxiv.org/abs/2206.14623v1
- Date: Wed, 29 Jun 2022 13:12:46 GMT
- Title: Contextual Density Ratio for Language Model Biasing of Sequence to
Sequence ASR Systems
- Authors: Jesús Andrés-Ferrer and Dario Albesano and Puming Zhan and Paul
Vozila
- Abstract summary: We propose a contextual density ratio approach for both training a contextual aware E2E model and adapting the language model to named entities.
Our proposed technique achieves a relative improvement of up to 46.5% on the names over an E2E baseline without degrading the overall recognition accuracy of the whole test set.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-2-end (E2E) models have become increasingly popular in some ASR tasks
because of their performance and advantages. These E2E models directly
approximate the posterior distribution of tokens given the acoustic inputs.
Consequently, the E2E systems implicitly define a language model (LM) over the
output tokens, which makes the exploitation of independently trained language
models less straightforward than in conventional ASR systems. This makes it
difficult to dynamically adapt an E2E ASR system to contextual profiles for
better recognition of special words such as named entities. In this work, we
propose a contextual density ratio approach for both training a context-aware
E2E model and adapting the language model to named entities. We apply the
aforementioned technique to an E2E ASR system, which transcribes doctor and
patient conversations, for better adapting the E2E system to the names in the
conversations. Our proposed technique achieves a relative improvement of up to
46.5% on the names over an E2E baseline without degrading the overall
recognition accuracy of the whole test set. Moreover, it also surpasses a
contextual shallow fusion baseline by 22.1% relative.
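The density ratio idea in the abstract can be illustrated with a toy rescoring sketch: instead of simply adding a contextual LM score to the E2E score (shallow fusion), the density ratio approach also subtracts a source-domain LM score as a proxy for the E2E model's implicit LM. The hypotheses, score values, and weights below are invented for illustration and are not from the paper.

```python
# Toy hypothesis scores (log-probabilities); all values are illustrative.
hyps = {
    "the patient is john": {"e2e": -2.0, "src_lm": -3.0, "ctx_lm": -1.5},
    "the patient is jean": {"e2e": -1.8, "src_lm": -2.0, "ctx_lm": -4.0},
}

def shallow_fusion(s, beta=0.3):
    # Shallow fusion: add the contextual LM on top of the E2E score.
    return s["e2e"] + beta * s["ctx_lm"]

def contextual_density_ratio(s, lam=0.3):
    # Density ratio: subtract the source-domain LM (standing in for the
    # E2E model's implicit LM) before adding the contextual LM.
    return s["e2e"] - lam * s["src_lm"] + lam * s["ctx_lm"]

best_plain = max(hyps, key=lambda h: hyps[h]["e2e"])
best_ratio = max(hyps, key=lambda h: contextual_density_ratio(hyps[h]))
```

With these toy numbers the unbiased E2E score prefers "jean", while the density-ratio score flips the decision toward the in-context name "john", which is the kind of named-entity correction the paper targets.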
Related papers
- Sentence-wise Speech Summarization: Task, Datasets, and End-to-End Modeling with LM Knowledge Distillation
Sen-SSum generates text summaries from a spoken document in a sentence-by-sentence manner.
We present two datasets for Sen-SSum: Mega-SSum and CSJ-SSum.
arXiv Detail & Related papers (2024-08-01T00:18:21Z)
- Acoustic Model Fusion for End-to-end Speech Recognition
Speech recognition systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM).
We propose the integration of an external AM into the E2E system to better address the domain mismatch.
We have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets.
arXiv Detail & Related papers (2023-10-10T23:00:17Z)
- End-to-End Speech Recognition: A Survey
The goal of this survey is to provide a taxonomy of E2E ASR models and corresponding improvements.
All relevant aspects of E2E ASR are covered in this work, accompanied by discussions of performance and deployment opportunities.
arXiv Detail & Related papers (2023-03-03T01:46:41Z)
- Leveraging Large Text Corpora for End-to-End Speech Summarization
End-to-end speech summarization (E2E SSum) is a technique to directly generate summary sentences from speech.
We present two novel methods that leverage a large amount of external text summarization data for E2E SSum training.
arXiv Detail & Related papers (2023-03-02T05:19:49Z)
- Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)
- Consistent Training and Decoding For End-to-end Speech Recognition Using Lattice-free MMI
We propose a novel approach to integrate LF-MMI criterion into E2E ASR frameworks in both training and decoding stages.
Experiments suggest that the introduction of the LF-MMI criterion consistently leads to significant performance improvements.
arXiv Detail & Related papers (2021-12-05T07:30:17Z)
- Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition
Hybrid and end-to-end (E2E) systems have different error patterns in the speech recognition results.
This paper proposes a two-pass hybrid and E2E cascading (HEC) framework to combine the hybrid and E2E model.
We show that the proposed system achieves 8-10% relative word error rate reduction with respect to each individual system.
arXiv Detail & Related papers (2021-10-10T20:11:38Z)
- Learning Word-Level Confidence For Subword End-to-End ASR
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model.
arXiv Detail & Related papers (2021-03-11T15:03:33Z)
- Contextual RNN-T For Open Domain ASR
End-to-end (E2E) systems for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system into a single neural network.
While this has some nice advantages, it limits the system to being trained using only paired audio and text.
Because of this, E2E models tend to have difficulties with correctly recognizing rare words that are not frequently seen during training, such as entity names.
We propose modifications to the RNN-T model that allow the model to utilize additional metadata text with the objective of improving performance on these named entity words.
arXiv Detail & Related papers (2020-06-04T04:37:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.