"What's The Context?" : Long Context NLM Adaptation for ASR Rescoring in
Conversational Agents
- URL: http://arxiv.org/abs/2104.11070v1
- Date: Wed, 21 Apr 2021 00:15:21 GMT
- Title: "What's The Context?" : Long Context NLM Adaptation for ASR Rescoring in
Conversational Agents
- Authors: Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki,
Katrin Kirchhoff
- Abstract summary: We investigate various techniques to incorporate turn-based context history into both recurrent (LSTM) and Transformer-XL based NLMs.
For recurrent NLMs, we explore a context carry-over mechanism and feature-based augmentation.
We adapt our contextual NLM towards user provided on-the-fly speech patterns by leveraging encodings from a large pre-trained masked language model.
- Score: 13.586996848831543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Language Models (NLM), when trained and evaluated with context
spanning multiple utterances, have been shown to consistently outperform both
conventional n-gram language models and NLMs that use limited context. In this
paper, we investigate various techniques to incorporate turn-based context
history into both recurrent (LSTM) and Transformer-XL based NLMs. For recurrent
NLMs, we explore a context carry-over mechanism and feature-based augmentation,
where we incorporate other forms of contextual information such as the bot
response and system dialogue acts as classified by a Natural Language
Understanding (NLU) model. To mitigate the "sharp nearby, fuzzy far away"
problem with contextual NLMs, we propose the use of an attention layer over
lexical metadata to improve feature-based augmentation. Additionally, we adapt our
contextual NLM towards user-provided on-the-fly speech patterns by leveraging
encodings from a large pre-trained masked language model and performing fusion
with a Transformer-XL based NLM. We test our proposed models using N-best
rescoring of ASR hypotheses of task-oriented dialogues and also evaluate on
downstream NLU tasks such as intent classification and slot labeling. The
best-performing model shows a relative WER improvement between 1.6% and 9.1%
and a slot labeling F1 score improvement of 4% over non-contextual baselines.
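A minimal sketch of the turn-based rescoring setup described in the abstract, assuming a generic NLM interface rather than the authors' actual models: an N-best rescorer that carries the NLM's context state across dialogue turns and combines the first-pass ASR score with the contextual NLM score. All names (ContextualRescorer, ToyLM, asr_weight, lm_weight) are hypothetical.

```python
import math
from typing import Any, List, Optional, Tuple

class ToyLM:
    """Stand-in for a recurrent NLM: uniform word probabilities; the 'state'
    merely counts context tokens where an LSTM would carry hidden states."""
    def __init__(self, vocab_size: int = 10000):
        self.vocab_size = vocab_size

    def score(self, tokens: List[str], state: Optional[Any]) -> Tuple[float, Any]:
        logprob = -len(tokens) * math.log(self.vocab_size)
        new_state = (state or 0) + len(tokens)
        return logprob, new_state

class ContextualRescorer:
    """Rescores ASR N-best lists, carrying context over from previous turns.
    Feature-based augmentation (bot response, NLU dialogue acts) would add
    extra inputs to the NLM at scoring time; it is omitted here for brevity."""
    def __init__(self, lm, asr_weight: float = 1.0, lm_weight: float = 0.5):
        self.lm = lm
        self.asr_weight = asr_weight   # weight on the first-pass ASR score
        self.lm_weight = lm_weight     # weight on the contextual NLM log-probability
        self.state = None              # context carried over across turns

    def rescore_turn(self, nbest: List[Tuple[List[str], float]]) -> List[str]:
        """nbest: (token_sequence, first_pass_score) pairs for one user turn."""
        scored = []
        for tokens, asr_score in nbest:
            lm_logprob, new_state = self.lm.score(tokens, self.state)
            total = self.asr_weight * asr_score + self.lm_weight * lm_logprob
            scored.append((total, tokens, new_state))
        _, best_tokens, best_state = max(scored, key=lambda x: x[0])
        self.state = best_state        # carry the winning hypothesis' context forward
        return best_tokens

# Usage: rescore two consecutive turns of a task-oriented dialogue.
rescorer = ContextualRescorer(ToyLM())
print(rescorer.rescore_turn([(["play", "some", "jazz"], -12.3),
                             (["play", "sum", "jazz"], -12.1)]))
print(rescorer.rescore_turn([(["next", "song"], -5.0)]))
```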
Related papers
- Lattice Rescoring Based on Large Ensemble of Complementary Neural
Language Models [50.164379437671904]
We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition hypotheses.
In experiments on a lecture speech corpus, combining eight NLMs and using context carry-over yielded a 24.4% relative word error rate reduction from the ASR 1-best baseline.
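As a rough illustration of the ensemble idea in this entry (not the paper's actual combination scheme), the scores that several NLMs assign to a hypothesis can simply be averaged before rescoring; the equal weighting and function names below are assumptions.

```python
from typing import Callable, List, Sequence

def ensemble_logprob(tokens: Sequence[str],
                     scorers: List[Callable[[Sequence[str]], float]]) -> float:
    """Average the log-probability each NLM in the ensemble assigns to a hypothesis."""
    return sum(score(tokens) for score in scorers) / len(scorers)
```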
arXiv Detail & Related papers (2023-12-20T04:52:24Z)
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by generative large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
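A minimal sketch of the interpolation mentioned above, assuming next-word distributions are available as plain dictionaries; the mixing weight lam is illustrative, not taken from the paper.

```python
from typing import Dict

def knn_lm_interpolate(p_lm: Dict[str, float], p_knn: Dict[str, float],
                       lam: float = 0.25) -> Dict[str, float]:
    """p(w | context) = lam * p_kNN(w | context) + (1 - lam) * p_LM(w | context)."""
    vocab = set(p_lm) | set(p_knn)
    return {w: lam * p_knn.get(w, 0.0) + (1.0 - lam) * p_lm.get(w, 0.0) for w in vocab}
```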
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
- GNN-LM: Language Modeling based on Global Contexts via GNN [32.52117529283929]
We introduce GNN-LM, which extends the vanilla neural language model (LM) by allowing it to reference similar contexts in the entire training corpus.
GNN-LM achieves a new state-of-the-art perplexity of 14.8 on WikiText-103.
arXiv Detail & Related papers (2021-10-17T07:18:21Z)
- ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling [11.193867567895353]
Cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech.
In this paper, we investigate various techniques to improve the contextualization, content-word robustness and domain adaptation of a Transformer-XL neural language model (NLM).
Our best performing NLM rescorer results in a content WER reduction of 19.2% on e-commerce audio test set and a slot labeling F1 improvement of 6.4%.
arXiv Detail & Related papers (2021-06-15T21:27:34Z)
- Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents [11.193867567895353]
Goal-oriented conversational interfaces are designed to accomplish specific tasks.
We propose a new architecture that utilizes context embeddings derived from BERT on sample utterances provided at inference time.
Our experiments show a word error rate (WER) relative reduction of 7% over non-contextual utterance-level NLM rescorers on goal-oriented audio datasets.
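A hedged sketch of this kind of inference-time biasing, assuming a generic sentence encoder in place of BERT and a hypothetical bias_weight; this is only an illustration, not the architecture proposed in the paper.

```python
import math
from typing import Callable, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u)) or 1.0
    norm_v = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (norm_u * norm_v)

def biased_score(hypothesis: str, base_score: float, sample_utterances: List[str],
                 embed: Callable[[str], Sequence[float]], bias_weight: float = 0.5) -> float:
    """Boost hypotheses that are close (in embedding space) to utterances the
    user supplied at inference time."""
    if not sample_utterances:
        return base_score
    h = embed(hypothesis)
    similarity = max(cosine(h, embed(s)) for s in sample_utterances)
    return base_score + bias_weight * similarity
```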
arXiv Detail & Related papers (2021-03-18T15:38:08Z)
- On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech [0.0]
We significantly improved the online performance of a conversational speech transcription system by transferring knowledge from an RNNLM to the single-pass BNLM with text-generation-based data augmentation.
We show that using the RNN-BNLM in the first pass, followed by a neural second pass, can further improve offline ASR results significantly.
arXiv Detail & Related papers (2020-06-09T09:01:04Z)
- A Comparative Study of Lexical Substitution Approaches based on Neural Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained on small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
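The joint training described in the last entry can be pictured as one model optimizing a weighted sum of per-task losses; the sketch below is a generic multi-task objective, with all task names and weights assumed rather than taken from the paper.

```python
from typing import Dict, Optional

def joint_loss(task_losses: Dict[str, float],
               weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-task losses for joint ASR correction + LU training."""
    weights = weights or {"correction": 1.0, "intent": 0.5, "slots": 0.5}
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())

# Example: losses from the correction, intent-classification and slot-labeling heads.
print(joint_loss({"correction": 2.3, "intent": 0.7, "slots": 1.1}))
```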