"What's The Context?" : Long Context NLM Adaptation for ASR Rescoring in
Conversational Agents
- URL: http://arxiv.org/abs/2104.11070v1
- Date: Wed, 21 Apr 2021 00:15:21 GMT
- Title: "What's The Context?" : Long Context NLM Adaptation for ASR Rescoring in
Conversational Agents
- Authors: Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki,
Katrin Kirchhoff
- Abstract summary: We investigate various techniques to incorporate turn-based context history into both recurrent (LSTM) and Transformer-XL based NLMs.
For recurrent NLMs, we explore a context carry-over mechanism and feature-based augmentation.
We adapt our contextual NLM towards user provided on-the-fly speech patterns by leveraging encodings from a large pre-trained masked language model.
- Score: 13.586996848831543
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Language Models (NLM), when trained and evaluated with context
spanning multiple utterances, have been shown to consistently outperform both
conventional n-gram language models and NLMs that use limited context. In this
paper, we investigate various techniques to incorporate turn-based context
history into both recurrent (LSTM) and Transformer-XL based NLMs. For recurrent
NLMs, we explore a context carry-over mechanism and feature-based augmentation,
where we incorporate other forms of contextual information such as the bot
response and system dialogue acts as classified by a Natural Language
Understanding (NLU) model. To mitigate the "sharp nearby, fuzzy far away"
problem with contextual NLMs, we propose the use of an attention layer over
lexical metadata to improve feature-based augmentation. Additionally, we adapt our
contextual NLM towards user-provided on-the-fly speech patterns by leveraging
encodings from a large pre-trained masked language model and performing fusion
with a Transformer-XL based NLM. We test our proposed models using N-best
rescoring of ASR hypotheses of task-oriented dialogues and also evaluate on
downstream NLU tasks such as intent classification and slot labeling. The
best-performing model shows a relative WER improvement between 1.6% and 9.1%
and a slot labeling F1 score improvement of 4% over non-contextual baselines.
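A minimal sketch of the turn-based rescoring setup described in the abstract, assuming a generic NLM interface rather than the authors' actual models: an N-best rescorer that carries the NLM's context state across dialogue turns and combines the first-pass ASR score with the contextual NLM score. All names (ContextualRescorer, ToyLM, asr_weight, lm_weight) are hypothetical.

```python
import math
from typing import Any, List, Optional, Tuple

class ToyLM:
    """Stand-in for a recurrent NLM: uniform word probabilities; the 'state'
    merely counts context tokens where an LSTM would carry hidden states."""
    def __init__(self, vocab_size: int = 10000):
        self.vocab_size = vocab_size

    def score(self, tokens: List[str], state: Optional[Any]) -> Tuple[float, Any]:
        logprob = -len(tokens) * math.log(self.vocab_size)
        new_state = (state or 0) + len(tokens)
        return logprob, new_state

class ContextualRescorer:
    """Rescores ASR N-best lists, carrying context over from previous turns.
    Feature-based augmentation (bot response, NLU dialogue acts) would add
    extra inputs to the NLM at scoring time; it is omitted here for brevity."""
    def __init__(self, lm, asr_weight: float = 1.0, lm_weight: float = 0.5):
        self.lm = lm
        self.asr_weight = asr_weight   # weight on the first-pass ASR score
        self.lm_weight = lm_weight     # weight on the contextual NLM log-probability
        self.state = None              # context carried over across turns

    def rescore_turn(self, nbest: List[Tuple[List[str], float]]) -> List[str]:
        """nbest: (token_sequence, first_pass_score) pairs for one user turn."""
        scored = []
        for tokens, asr_score in nbest:
            lm_logprob, new_state = self.lm.score(tokens, self.state)
            total = self.asr_weight * asr_score + self.lm_weight * lm_logprob
            scored.append((total, tokens, new_state))
        _, best_tokens, best_state = max(scored, key=lambda x: x[0])
        self.state = best_state        # carry the winning hypothesis' context forward
        return best_tokens

# Usage: rescore two consecutive turns of a task-oriented dialogue.
rescorer = ContextualRescorer(ToyLM())
print(rescorer.rescore_turn([(["play", "some", "jazz"], -12.3),
                             (["play", "sum", "jazz"], -12.1)]))
print(rescorer.rescore_turn([(["next", "song"], -5.0)]))
```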
Related papers
- Lattice Rescoring Based on Large Ensemble of Complementary Neural
Language Models [50.164379437671904]
We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring on automatic speech recognition hypotheses.
In experiments on a lecture speech corpus, combining eight NLMs and using context carry-over yielded a 24.4% relative word error rate reduction from the ASR 1-best baseline.
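As a rough illustration of the ensemble idea in this entry (not the paper's actual combination scheme), the scores that several NLMs assign to a hypothesis can simply be averaged before rescoring; the equal weighting and function names below are assumptions.

```python
from typing import Callable, List, Sequence

def ensemble_logprob(tokens: Sequence[str],
                     scorers: List[Callable[[Sequence[str]], float]]) -> float:
    """Average the log-probability each NLM in the ensemble assigns to a hypothesis."""
    return sum(score(tokens) for score in scorers) / len(scorers)
```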
arXiv Detail & Related papers (2023-12-20T04:52:24Z)
- Generative Context-aware Fine-tuning of Self-supervised Speech Models [54.389711404209415]
We study the use of context information generated by generative large language models (LLMs).
We propose an approach to distill the generated information during fine-tuning of self-supervised speech models.
We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition, named entity recognition, and sentiment analysis.
arXiv Detail & Related papers (2023-12-15T15:46:02Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- You can't pick your neighbors, or can you? When and how to rely on retrieval in the $k$NN-LM [65.74934004876914]
Retrieval-enhanced language models (LMs) condition their predictions on text retrieved from large external datastores.
One such approach, the $k$NN-LM, interpolates any existing LM's predictions with the output of a $k$-nearest neighbors model.
We empirically measure the effectiveness of our approach on two English language modeling datasets.
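A minimal sketch of the interpolation mentioned above, assuming next-word distributions are available as plain dictionaries; the mixing weight lam is illustrative, not taken from the paper.

```python
from typing import Dict

def knn_lm_interpolate(p_lm: Dict[str, float], p_knn: Dict[str, float],
                       lam: float = 0.25) -> Dict[str, float]:
    """p(w | context) = lam * p_kNN(w | context) + (1 - lam) * p_LM(w | context)."""
    vocab = set(p_lm) | set(p_knn)
    return {w: lam * p_knn.get(w, 0.0) + (1.0 - lam) * p_lm.get(w, 0.0) for w in vocab}
```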
arXiv Detail & Related papers (2022-10-28T02:57:40Z)
- GNN-LM: Language Modeling based on Global Contexts via GNN [32.52117529283929]
We introduce GNN-LM, which extends the vanilla neural language model (LM) by allowing it to reference similar contexts in the entire training corpus.
GNN-LM achieves a new state-of-the-art perplexity of 14.8 on WikiText-103.
arXiv Detail & Related papers (2021-10-17T07:18:21Z)
- ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling [11.193867567895353]
Cross-utterance contextual cues play an important role in disambiguating domain-specific content words from speech.
In this paper, we investigate various techniques to improve the contextualization, content-word robustness and domain adaptation of a Transformer-XL neural language model (NLM).
Our best performing NLM rescorer results in a content WER reduction of 19.2% on e-commerce audio test set and a slot labeling F1 improvement of 6.4%.
arXiv Detail & Related papers (2021-06-15T21:27:34Z)
- Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents [11.193867567895353]
Goal-oriented conversational interfaces are designed to accomplish specific tasks.
We propose a new architecture that utilizes context embeddings derived from BERT on sample utterances provided at inference time.
Our experiments show a word error rate (WER) relative reduction of 7% over non-contextual utterance-level NLM rescorers on goal-oriented audio datasets.
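A hedged sketch of this kind of inference-time biasing, assuming a generic sentence encoder in place of BERT and a hypothetical bias_weight; this is only an illustration, not the architecture proposed in the paper.

```python
import math
from typing import Callable, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u)) or 1.0
    norm_v = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (norm_u * norm_v)

def biased_score(hypothesis: str, base_score: float, sample_utterances: List[str],
                 embed: Callable[[str], Sequence[float]], bias_weight: float = 0.5) -> float:
    """Boost hypotheses that are close (in embedding space) to utterances the
    user supplied at inference time."""
    if not sample_utterances:
        return base_score
    h = embed(hypothesis)
    similarity = max(cosine(h, embed(s)) for s in sample_utterances)
    return base_score + bias_weight * similarity
```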
arXiv Detail & Related papers (2021-03-18T15:38:08Z)
- On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech [0.0]
We significantly improved the online performance of a conversational speech transcription system by transferring knowledge from an RNNLM to the single-pass BNLM with text-generation-based data augmentation.
We show that using the RNN-BNLM in the first pass, followed by a neural second pass, can further improve offline ASR results significantly.
arXiv Detail & Related papers (2020-06-09T09:01:04Z)
- A Comparative Study of Lexical Substitution Approaches based on Neural Language Models [117.96628873753123]
We present a large-scale comparative study of popular neural language and masked language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further improved if information about the target word is injected properly.
arXiv Detail & Related papers (2020-05-29T18:43:22Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained on small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
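The joint training described in the last entry can be pictured as one model optimizing a weighted sum of per-task losses; the sketch below is a generic multi-task objective, with all task names and weights assumed rather than taken from the paper.

```python
from typing import Dict, Optional

def joint_loss(task_losses: Dict[str, float],
               weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-task losses for joint ASR correction + LU training."""
    weights = weights or {"correction": 1.0, "intent": 0.5, "slots": 0.5}
    return sum(weights.get(task, 1.0) * loss for task, loss in task_losses.items())

# Example: losses from the correction, intent-classification and slot-labeling heads.
print(joint_loss({"correction": 2.3, "intent": 0.7, "slots": 1.1}))
```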