On-the-fly Text Retrieval for End-to-End ASR Adaptation
- URL: http://arxiv.org/abs/2303.10942v1
- Date: Mon, 20 Mar 2023 08:54:40 GMT
- Title: On-the-fly Text Retrieval for End-to-End ASR Adaptation
- Authors: Bolaji Yusuf, Aditya Gourav, Ankur Gandhe, Ivan Bulyko
- Abstract summary: We propose augmenting a transducer-based ASR model with a retrieval language model, which retrieves from an external text corpus plausible completions for a partial ASR hypothesis.
Our experiments show that the proposed model significantly improves the performance of a transducer baseline on a pair of question-answering datasets.
- Score: 9.304386210911822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: End-to-end speech recognition models are improved by incorporating external
text sources, typically by fusion with an external language model. Such
language models have to be retrained whenever the corpus of interest changes.
Furthermore, since they store the entire corpus in their parameters, rare words
can be challenging to recall. In this work, we propose augmenting a
transducer-based ASR model with a retrieval language model, which directly
retrieves from an external text corpus plausible completions for a partial ASR
hypothesis. These completions are then integrated into subsequent predictions
by an adapter, which is trained once, so that the corpus of interest can be
switched without incurring the computational overhead of retraining. Our
experiments show that the proposed model significantly improves the performance
of a transducer baseline on a pair of question-answering datasets. Further, it
outperforms shallow fusion on recognition of named entities by about 7%
relative; when the two are combined, the relative improvement increases to 13%.
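The retrieval side of this idea is easy to sketch. Below is a minimal, hypothetical illustration of a swappable corpus index that returns plausible completions for a partial hypothesis; the class name and the TF-IDF retriever are assumptions standing in for the paper's learned retrieval language model, and the once-trained adapter is only referenced in a comment.

```python
# A minimal sketch, assuming a TF-IDF retriever as a stand-in for the
# paper's learned retrieval language model. Names and interfaces are
# hypothetical; only the shape of the idea comes from the abstract.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

class CompletionRetriever:
    """Index an external text corpus and return plausible completions
    for a partial ASR hypothesis."""

    def __init__(self, corpus):
        self.vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
        self.set_corpus(corpus)

    def set_corpus(self, corpus):
        # Switching the corpus of interest only re-indexes text;
        # the downstream adapter is trained once and left untouched.
        self.corpus = corpus
        self.index = self.vectorizer.fit_transform(corpus)

    def retrieve(self, partial_hypothesis, k=3):
        query = self.vectorizer.transform([partial_hypothesis])
        scores = (self.index @ query.T).toarray().ravel()
        return [self.corpus[i] for i in np.argsort(-scores)[:k]]

retriever = CompletionRetriever([
    "who wrote the picture of dorian gray",
    "who painted the mona lisa",
    "what is the capital of burkina faso",
])
print(retriever.retrieve("who wrote the pic"))
# In the paper, such completions are encoded and consumed by a
# once-trained adapter that biases the transducer's next predictions.
```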
Related papers
- End-to-End Trainable Retrieval-Augmented Generation for Relation Extraction [7.613942320502336]
We propose a novel End-to-end Trainable Retrieval-Augmented Generation (ETRAG) framework.
ETRAG allows end-to-end optimization of the entire model, including the retriever, for the relation extraction objective.
We evaluate the relation extraction performance of ETRAG on the TACRED dataset, which is a standard benchmark for relation extraction.
arXiv Detail & Related papers (2024-06-06T07:01:50Z)
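How a retriever can be trained end-to-end for a downstream objective is easiest to see in code. The sketch below uses the common RAG/REALM-style trick of marginalizing the task loss over retrieved documents so gradients reach the retrieval scores; whether ETRAG uses exactly this formulation is an assumption.

```python
# Hedged sketch of end-to-end retriever training via marginalization
# (RAG/REALM-style); ETRAG's exact formulation may differ.
import torch
import torch.nn.functional as F

def marginalized_loss(query_emb, doc_embs, per_doc_target_logp):
    """query_emb: (d,); doc_embs: (k, d); per_doc_target_logp: (k,)
    log-probability of the gold output (e.g. the relation) under the
    generator when conditioned on each retrieved document."""
    # A softmax over inner-product scores turns retrieval into a
    # differentiable distribution, so the task loss trains the retriever.
    retrieval_logp = F.log_softmax(doc_embs @ query_emb, dim=0)
    # -log sum_k p(doc_k | q) * p(target | q, doc_k)
    return -torch.logsumexp(retrieval_logp + per_doc_target_logp, dim=0)

query = torch.randn(16, requires_grad=True)
docs = torch.randn(4, 16, requires_grad=True)
gen_logp = torch.randn(4).log_softmax(dim=0)  # stand-in generator scores
marginalized_loss(query, docs, gen_logp).backward()
print(docs.grad.shape)  # gradients flow into the retriever's embeddings
```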
- Retrieval is Accurate Generation [99.24267226311157]
We introduce a novel method that selects context-aware phrases from a collection of supporting documents.
Our model achieves the best performance and the lowest latency among several retrieval-augmented baselines.
arXiv Detail & Related papers (2024-02-27T14:16:19Z)
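A toy rendering of generation-by-phrase-retrieval: the next output span is selected from supporting documents rather than decoded token by token. The dot-product scorer and phrase table below are stand-ins for the paper's learned, context-aware phrase encoder.

```python
# Toy illustration only; the scoring model here is a trivial embedding
# dot product, not the paper's trained component.
import numpy as np

def next_phrase(context_emb, phrase_table):
    """phrase_table: (phrase, embedding) pairs mined from documents."""
    scores = [emb @ context_emb for _, emb in phrase_table]
    return phrase_table[int(np.argmax(scores))][0]

rng = np.random.default_rng(0)
table = [(p, rng.normal(size=8)) for p in
         ["the capital of France", "is Paris", "is Lyon"]]
print(next_phrase(rng.normal(size=8), table))
```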
- ReFusion: Improving Natural Language Understanding with Computation-Efficient Retrieval Representation Fusion [22.164620956284466]
Retrieval-based augmentations (RA) incorporating knowledge from an external database into language models have greatly succeeded in various knowledge-intensive (KI) tasks.
Existing works focus on concatenating retrievals with inputs to improve model performance.
This paper proposes a new paradigm of RA named ReFusion, a computation-efficient retrieval representation fusion with bi-level optimization.
arXiv Detail & Related papers (2024-01-04T07:39:26Z)
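A rough sketch of representation-level fusion, the alternative to input concatenation: retrieved-passage embeddings are folded into a layer's hidden states by cross-attention. ReFusion additionally learns where and how to fuse via bi-level optimization, which this stub omits; all names here are assumptions.

```python
# Hedged sketch of retrieval representation fusion; not ReFusion's
# actual module, and the bi-level search over fusion sites is omitted.
import torch
import torch.nn as nn

class RetrievalFusion(nn.Module):
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # starts as the identity

    def forward(self, hidden, retrieved):
        # hidden: (B, T, d) token states; retrieved: (B, R, d) passages
        fused, _ = self.attn(hidden, retrieved, retrieved)
        return hidden + torch.tanh(self.gate) * fused

layer = RetrievalFusion(64)
out = layer(torch.randn(2, 10, 64), torch.randn(2, 5, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```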
- RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE).
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z)
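A minimal sketch of what retrieval in a VAE latent space can look like: documents are stored as posterior means, queries retrieve neighbors by latent distance, and the retrieved latents condition the decoder. RegaVAE combines source and retrieved latents in a Gaussian mixture; a plain average stands in for that here, and the encoder itself is assumed away.

```python
# Latent-space retrieval sketch under the assumptions stated above.
import numpy as np

def retrieve_latents(query_mu, doc_mus, k=3):
    dists = np.linalg.norm(doc_mus - query_mu, axis=1)
    return doc_mus[np.argsort(dists)[:k]]

def decoder_condition(query_mu, doc_mus, k=3):
    # Stand-in for RegaVAE's Gaussian-mixture combination of the source
    # latent with its retrieved neighbors.
    neighbors = retrieve_latents(query_mu, doc_mus, k)
    return np.vstack([query_mu[None, :], neighbors]).mean(axis=0)

rng = np.random.default_rng(0)
z = decoder_condition(rng.normal(size=8), rng.normal(size=(100, 8)))
print(z.shape)  # (8,)
```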
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
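For concreteness, here is an illustrative prompt construction for LLM-based error correction over an N-best list, in the spirit of the HP benchmark; the wording is a hypothetical example, not the benchmark's actual prompt.

```python
# Hypothetical prompt template for N-best ASR error correction.
def correction_prompt(nbest):
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return ("Below are ASR hypotheses of one utterance, best first. "
            "Output the most plausible true transcription; you may use "
            "words that appear in no hypothesis.\n"
            f"{hyps}\nTranscription:")

print(correction_prompt(
    ["i red the book", "i read the buck", "eye read the book"]))
```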
- Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection [71.20871905457174]
Language models (LMs) have revolutionized the way we interact with information, but they often generate nonfactual text.
Previous methods use external knowledge as references for text generation to enhance factuality, but they often struggle when irrelevant references get mixed in.
We present DKGen, which turns text generation into an iterative process.
arXiv Detail & Related papers (2023-08-30T02:22:40Z)
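A hedged sketch of iterative generation with dynamic knowledge selection: before each sentence, the reference pool is re-ranked against the text written so far, so irrelevant references stay out of the context. The word-overlap scorer and `gen_sentence` stub below stand in for DKGen's learned components.

```python
# Illustration only; the selection and generation models are stubs.
def select_references(context, pool, k=2):
    ctx = set(context.lower().split())
    return sorted(pool, key=lambda r: len(ctx & set(r.lower().split())),
                  reverse=True)[:k]

def iterative_generate(prompt, pool, gen_sentence, steps=2):
    text = prompt
    for _ in range(steps):
        refs = select_references(text, pool)  # dynamic, per-step selection
        text += " " + gen_sentence(text, refs)
    return text

stub = lambda ctx, refs: refs[0]  # trivially copies the top reference
print(iterative_generate("Einstein was",
                         ["einstein was born in Ulm.", "a physicist."], stub))
```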
- BRENT: Bidirectional Retrieval Enhanced Norwegian Transformer [1.911678487931003]
Retrieval-based language models are increasingly employed in question-answering tasks.
We develop the first Norwegian retrieval-based model by adapting the REALM framework.
We show that this type of training improves the reader's performance on extractive question-answering.
arXiv Detail & Related papers (2023-04-19T13:40:47Z)
- Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition [19.489794740679024]
We investigate the potential of leveraging external knowledge, particularly through off-policy key-value stores generated with text-to-speech methods.
In our approach, audio embeddings captured from text-to-speech, along with semantic text embeddings, are used to bias ASR.
Experiments on LibriSpeech and in-house voice assistant/search datasets show that the proposed approach can reduce domain adaptation time by up to 1K GPU-hours.
arXiv Detail & Related papers (2023-01-06T22:32:50Z)
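A rough sketch of biasing with an external key-value store: keys are audio embeddings synthesized offline with TTS, values are the paired text embeddings, and the decoder's acoustic state attends over the store at run time. Real systems use learned encoders; random vectors stand in here.

```python
# Key-value biasing sketch under the assumptions stated above.
import numpy as np

def kv_bias(acoustic_state, keys, values, temp=1.0):
    logits = keys @ acoustic_state / temp
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ values  # context vector added to the decoder state

rng = np.random.default_rng(0)
keys = rng.normal(size=(1000, 32))    # TTS audio embeddings
values = rng.normal(size=(1000, 32))  # matching text embeddings
print(kv_bias(rng.normal(size=32), keys, values).shape)  # (32,)
```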
- Retrieval-based Disentangled Representation Learning with Natural Language Supervision [61.75109410513864]
We present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as proxies of the underlying data variation to drive disentangled representation learning.
Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish intrinsic dimensions that capture characteristics within the data through their natural-language counterparts, thus achieving disentanglement.
arXiv Detail & Related papers (2022-12-15T10:20:42Z)
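A toy version of a vocabulary-space bi-encoder: data and its description are both embedded into a |V|-dimensional space whose axes are readable vocabulary entries, so the active dimensions name the factors they capture. VDR learns these encoders; bag-of-words counts stand in here.

```python
# Bag-of-words stand-in for a learned vocabulary-space encoder.
import numpy as np

VOCAB = ["red", "blue", "round", "square", "small", "large"]

def to_vocab_space(text):
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

item = to_vocab_space("a small red round object")
query = to_vocab_space("red and round")
print(query @ item)  # retrieval score
print([VOCAB[i] for i in np.flatnonzero(query * item)])  # shared, readable dims
```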
- Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
arXiv Detail & Related papers (2021-11-16T03:00:29Z)
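A hedged sketch of multi-hypothesis fusion: the N-best ASR hypotheses are packed into one sequence with separators so the summarizer's attention can weigh agreeing spans up and erroneous ones down. The tokenizer and single attention layer below are toy stand-ins for the paper's trained model.

```python
# Toy multi-hypothesis packing and cross-hypothesis attention.
import torch
import torch.nn as nn

def pack_hypotheses(nbest, vocab, sep_id=0):
    ids = []
    for hyp in nbest:
        ids += [vocab[w] for w in hyp.split()] + [sep_id]  # mark boundary
    return torch.tensor(ids).unsqueeze(0)  # (1, T)

vocab = {w: i + 1 for i, w in enumerate("i read the book red buck".split())}
src = pack_hypotheses(["i read the book", "i red the buck"], vocab)
emb = nn.Embedding(len(vocab) + 1, 32)(src)  # (1, T, 32)
attn = nn.MultiheadAttention(32, num_heads=4, batch_first=True)
fused, weights = attn(emb, emb, emb)  # attention spans all hypotheses
print(fused.shape, weights.shape)
```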