Prompting Large Language Models for Zero-Shot Domain Adaptation in
Speech Recognition
- URL: http://arxiv.org/abs/2306.16007v1
- Date: Wed, 28 Jun 2023 08:29:00 GMT
- Title: Prompting Large Language Models for Zero-Shot Domain Adaptation in
Speech Recognition
- Authors: Yuang Li, Yu Wu, Jinyu Li, Shujie Liu
- Abstract summary: With only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA.
Experiments show that, with only one domain prompt, both methods can effectively reduce word error rates (WER) on out-of-domain TedLium-2 and SPGI datasets.
- Score: 33.07184218085399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of Language Models (LMs) has proven to be an effective way to
address domain shifts in speech recognition. However, these approaches usually
require a significant amount of target domain text data for the training of
LMs. Different from these methods, in this work, with only a domain-specific
text prompt, we propose two zero-shot ASR domain adaptation methods using
LLaMA, a 7-billion-parameter large language model (LLM). LLM is used in two
ways: 1) second-pass rescoring: reranking N-best hypotheses of a given ASR
system with LLaMA; 2) deep LLM-fusion: incorporating LLM into the decoder of an
encoder-decoder based ASR system. Experiments show that, with only one domain
prompt, both methods can effectively reduce word error rates (WER) on
out-of-domain TedLium-2 and SPGISpeech datasets. Especially, the deep
LLM-fusion has the advantage of better recall of entity and out-of-vocabulary
words.
Related papers
- LARR: Large Language Model Aided Real-time Scene Recommendation with Semantic Understanding [19.510385758079966]
Large Language Model Aided Real-time Scene Recommendation(LARR)
This paper introduces Large Language Model Aided Real-time Scene Recommendation(LARR)
arXiv Detail & Related papers (2024-08-21T10:56:26Z) - Intermittent Semi-working Mask: A New Masking Paradigm for LLMs [13.271151693864114]
Multi-turn dialogues are a key interaction method between humans and Large Language Models (LLMs)
We propose a novel masking scheme called Intermittent Semi-working Mask (ISM) to address these problems.
arXiv Detail & Related papers (2024-08-01T13:22:01Z) - Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST)
arXiv Detail & Related papers (2024-06-24T16:38:17Z) - BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z) - Making Large Language Models A Better Foundation For Dense Retrieval [19.38740248464456]
Dense retrieval needs to learn discriminative text embeddings to represent the semantic relationship between query and document.
It may benefit from the using of large language models (LLMs), given LLMs' strong capability on semantic understanding.
We propose LLaRA (LLM adapted for dense RetrievAl), which works as a post-hoc adaptation of dense retrieval application.
arXiv Detail & Related papers (2023-12-24T15:10:35Z) - LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLM) are leveraging human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z) - Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition [23.172469312225694]
We propose to utilize an instruction-tuned large language model (LLM) for guiding the text generation process in automatic speech recognition (ASR)
The proposed model is built on the joint CTC and attention architecture, with the LLM serving as a front-end feature extractor for the decoder.
Experimental results show that the proposed LLM-guided model achieves a relative gain of approximately 13% in word error rates across major benchmarks.
arXiv Detail & Related papers (2023-09-19T11:10:50Z) - Inference with Reference: Lossless Acceleration of Large Language Models [97.04200102556551]
LLMA is an accelerator to speed up Large Language Model (LLM) inference with references.
It is motivated by the observation that there are abundant identical text spans between the decoding result by an LLM and the reference that is available in many real world scenarios.
arXiv Detail & Related papers (2023-04-10T09:55:14Z) - Improving Mandarin End-to-End Speech Recognition with Word N-gram
Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z) - On Language Model Integration for RNN Transducer based Speech
Recognition [49.84285563767935]
We study various ILM correction-based LM integration methods formulated in a common RNN-T framework.
We provide a decoding interpretation on two major reasons for performance improvement with ILM correction.
We also propose an exact-ILM training framework by extending the proof given in the hybrid autoregressive transducer.
arXiv Detail & Related papers (2021-10-13T16:30:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.