Prompting Large Language Models for Zero-Shot Domain Adaptation in
Speech Recognition
- URL: http://arxiv.org/abs/2306.16007v1
- Date: Wed, 28 Jun 2023 08:29:00 GMT
- Title: Prompting Large Language Models for Zero-Shot Domain Adaptation in
Speech Recognition
- Authors: Yuang Li, Yu Wu, Jinyu Li, Shujie Liu
- Abstract summary: With only a domain-specific text prompt, we propose two zero-shot ASR domain adaptation methods using LLaMA.
Experiments show that, with only one domain prompt, both methods can effectively reduce word error rates (WER) on out-of-domain TedLium-2 and SPGISpeech datasets.
- Score: 33.07184218085399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of Language Models (LMs) has proven to be an effective way to
address domain shifts in speech recognition. However, these approaches usually
require a significant amount of target domain text data for the training of
LMs. Different from these methods, in this work, with only a domain-specific
text prompt, we propose two zero-shot ASR domain adaptation methods using
LLaMA, a 7-billion-parameter large language model (LLM). The LLM is used in two
ways: 1) second-pass rescoring: reranking the N-best hypotheses of a given ASR
system with LLaMA; 2) deep LLM-fusion: incorporating the LLM into the decoder of an
encoder-decoder-based ASR system. Experiments show that, with only one domain
prompt, both methods can effectively reduce word error rates (WER) on
out-of-domain TedLium-2 and SPGISpeech datasets. In particular, deep LLM-fusion
has the advantage of better recall of entity and out-of-vocabulary words.
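The first method, second-pass rescoring, amounts to scoring each N-best hypothesis with a causal LLM conditioned on the domain prompt and reranking by an interpolation of ASR and LLM scores. The sketch below is a minimal illustration of that idea, assuming a generic Hugging Face causal-LM checkpoint; the model name, interpolation weight, prompt, and toy hypotheses are placeholders rather than the paper's exact setup.

```python
# Minimal sketch of zero-shot second-pass rescoring with a domain prompt.
# The checkpoint, interpolation weight, prompt, and hypotheses are illustrative
# placeholders, not the paper's exact configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # any causal-LM checkpoint works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

def llm_logprob(prompt: str, hypothesis: str) -> float:
    """Sum of token log-probabilities of `hypothesis` conditioned on `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + " " + hypothesis, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    token_lp = log_probs.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp[0, prompt_len - 1:].sum().item()  # keep only hypothesis positions

def rescore(nbest, domain_prompt, lm_weight=0.5):
    """Rerank (hypothesis, asr_score) pairs using prompt-conditioned LLM scores."""
    scored = [(hyp, s + lm_weight * llm_logprob(domain_prompt, hyp)) for hyp, s in nbest]
    return max(scored, key=lambda x: x[1])[0]

domain_prompt = "The following is a transcript of a financial earnings call:"
nbest = [("revenue grew five per cent", -4.1), ("revenue grew five percent", -3.9)]
print(rescore(nbest, domain_prompt))
```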
Related papers
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
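The penalty-weighting step described in the LLM-Lasso summary above can be sketched as a weighted Lasso: features the LLM judges relevant receive small penalty factors, which is equivalent to rescaling each column by the inverse of its factor before fitting an ordinary Lasso. The snippet below is a generic illustration of that reduction; the penalty factors are hard-coded stand-ins for LLM outputs, not values produced by the framework.

```python
# Sketch of a weighted Lasso where per-feature penalty factors (hard-coded here
# as stand-ins for LLM-generated values) shrink less-relevant features more.
# A weighted Lasso reduces to an ordinary Lasso on columns rescaled by 1 / weight.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hypothetical LLM-derived penalty factors: lower = "more relevant".
penalty = np.array([0.2, 0.2, 1.0, 1.0])

X_scaled = X / penalty            # down-weighting the penalty == up-scaling the column
model = Lasso(alpha=0.1).fit(X_scaled, y)
coef = model.coef_ / penalty      # map coefficients back to the original feature scale
print(coef)
```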
- Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests.
We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments.
We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z)
- FDLLM: A Text Fingerprint Detection Method for LLMs in Multi-Language, Multi-Domain Black-Box Environments [18.755880639770755]
The use of large language models (LLMs) can lead to potential security risks.
Attackers may exploit this black-box scenario to deploy malicious models and embed viruses in the code provided to users.
We propose FDLLM, the first LLMGT fingerprint detection model, based on Qwen2.5-7B and fine-tuned using LoRA to address these challenges.
arXiv Detail & Related papers (2025-01-27T13:18:40Z)
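The fine-tuning recipe named in the FDLLM summary above, a Qwen2.5-7B backbone adapted with LoRA for detecting LLM-generated text, can be sketched with the peft library. The rank, target modules, and the binary sequence-classification framing below are illustrative assumptions, not the published configuration.

```python
# Sketch: wrap Qwen2.5-7B with LoRA adapters for a binary "LLM-generated or not"
# classifier. The rank, target modules, and classification head are illustrative
# assumptions, not FDLLM's published configuration.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base = "Qwen/Qwen2.5-7B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```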
- Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition [17.376550014426623]
This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs).
We propose "delayed fusion," which applies LLM scores to ASR hypotheses with a delay during decoding.
We demonstrate that delayed fusion provides improved decoding speed and accuracy compared to shallow fusion and N-best rescoring.
arXiv Detail & Related papers (2025-01-16T03:01:50Z)
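The delayed-fusion idea above, scoring partial ASR hypotheses with the LLM only every few decoding steps rather than at every token (shallow fusion) or only at the end (N-best rescoring), can be illustrated with a toy beam-update routine. The beam structure, placeholder scoring callables, and delay interval below are assumptions for illustration, not the paper's decoder.

```python
# Toy sketch of delayed fusion: hypotheses accumulate ASR scores at every step,
# while LLM scores are added only every `delay` steps, over the tokens emitted
# since the last fusion point. `asr_expand` and `llm_score` are placeholder
# callables standing in for the ASR decoder and the LLM.
from dataclasses import dataclass, field

@dataclass
class Hyp:
    tokens: list = field(default_factory=list)
    asr_score: float = 0.0
    llm_score: float = 0.0
    fused_upto: int = 0  # number of tokens already covered by LLM scoring

def delayed_fusion_step(beam, step, asr_expand, llm_score,
                        delay=4, lm_weight=0.5, beam_size=4):
    """One decoding step: expand with ASR scores, fuse LLM scores every `delay` steps."""
    expanded = []
    for hyp in beam:
        for token, asr_logp in asr_expand(hyp.tokens):
            expanded.append(Hyp(hyp.tokens + [token],
                                hyp.asr_score + asr_logp,
                                hyp.llm_score, hyp.fused_upto))
    if step % delay == 0:  # the delayed LLM pass over the newly emitted tokens
        for hyp in expanded:
            new_tokens = hyp.tokens[hyp.fused_upto:]
            hyp.llm_score += llm_score(hyp.tokens[:hyp.fused_upto], new_tokens)
            hyp.fused_upto = len(hyp.tokens)
    expanded.sort(key=lambda h: h.asr_score + lm_weight * h.llm_score, reverse=True)
    return expanded[:beam_size]
```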
- Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning [12.676026149146772]
Large Language Models (LLMs) have reformed Automatic Speech Recognition (ASR).
Fine-tuning such ASR on text-only data without paired prompts may diminish the effectiveness of domain-specific knowledge.
We propose a two-step soft prompt fine-tuning strategy that enhances domain-specific text adaptation.
arXiv Detail & Related papers (2024-12-09T20:22:06Z)
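A soft prompt is a small bank of trainable embedding vectors prepended to the input of an otherwise frozen model, so domain adaptation only updates those vectors. The snippet below is a minimal, generic sketch of that mechanism rather than the paper's two-step recipe; the prompt length and hidden size are arbitrary.

```python
# Minimal sketch of soft prompt tuning: a small bank of trainable embeddings is
# prepended to the (frozen) model's input embeddings; only the prompt is trained.
# The prompt length, hidden size, and dummy inputs are illustrative choices.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, hidden_size: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden) -> (batch, prompt_len + seq_len, hidden)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

soft_prompt = SoftPrompt(prompt_len=16, hidden_size=768)
dummy_embeds = torch.randn(2, 10, 768)          # stand-in for frozen-LM token embeddings
extended = soft_prompt(dummy_embeds)            # pass `extended` through the frozen LM
optimizer = torch.optim.Adam(soft_prompt.parameters(), lr=1e-3)  # train only the prompt
print(extended.shape)  # torch.Size([2, 26, 768])
```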
- LARR: Large Language Model Aided Real-time Scene Recommendation with Semantic Understanding [19.510385758079966]
This paper introduces Large Language Model Aided Real-time Scene Recommendation (LARR).
arXiv Detail & Related papers (2024-08-21T10:56:26Z)
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
- LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
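An inference-time refinement loop of the kind summarized above can be sketched generically: a critic returns span-level issues and the generator is re-prompted to fix only those spans. The `generate` and `critique` callables below are placeholders, not LLMRefine's components.

```python
# Generic sketch of inference-time refinement with fine-grained feedback: a critic
# flags specific spans, and the generator is re-prompted to fix only those spans.
# `generate` and `critique` are placeholder callables, not LLMRefine's components.
def refine(source, generate, critique, max_rounds=3):
    """Iteratively refine a draft until the critic reports no remaining issues."""
    output = generate(source)
    for _ in range(max_rounds):
        issues = critique(source, output)  # e.g. [(span, error_type, suggested_fix), ...]
        if not issues:
            break
        feedback = "\n".join(f"- '{span}': {err} -> {fix}" for span, err, fix in issues)
        output = generate(
            f"Source: {source}\nDraft: {output}\n"
            f"Revise the draft, fixing only these issues:\n{feedback}"
        )
    return output
```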
- Inference with Reference: Lossless Acceleration of Large Language Models [97.04200102556551]
LLMA is an accelerator to speed up Large Language Model (LLM) inference with references.
It is motivated by the observation that there are abundant identical text spans between an LLM's decoding result and the reference that is available in many real-world scenarios.
arXiv Detail & Related papers (2023-04-10T09:55:14Z)
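The copy-and-verify intuition behind LLMA, reusing text spans that the reference and the decoding result share, can be sketched as follows: when the most recent generated tokens also occur in the reference, the tokens that follow them are proposed as a draft continuation and checked by the LLM in a single forward pass. The span-matching heuristic and the `verify_with_llm` placeholder below are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the copy-and-verify idea behind reference-based acceleration: when the
# last few generated tokens also appear in a reference document, propose the tokens
# that follow them in the reference, then verify them with one LLM forward pass.
# `verify_with_llm` and `next_token` are placeholders, not LLMA's implementation.
def propose_from_reference(generated, reference, match_len=4, copy_len=8):
    """Return candidate continuation tokens copied from the reference, or []."""
    if len(generated) < match_len:
        return []
    tail = generated[-match_len:]
    for i in range(len(reference) - match_len):
        if reference[i:i + match_len] == tail:
            return reference[i + match_len:i + match_len + copy_len]
    return []

def decode(generated, reference, next_token, verify_with_llm, max_len=50):
    """Greedy decoding that opportunistically copies verified spans from the reference."""
    while len(generated) < max_len:
        candidate = propose_from_reference(generated, reference)
        accepted = verify_with_llm(generated, candidate) if candidate else []
        generated += accepted if accepted else [next_token(generated)]
    return generated
```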
- Improving Mandarin End-to-End Speech Recognition with Word N-gram Language Model [57.92200214957124]
External language models (LMs) are used to improve the recognition performance of end-to-end (E2E) automatic speech recognition (ASR) systems.
We propose a novel decoding algorithm where a word-level lattice is constructed on-the-fly to consider all possible word sequences.
Our method consistently outperforms subword-level LMs, including N-gram LM and neural network LM.
arXiv Detail & Related papers (2022-01-06T10:04:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.