Leveraging Large Language Models for Exploiting ASR Uncertainty
- URL: http://arxiv.org/abs/2309.04842v2
- Date: Tue, 12 Sep 2023 16:46:26 GMT
- Title: Leveraging Large Language Models for Exploiting ASR Uncertainty
- Authors: Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg,
Xiaochuan Niu, Ahmed Tewfik
- Abstract summary: Large language models must either rely on off-the-shelf automatic speech recognition systems for transcription, or be equipped with an in-built speech modality.
We tackle the speech-intent classification task, where a high word-error-rate can limit the LLM's ability to understand the spoken intent.
We propose prompting the LLM with an n-best list of ASR hypotheses instead of only the error-prone 1-best hypothesis.
- Score: 16.740712975166407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models excel in a variety of natural language processing
(NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they
must either rely on off-the-shelf automatic speech recognition (ASR) systems
for transcription, or be equipped with an in-built speech modality. This work
focuses on the former scenario, where the LLM's accuracy on SLU tasks is
constrained by the accuracy of a fixed ASR system on the spoken input.
Specifically, we tackle the speech-intent classification task, where a high
word-error-rate can limit the LLM's ability to understand the spoken intent.
Instead of chasing a high accuracy by designing complex or specialized
architectures regardless of deployment costs, we seek to answer how far we can
go without substantially changing the underlying ASR and LLM, which can
potentially be shared by multiple unrelated tasks. To this end, we propose
prompting the LLM with an n-best list of ASR hypotheses instead of only the
error-prone 1-best hypothesis. We explore prompt-engineering to explain the
concept of n-best lists to the LLM, followed by the finetuning of Low-Rank
Adapters on the downstream tasks. Our approach using n-best lists proves to be
effective on a device-directed speech detection task as well as on a keyword
spotting task, where systems using n-best list prompts outperform those using
the 1-best ASR hypothesis, thus paving the way for an efficient method to exploit
ASR uncertainty via LLMs for speech-based applications.
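The n-best prompting idea described in the abstract can be illustrated with a minimal sketch. The listing does not include the paper's actual prompt wording, so everything here is an assumption made for illustration: the `Hypothesis` type, its `cost` field (lower taken to mean more likely), the instruction text, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    """One ASR hypothesis. `cost` is a hypothetical score; lower = more likely."""
    text: str
    cost: float


def build_1best_prompt(hyps: Sequence[Hypothesis], labels: Sequence[str]) -> str:
    """Baseline prompt: only the single most likely transcription is shown."""
    best = min(hyps, key=lambda h: h.cost)
    return "\n".join([
        "Below is a transcription of a spoken utterance.",
        f"Transcription: {best.text}",
        f"Classify the utterance as one of: {', '.join(labels)}.",
        "Answer:",
    ])


def build_nbest_prompt(
    hyps: Sequence[Hypothesis], labels: Sequence[str], n: int = 5
) -> str:
    """N-best prompt: explain the list to the LLM, then show the top-n hypotheses."""
    # Keep the n lowest-cost hypotheses, most likely first.
    ranked = sorted(hyps, key=lambda h: h.cost)[:n]
    lines = [
        # Illustrative wording only; not the prompt used in the paper.
        "Below is an n-best list: alternative transcriptions of the same "
        "spoken utterance from a speech recognizer, ordered from most to "
        "least likely. Any of them may contain recognition errors.",
    ]
    for rank, hyp in enumerate(ranked, start=1):
        lines.append(f"{rank}. {hyp.text}")
    lines.append(f"Classify the utterance as one of: {', '.join(labels)}.")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    hyps = [
        Hypothesis("set a time for ten minutes", 2.5),
        Hypothesis("set a timer for ten minutes", 1.0),
        Hypothesis("set timer forty minutes", 4.0),
    ]
    print(build_nbest_prompt(hyps, ["directed", "not directed"], n=2))
```

The resulting string would be passed to the LLM in place of the 1-best prompt. The finetuning of Low-Rank Adapters mentioned in the abstract is a separate step and is not shown here.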
Related papers
- LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation [15.520180125182756]
Recent advancements in integrating speech information into large language models (LLMs) have significantly improved automatic speech recognition (ASR) accuracy.
Existing methods are often constrained by the capabilities of the speech encoders under varied acoustic conditions, such as accents.
We propose LA-RAG, a novel Retrieval-Augmented Generation (RAG) paradigm for LLM-based ASR.
arXiv Detail & Related papers (2024-09-13T07:28:47Z)
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- An Embarrassingly Simple Approach for LLM with Strong ASR Capacity [56.30595787061546]
We focus on solving one of the most important tasks in the field of speech processing, with speech foundation encoders and large language models (LLM).
Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning for the LLM.
We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.
arXiv Detail & Related papers (2024-02-13T23:25:04Z)
- Large Language Models are Efficient Learners of Noise-Robust Speech Recognition [65.95847272465124]
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR).
In this work, we extend the benchmark to noisy conditions and investigate if we can teach LLMs to perform denoising for GER.
Experiments on various latest LLMs demonstrate our approach achieves a new breakthrough with up to 53.9% correction improvement in terms of word error rate.
arXiv Detail & Related papers (2024-01-19T01:29:27Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting [32.70214938434769]
We explore the ability of large language models (LLMs) to act as speech recognition post-processors.
We evaluate different prompting schemes, both zero- and few-shot in-context learning, and a novel task activation prompting method.
We show that rescoring only by in-context learning with frozen LLMs achieves results that are competitive with rescoring by domain-tuned LMs.
arXiv Detail & Related papers (2023-09-27T13:36:03Z)
- Exploring the Integration of Large Language Models into Automatic Speech Recognition Systems: An Empirical Study [0.0]
This paper explores the integration of Large Language Models (LLMs) into Automatic Speech Recognition (ASR) systems.
Our primary focus is to investigate the potential of using an LLM's in-context learning capabilities to enhance the performance of ASR systems.
arXiv Detail & Related papers (2023-07-13T02:31:55Z)
- SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)