Adapting an Unadaptable ASR System
- URL: http://arxiv.org/abs/2306.01208v1
- Date: Thu, 1 Jun 2023 23:54:11 GMT
- Title: Adapting an Unadaptable ASR System
- Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill
- Abstract summary: We consider the recently released OpenAI Whisper ASR as an example of a large-scale ASR system to assess adaptation methods.
An error correction based approach is adopted, as this does not require access to the model.
The generalization ability of the system is then evaluated along two distinct dimensions.
- Score: 40.402050390096456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As speech recognition model sizes and training data requirements grow, it is
increasingly common for systems to only be available via APIs from online
service providers rather than having direct access to models themselves. In
this scenario it is challenging to adapt systems to a specific target domain.
To address this problem we consider the recently released OpenAI Whisper ASR as
an example of a large-scale ASR system to assess adaptation methods. An error
correction based approach is adopted, as this does not require access to the
model, but can be trained from either 1-best or N-best outputs that are
normally available via the ASR API. LibriSpeech is used as the primary target
domain for adaptation. The generalization ability of the system is then
evaluated along two distinct dimensions: first, whether the form of correction
model is portable to other speech recognition domains, and second, whether it
can be used for ASR models having a different architecture.
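A minimal sketch of the error-correction recipe described in the abstract: the N-best hypotheses returned by the ASR API form the source sequence and the reference transcript the target, trained as standard sequence-to-sequence learning. The choice of T5, the " | " separator, and the `nbest_to_source` helper are illustrative assumptions, not details taken from the paper.

```python
# Sketch of an ASR error-correction model trained on N-best API output.
# Assumptions: T5 as the correction model and " | " as a hypothesis separator;
# the paper does not prescribe either choice.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def nbest_to_source(nbest):
    """Join the N-best hypotheses from the ASR API into one source string."""
    return " | ".join(nbest)

# Toy training pair: N-best hypotheses plus the reference transcript.
nbest = ["i red the book", "i read the book", "i red a book"]
reference = "i read the book"

inputs = tokenizer(nbest_to_source(nbest), return_tensors="pt")
labels = tokenizer(reference, return_tensors="pt").input_ids

loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()                             # one step of the fine-tuning loop

# At inference time, decode a corrected transcript from new API output.
corrected = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(corrected[0], skip_special_tokens=True))
```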
Related papers
- ASR Error Correction using Large Language Models [4.75940708384553]
Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions.
This work investigates the use of large language models (LLMs) for error correction across diverse scenarios.
arXiv Detail & Related papers (2024-09-14T23:33:38Z)
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
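The confidence-based filtering idea reduces to a simple gate: transcripts whose ASR confidence is high pass through untouched, and only low-confidence ones are sent to the LLM for correction. A minimal sketch, assuming an arbitrary 0.9 threshold and a placeholder `llm_correct` callable:

```python
# Sketch of confidence-based filtering before LLM correction.
# `llm_correct` stands in for any LLM call; 0.9 is an arbitrary threshold.
from typing import Callable

def correct_with_filter(
    transcript: str,
    confidence: float,
    llm_correct: Callable[[str], str],
    threshold: float = 0.9,
) -> str:
    """Send only low-confidence transcripts to the LLM."""
    if confidence >= threshold:
        return transcript  # likely accurate: leave it untouched
    return llm_correct(transcript)

# Example with a trivial stand-in for the LLM.
fixed = correct_with_filter("i red the book", 0.42, lambda t: t.replace("red", "read"))
print(fixed)  # -> "i read the book"
```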
- Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems.
We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains.
STAR exhibits high data efficiency, requiring less than one hour of unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z)
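At a high level, STAR-style adaptation is confidence-weighted self-training: transcribe unlabeled audio with the frozen model, score each pseudo-label, and weight its contribution to the fine-tuning loss. The sketch below uses a toy linear model and mean token probability as the quality score; both are simplifications of the paper's actual indicator.

```python
# Schematic confidence-weighted self-training loop (STAR-style adaptation).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, feat_dim = 32, 16
model = nn.Linear(feat_dim, vocab)          # toy stand-in for an ASR decoder
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

unlabeled_batch = torch.randn(8, 10, feat_dim)  # 8 utterances, 10 frames

# 1) Pseudo-label the unlabeled audio with the current model.
with torch.no_grad():
    probs = F.softmax(model(unlabeled_batch), dim=-1)
    pseudo_labels = probs.argmax(-1)                 # (8, 10)
    confidence = probs.max(-1).values.mean(-1)       # (8,) utterance score

# 2) Fine-tune on pseudo-labels, weighting each utterance by its confidence.
logits = model(unlabeled_batch)
per_token = F.cross_entropy(
    logits.reshape(-1, vocab), pseudo_labels.reshape(-1), reduction="none"
).reshape(8, 10).mean(-1)                            # (8,) per-utterance loss
loss = (confidence * per_token).mean()
loss.backward()
optimizer.step()
```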
- Contextual Adapters for Personalized Speech Recognition in Neural Transducers [16.628830937429388]
We propose training neural contextual adapters for personalization in neural transducer-based ASR models.
Our approach not only biases recognition towards user-defined words, but also has the flexibility to work with pretrained ASR models.
arXiv Detail & Related papers (2022-05-26T22:46:28Z)
- Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition [71.96870151495536]
We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR).
The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model.
We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech.
arXiv Detail & Related papers (2021-10-08T05:07:35Z)
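Adversarial reprogramming keeps the pretrained model frozen and instead learns an additive perturbation on the input signal together with a mapping from source classes to target labels. A minimal sketch with a dummy frozen classifier; the perturbation form and the fixed many-to-one label map are illustrative assumptions:

```python
# Minimal adversarial-reprogramming sketch: learn an input perturbation that
# repurposes a frozen pretrained classifier for a new low-resource task.
import torch
import torch.nn as nn
import torch.nn.functional as F

sig_len, src_classes, tgt_classes = 1000, 30, 3
pretrained = nn.Linear(sig_len, src_classes)   # stand-in for a frozen SCR model
for p in pretrained.parameters():
    p.requires_grad = False

delta = torch.zeros(sig_len, requires_grad=True)  # the trainable perturbation
optimizer = torch.optim.Adam([delta], lr=1e-3)

# Fixed many-to-one mapping from source classes to the 3 target labels.
label_map = torch.arange(src_classes) % tgt_classes

def target_logits(audio):
    """Frozen model on perturbed audio, logits aggregated per target label."""
    src = pretrained(audio + delta)                       # (batch, 30)
    out = torch.full((audio.size(0), tgt_classes), -1e9)
    for t in range(tgt_classes):                          # max over mapped classes
        out[:, t] = src[:, label_map == t].max(-1).values
    return out

audio = torch.randn(4, sig_len)          # toy batch from the target domain
labels = torch.randint(0, tgt_classes, (4,))
loss = F.cross_entropy(target_logits(audio), labels)
loss.backward()                          # gradients flow only into delta
optimizer.step()
```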
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, the factorized neural Transducer, which factorizes the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
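The factorization can be pictured as two separate prediction paths: a head for the blank token driven by the joint acoustic/label state, and a standalone language-model head over the vocabulary that can be fine-tuned on text alone. The sketch below is a heavily simplified rendering of that split; the dimensions and fusion by concatenation are assumptions, not the paper's exact architecture.

```python
# Simplified sketch of a factorized transducer prediction: a blank score from
# the joint state, and vocabulary scores from a standalone LM head that can
# be adapted on text-only data.
import torch
import torch.nn as nn

class FactorizedPrediction(nn.Module):
    def __init__(self, hidden=64, vocab=100):
        super().__init__()
        self.blank_head = nn.Linear(hidden, 1)   # predicts blank vs. emit
        self.lm = nn.Sequential(                 # standalone vocabulary LM
            nn.Embedding(vocab, hidden),
            nn.Linear(hidden, vocab),
        )

    def forward(self, joint_state, prev_token):
        blank_logit = self.blank_head(joint_state)          # (batch, 1)
        vocab_logits = self.lm(prev_token)                  # (batch, vocab)
        return torch.cat([blank_logit, vocab_logits], -1)   # (batch, 1+vocab)

model = FactorizedPrediction()
joint_state = torch.randn(2, 64)
prev_token = torch.randint(0, 100, (2,))
print(model(joint_state, prev_token).shape)  # torch.Size([2, 101])

# Text-only adaptation would fine-tune model.lm alone, e.g.:
lm_optimizer = torch.optim.Adam(model.lm.parameters(), lr=1e-4)
```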
- Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech [5.960279280033886]
We show that by adding a relatively small number of extra parameters to the encoder layers via so-called residual adapters, we can achieve adaptation gains similar to model fine-tuning.
We demonstrate this on two speech adaptation tasks (atypical and accented speech) and for two state-of-the-art ASR architectures.
arXiv Detail & Related papers (2021-09-14T20:04:47Z)
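A residual adapter is a small bottleneck inserted into each frozen encoder layer: down-project, apply a non-linearity, up-project, and add the result back to the input. The module below follows this standard pattern; the bottleneck width is an arbitrary assumption.

```python
# Standard residual-adapter bottleneck, as inserted into frozen encoder layers.
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    def __init__(self, dim=512, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project to small bottleneck
        self.up = nn.Linear(bottleneck, dim)    # project back to model width
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection: the adapter learns a small correction to x.
        return x + self.up(self.act(self.down(x)))

# Only the adapter's parameters are trained; the base encoder stays frozen.
adapter = ResidualAdapter()
hidden_states = torch.randn(4, 120, 512)      # (batch, frames, dim)
print(adapter(hidden_states).shape)           # torch.Size([4, 120, 512])
```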
- Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio [20.932685675759117]
We propose a unified ASR model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes.
We demonstrate its efficacy in a realistic scenario where a set of devices typically streams a single primary audio channel, and two additional auxiliary channels only when upload bandwidth allows it.
arXiv Detail & Related papers (2021-06-04T22:58:42Z)
- ASR Error Correction and Domain Adaptation Using Machine Translation [32.27379508770736]
We propose a technique to perform domain adaptation for ASR error correction via machine translation.
We observe an absolute improvement in word error rate and a 4-point absolute improvement in BLEU score on Google ASR output.
arXiv Detail & Related papers (2020-03-13T20:05:38Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
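The joint setup can be summarized as a shared encoder with two heads whose losses are combined. A schematic sketch; the dummy encoder, the two heads, and the 0.5 loss weight are illustrative assumptions:

```python
# Sketch of joint training: shared encoding of the ASR hypothesis, one head
# for token-level correction and one for language-understanding intents.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, intents, seq = 1000, 128, 10, 12
encoder = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, dim))
correction_head = nn.Linear(dim, vocab)   # predicts corrected tokens
lu_head = nn.Linear(dim, intents)         # predicts the utterance intent

tokens = torch.randint(0, vocab, (4, seq))        # noisy ASR hypotheses
gold_tokens = torch.randint(0, vocab, (4, seq))   # corrected references
gold_intent = torch.randint(0, intents, (4,))

h = encoder(tokens)                                # (4, seq, dim)
corr_loss = F.cross_entropy(correction_head(h).reshape(-1, vocab),
                            gold_tokens.reshape(-1))
lu_loss = F.cross_entropy(lu_head(h.mean(1)), gold_intent)
loss = corr_loss + 0.5 * lu_loss                   # 0.5 is an arbitrary weight
loss.backward()
```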
This list is automatically generated from the titles and abstracts of the papers on this site.