Can Contextual Biasing Remain Effective with Whisper and GPT-2?
- URL: http://arxiv.org/abs/2306.01942v1
- Date: Fri, 2 Jun 2023 22:56:01 GMT
- Title: Can Contextual Biasing Remain Effective with Whisper and GPT-2?
- Authors: Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C. Woodland
- Abstract summary: This paper investigates the effectiveness of neural contextual biasing for Whisper combined with GPT-2.
Experiments across three datasets show a considerable reduction in errors on biasing words with a biasing list of 1000 words.
- Score: 18.783162616664363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: End-to-end automatic speech recognition (ASR) and large language models, such
as Whisper and GPT-2, have recently been scaled to use vast amounts of training
data. Despite the large amount of training data, infrequent content words that
occur in a particular task may still exhibit poor ASR performance, with
contextual biasing a possible remedy. This paper investigates the effectiveness
of neural contextual biasing for Whisper combined with GPT-2. Specifically,
this paper proposes integrating an adapted tree-constrained pointer generator
(TCPGen) component for Whisper and a dedicated training scheme to dynamically
adjust the final output without modifying any Whisper model parameters.
Experiments across three datasets show a considerable reduction in errors on
biasing words with a biasing list of 1000 words. Contextual biasing was more
effective when applied to domain-specific data and can boost the performance of
Whisper and GPT-2 without losing their generality.
Related papers
- Contextual Biasing to Improve Domain-specific Custom Vocabulary Audio Transcription without Explicit Fine-Tuning of Whisper Model [0.0]
OpenAI's Whisper Automated Speech Recognition model excels in generalizing across diverse datasets and domains.
We propose a method to enhance transcription accuracy without explicit fine-tuning or altering model parameters.
arXiv Detail & Related papers (2024-10-24T01:58:11Z) - Text Injection for Neural Contextual Biasing [57.589903308622745]
This work proposes contextual text injection (CTI) to enhance contextual ASR.
CTI with 100 billion text sentences can achieve up to 43.3% relative WER reduction from a strong neural biasing model.
arXiv Detail & Related papers (2024-06-05T04:20:17Z) - Improving ASR Contextual Biasing with Guided Attention [47.74990801299927]
A common challenge in previous literature is that the word error rate (WER) reduction brought by contextual biasing diminishes as the number of bias phrases increases.
We propose a Guided Attention (GA) auxiliary training loss, which improves the effectiveness and robustness of automatic speech recognition (ASR) contextual biasing without introducing additional parameters.
arXiv Detail & Related papers (2024-01-16T21:16:12Z) - Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR
Customization [66.22007368434633]
We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR)
The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulate non-trivial biasing lists for the customization task.
We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
arXiv Detail & Related papers (2023-09-29T14:18:59Z) - Noisy Self-Training with Data Augmentations for Offensive and Hate
Speech Detection Tasks [3.703767478524629]
"Noisy" self-training approaches incorporate data augmentation techniques to ensure prediction consistency and increase robustness against adversarial attacks.
We evaluate our experiments on two offensive/hate-speech datasets and demonstrate that (i) self-training consistently improves performance regardless of model size, resulting in up to +1.5% F1-macro on both datasets, and (ii) noisy self-training with textual data augmentations, despite being successfully applied in similar settings, decreases performance on offensive and hate-speech domains when compared to the default method, even with state-of-the-art augmentations such as backtranslation.
arXiv Detail & Related papers (2023-07-31T12:35:54Z) - M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios [103.6153593636399]
We propose a vision-language prompt tuning method with mitigated label bias (M-Tuning)
It introduces open words from the WordNet to extend the range of words forming the prompt texts from only closed-set label words to more, and thus prompts are tuned in a simulated open-set scenario.
Our method achieves the best performance on datasets with various scales, and extensive ablation studies also validate its effectiveness.
arXiv Detail & Related papers (2023-03-09T09:05:47Z) - AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT)
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
arXiv Detail & Related papers (2023-02-25T06:58:16Z) - Distribution augmentation for low-resource expressive text-to-speech [18.553812159109253]
This paper presents a novel data augmentation technique for text-to-speech (TTS)
It allows to generate new (text, audio) training examples without requiring any additional data.
arXiv Detail & Related papers (2022-02-13T21:19:31Z) - Improving Robustness by Augmenting Training Sentences with
Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.