Related papers: Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization

URL: http://arxiv.org/abs/2309.17267v1
Date: Fri, 29 Sep 2023 14:18:59 GMT
Title: Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Authors: Alexandra Antonova
Abstract summary: We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR). The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
Score: 66.22007368434633
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a first large-scale public synthetic dataset for contextual spellchecking customization of automatic speech recognition (ASR) with a focus on diverse rare and out-of-vocabulary (OOV) phrases, such as proper names or terms. The proposed approach allows creating millions of realistic examples of corrupted ASR hypotheses and simulating non-trivial biasing lists for the customization task. Furthermore, we propose injecting two types of "hard negatives" into the simulated biasing lists in training examples and describe our procedures to automatically mine them. We report experiments with training an open-source customization model on the proposed dataset and show that the injection of hard negative biasing phrases decreases WER and the number of false alarms.
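Note on the metric: WER (word error rate) is conventionally computed as the word-level edit distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The following is a minimal Python sketch of that standard definition. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the paper or its dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: a rare term is misrecognized as two common words.
print(word_error_rate("play songs by nvidia", "play songs by in video"))
```

In this made-up example, one substitution and one insertion against a four-word reference give a WER of 0.5.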

Related papers

Attacking Misinformation Detection Using Adversarial Examples Generated by Language Models [0.0]
We investigate the challenge of generating adversarial examples to test the robustness of text classification algorithms. We focus on simulation of content moderation by setting realistic limits on the number of queries an attacker is allowed to attempt.
arXiv Detail & Related papers (2024-10-28T11:46:30Z)
Self-Adaptive Reconstruction with Contrastive Learning for Unsupervised Sentence Embeddings [24.255946996327104]
The unsupervised sentence embedding task aims to convert sentences into semantic vector representations. Due to the token bias in pretrained language models, the models cannot capture the fine-grained semantics in sentences. We propose a novel Self-Adaptive Reconstruction Contrastive Sentence Embeddings framework.
arXiv Detail & Related papers (2024-02-23T07:28:31Z)
Contextualized Automatic Speech Recognition with Attention-Based Bias Phrase Boosted Beam Search [44.94458898538114]
This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list. The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data.
arXiv Detail & Related papers (2024-01-19T01:36:07Z)
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction. The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses. With a reasonable prompt and their generative capability, LLMs can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages. First, we filter the dataset to obtain informative in-context examples individually. Then, we propose a diversity-guided example search that iteratively refines and evaluates the selected example permutations.
arXiv Detail & Related papers (2023-02-27T06:32:45Z)
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems [27.483603895258437]
We introduce a novel approach to contextual biasing by adding a contextual spelling correction model on top of the end-to-end ASR system. We propose filtering algorithms to handle large context lists, and performance balancing mechanisms to control the biasing degree of the model. Experiments show that the proposed method achieves as much as 51% relative word error rate (WER) reduction over the ASR system and outperforms traditional biasing methods.
arXiv Detail & Related papers (2022-03-02T06:00:48Z)
Semi-Supervised Spoken Language Understanding via Self-Supervised Speech and Language Model Pretraining [64.35907499990455]
We propose a framework to learn semantics directly from speech with semi-supervision from transcribed or untranscribed speech. Our framework is built upon pretrained end-to-end (E2E) ASR and self-supervised language models, such as BERT. In parallel, we identify two essential criteria for evaluating SLU models: environmental noise-robustness and E2E semantics evaluation.
arXiv Detail & Related papers (2020-10-26T18:21:27Z)
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more natural-sounding samples. We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)
Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU). We show that the error rates of off-the-shelf ASR and the following LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.