Text Augmentation for Language Models in High Error Recognition Scenario
- URL: http://arxiv.org/abs/2011.06056v1
- Date: Wed, 11 Nov 2020 20:21:21 GMT
- Title: Text Augmentation for Language Models in High Error Recognition Scenario
- Authors: Karel Beneš and Lukáš Burget
- Abstract summary: We compare augmentation based on global error statistics with one based on per-word unigram statistics of ASR errors.
Our best augmentation scheme increases the absolute WER improvement from second-pass rescoring from 1.1 % to 1.9 % on the CHiME-6 challenge.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We examine the effect of data augmentation for training of language models
for speech recognition. We compare augmentation based on global error
statistics with one based on per-word unigram statistics of ASR errors and
observe that it is better to pay attention only to the global substitution,
deletion and insertion rates. This simple scheme also performs consistently
better than label smoothing and its sampled variants. Additionally, we
investigate the behavior of perplexity estimated on augmented data, but
conclude that it gives no better prediction of the final error rate. Our best
augmentation scheme increases the absolute WER improvement from second-pass
rescoring from 1.1 % to 1.9 % on the CHiME-6 challenge.
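The augmentation scheme is only described at a high level in the abstract. Below is a minimal sketch of what corrupting LM training text with global substitution, deletion and insertion rates could look like; the function name, the rates and the uniform sampling of replacement words are illustrative assumptions, not the authors' implementation.

```python
import random

def corrupt(tokens, vocab, p_sub=0.05, p_del=0.03, p_ins=0.03, rng=random):
    """Corrupt a token sequence using global ASR error statistics.

    Each token is substituted with probability p_sub, dropped with
    probability p_del, and followed by an inserted token with probability
    p_ins.  Replacements are drawn uniformly from `vocab`; a per-word
    unigram model of ASR errors could be plugged in here instead.
    """
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_del:
            continue                       # deletion: drop the token
        elif r < p_del + p_sub:
            out.append(rng.choice(vocab))  # substitution
        else:
            out.append(tok)                # keep the original token
        if rng.random() < p_ins:
            out.append(rng.choice(vocab))  # insertion after this position
    return out

# Corrupting the LM training text this way exposes the rescoring model to
# hypotheses that resemble noisy ASR output rather than clean transcripts.
vocab = ["the", "a", "cat", "sat", "mat", "on"]
print(corrupt("the cat sat on the mat".split(), vocab))
```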
Related papers
- Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking [68.77659513993507]
We present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy.
Our results show spoken language identification accuracy improvements of 8.7% and 6.1%, respectively, and word error rates which are 3.3% and 2.0% lower on these benchmarks.
arXiv Detail & Related papers (2024-09-27T03:31:32Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value during training according to the uncertainty of individual samples (see the sketch after this list).
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Improving Sampling Methods for Fine-tuning SentenceBERT in Text Streams [49.3179290313959]
This study explores the efficacy of seven text sampling methods designed to selectively fine-tune language models.
We precisely assess the impact of these methods on fine-tuning the SBERT model using four different loss functions.
Our findings indicate that Softmax loss and Batch All Triplets loss are particularly effective for text stream classification.
arXiv Detail & Related papers (2024-03-18T23:41:52Z)
- UCorrect: An Unsupervised Framework for Automatic Speech Recognition Error Correction [18.97378605403447]
We propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction.
Experiments on the public AISHELL-1 dataset and WenetSpeech dataset show the effectiveness of UCorrect.
arXiv Detail & Related papers (2024-01-11T06:30:07Z)
- Generative error correction for code-switching speech recognition using large language models [49.06203730433107]
Code-switching (CS) speech refers to the phenomenon of mixing two or more languages within the same sentence.
We propose to leverage large language models (LLMs) and lists of hypotheses generated by an ASR to address the CS problem.
arXiv Detail & Related papers (2023-10-17T14:49:48Z)
- UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction [3.1798318618973362]
This paper describes the submission of UZH_CLyp for the SemEval 2023 Task 9 "Multilingual Tweet Intimacy Analysis".
We achieved second-best results in all 10 languages according to the official Pearson's correlation regression evaluation measure.
arXiv Detail & Related papers (2023-03-02T12:18:53Z)
- SpeechBlender: Speech Augmentation Framework for Mispronunciation Data Generation [11.91301106502376]
SpeechBlender is a fine-grained data augmentation pipeline for generating mispronunciation errors.
Our proposed technique achieves state-of-the-art results on Speechocean762 for ASR-dependent mispronunciation detection models.
arXiv Detail & Related papers (2022-11-02T07:13:30Z)
- Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation [32.885722714728765]
We investigate data augmentation techniques for code-switching (CS) NLP systems.
We perform lexical replacements using word-aligned parallel corpora.
We compare these approaches against dictionary-based replacements.
arXiv Detail & Related papers (2022-05-25T10:44:36Z)
- Counterfactual Data Augmentation improves Factuality of Abstractive Summarization [6.745946263790011]
We show that augmenting the training data with our approach improves the factual correctness of summaries without significantly affecting the ROUGE score.
We show that in two commonly used summarization datasets (CNN/Dailymail and XSum), we improve the factual correctness by about 2.5 points on average.
arXiv Detail & Related papers (2022-05-25T00:00:35Z)
- Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation (see the sketch after this list).
We perform experiments on two standard fairness datasets (Adult, Communities and Crime), and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, the F-measure objective results in improved micro-F1 scores, with absolute improvements of up to 8% compared to models trained with the cross-entropy loss function.
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
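The uncertainty-aware learning (UAL) entry above adapts the label-smoothing value per training sample. Below is a minimal sketch of a cross-entropy loss with a per-sample smoothing value; the mapping from uncertainty to smoothing and the tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, targets, smoothing):
    """Cross-entropy with a per-sample label-smoothing value.

    logits:    (batch, num_classes) raw model outputs
    targets:   (batch,) integer class labels
    smoothing: (batch,) per-sample smoothing values in [0, 1)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    uniform = -log_probs.mean(dim=-1)  # loss against a uniform target
    return ((1.0 - smoothing) * nll + smoothing * uniform).mean()

# Illustrative use: more uncertain samples receive a larger smoothing value.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
uncertainty = torch.tensor([0.1, 0.5, 0.9, 0.3])
loss = smoothed_cross_entropy(logits, targets, 0.2 * uncertainty)
loss.backward()
```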
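Similarly, the deep F-measure maximization entry trains with a differentiable approximation of the F-measure. One common way to build such a surrogate is a soft F1 loss over predicted probabilities, sketched below for the binary case; this is an assumed formulation for illustration, not necessarily the one used in that paper.

```python
import torch

def soft_f1_loss(logits, targets, eps=1e-8):
    """Differentiable surrogate for 1 - F1 (binary case).

    Predicted probabilities replace hard 0/1 decisions, so true positives,
    false positives and false negatives become soft counts and the whole
    expression can be optimized with standard backpropagation.
    """
    probs = torch.sigmoid(logits)
    tp = (probs * targets).sum()
    fp = (probs * (1.0 - targets)).sum()
    fn = ((1.0 - probs) * targets).sum()
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - soft_f1  # minimizing this maximizes the soft F1

logits = torch.randn(8, requires_grad=True)
targets = torch.randint(0, 2, (8,)).float()
soft_f1_loss(logits, targets).backward()
```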