Correcting Automated and Manual Speech Transcription Errors using Warped
Language Models
- URL: http://arxiv.org/abs/2103.14580v1
- Date: Fri, 26 Mar 2021 16:43:23 GMT
- Title: Correcting Automated and Manual Speech Transcription Errors using Warped
Language Models
- Authors: Mahdi Namazifar, John Malik, Li Erran Li, Gokhan Tur, Dilek Hakkani-Tür
- Abstract summary: We propose a novel approach that takes advantage of the robustness of warped language models to transcription noise for correcting transcriptions of spoken language.
We show that our proposed approach is able to achieve up to 10% reduction in word error rates of both automatic and manual transcriptions of spoken language.
- Score: 2.8614709576106874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked language models have revolutionized natural language processing
systems in the past few years. A recently introduced generalization of masked
language models, called warped language models, is trained to be more robust to
the types of errors that appear in automatic or manual transcriptions of spoken
language by exposing the language model to the same types of errors during
training. In this work we propose a novel approach that takes advantage of the
robustness of warped language models to transcription noise for correcting
transcriptions of spoken language. We show that our proposed approach is able
to achieve up to 10% reduction in word error rates of both automatic and manual
transcriptions of spoken language.
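
To illustrate the two ideas in the abstract (exposing a language model to transcription-style noise during training, and exploiting that robustness to correct noisy transcriptions), here is a minimal sketch that uses an off-the-shelf masked language model as a stand-in for a warped language model. The model name, noise probabilities, confidence margin, and the restriction to substitution errors are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a standard masked LM stands in for the warped
# language model, and the correction step handles substitution errors only.
import random
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative choice
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def warp(tokens, p=0.1):
    """Inject ASR-like noise into a training sentence: randomly drop,
    substitute, or insert tokens (the kind of corruption a warped LM
    is exposed to during training). Probabilities are illustrative."""
    vocab = list(tokenizer.get_vocab().keys())
    out = []
    for tok in tokens:
        r = random.random()
        if r < p:                        # drop the token
            continue
        elif r < 2 * p:                  # substitute a random token
            out.append(random.choice(vocab))
        else:
            out.append(tok)
        if random.random() < p:          # occasionally insert a random token
            out.append(random.choice(vocab))
    return out

def correct(text, margin=4.0):
    """Propose token-level corrections for a noisy transcription by masking
    one position at a time and keeping the model's answer only when it is
    strongly preferred over the transcribed token."""
    ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    fixed = ids.clone()
    for pos in range(1, len(ids) - 1):   # skip [CLS] / [SEP]
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, pos]
        best = int(logits.argmax())
        # Replace only when the model strongly prefers another token over
        # the transcribed one; "margin" is an illustrative confidence knob.
        if best != ids[pos].item() and logits[best] - logits[ids[pos]] > margin:
            fixed[pos] = best
    return tokenizer.decode(fixed[1:-1], skip_special_tokens=True)

# Example: an ASR substitution error ("look" instead of "book").
print(correct("i want to look a flight to boston"))
```

A full warped-language-model corrector would also need to propose insertions and deletions, which this substitution-only sketch does not attempt.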
Related papers
- A two-stage transliteration approach to improve performance of a multilingual ASR [1.9511556030544333]
This paper presents an approach to build a language-agnostic end-to-end model trained on a grapheme set.
We performed experiments with an end-to-end multilingual speech recognition system for two Indic languages.
arXiv Detail & Related papers (2024-10-09T05:30:33Z)
- Cross-Lingual Unlearning of Selective Knowledge in Multilingual Language Models [38.10962690551031]
Pretrained language models memorize vast amounts of information, including private and copyrighted data, raising significant safety concerns.
Retraining these models after excluding sensitive data is prohibitively expensive, making machine unlearning a viable, cost-effective alternative.
This paper presents a pioneering approach to machine unlearning for multilingual language models, selectively erasing information across different languages while maintaining overall performance.
arXiv Detail & Related papers (2024-06-18T07:40:18Z)
- Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation [82.5217996570387]
We adapt a pre-trained language model for auto-regressive text-to-image generation.
We find that pre-trained language models offer limited help.
arXiv Detail & Related papers (2023-11-27T07:19:26Z)
- Multilingual self-supervised speech representations improve the speech recognition of low-resource African languages with codeswitching [65.74653592668743]
Finetuning self-supervised multilingual representations reduces absolute word error rates by up to 20%.
In circumstances with limited training data, finetuning self-supervised representations is a better-performing and viable solution.
arXiv Detail & Related papers (2023-11-25T17:05:21Z)
- Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian controllable language models (BCLMs) have been shown to be efficient in controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation which alleviates the training-inference mismatch problem at a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)
- Frustratingly Easy Edit-based Linguistic Steganography with a Masked Language Model [21.761511258514673]
We revisit edit-based linguistic steganography, with the idea that a masked language model offers an off-the-shelf solution.
The proposed method eliminates rule construction and has a high payload capacity for an edit-based model.
It is also shown to be more secure against automatic detection than a generation-based method while offering better control of the security/payload capacity trade-off.
arXiv Detail & Related papers (2021-04-20T08:35:53Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that cross-lingual pretraining performs poorly for distant and low-resource languages because the representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the code-switching problem.
We use language identities to bias the model to predict code-switching (CS) points.
This encourages the model to learn language identity information directly from the transcription, so no additional language identification (LID) model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)