Boosting Norwegian Automatic Speech Recognition
- URL: http://arxiv.org/abs/2307.01672v1
- Date: Tue, 4 Jul 2023 12:05:15 GMT
- Title: Boosting Norwegian Automatic Speech Recognition
- Authors: Javier de la Rosa, Rolv-Arild Braaten, Per Egil Kummervold, Freddy
Wetjen, Svein Arne Brygfjeld
- Abstract summary: We present several baselines for automatic speech recognition (ASR) models for the two official written languages in Norway: Bokmål and Nynorsk.
We compare the performance of models of varying sizes and pre-training approaches on multiple Norwegian speech datasets.
We improve the state of the art on the Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of 17.10% to 7.60%, with models achieving 5.81% for Bokmål and 11.54% for Nynorsk.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present several baselines for automatic speech recognition
(ASR) models for the two official written languages in Norway: Bokmål and
Nynorsk. We compare the performance of models of varying sizes and pre-training
approaches on multiple Norwegian speech datasets. Additionally, we measure the
performance of these models against previous state-of-the-art ASR models, as
well as on out-of-domain datasets. We improve the state of the art on the
Norwegian Parliamentary Speech Corpus (NPSC) from a word error rate (WER) of
17.10% to 7.60%, with models achieving 5.81% for Bokmål and 11.54% for
Nynorsk. We also discuss the challenges and potential solutions for further
improving ASR models for Norwegian.
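The WER figures reported above are the standard word-level edit-distance metric: the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the reference length. A minimal illustrative sketch (not the authors' evaluation code) is:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming table over hypothesis prefixes.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,         # deletion
                      d[j - 1] + 1,     # insertion
                      prev + (r != h))  # substitution (free if words match)
            prev, d[j] = d[j], cur
    return d[-1] / len(ref)
```

For example, one substitution in a three-word reference yields a WER of 1/3 ≈ 33.3%.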
Related papers
- Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking [68.77659513993507]
We present a simple and effective N-best re-ranking approach to improve multilingual ASR accuracy.
On two multilingual benchmarks, our results show spoken language identification accuracy improvements of 8.7% and 6.1%, respectively, and word error rates that are 3.3% and 2.0% lower.
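The general shape of N-best re-ranking can be sketched as follows: rescore each of the decoder's top-N hypotheses with an external signal and pick the new argmax. This is a hedged sketch of the idea, not the paper's exact method; the `am_score` field, `toy_lm` function, and `alpha` weight are illustrative assumptions.

```python
def rerank_nbest(hypotheses, lm_score, alpha=0.5):
    """Pick the hypothesis maximizing an interpolation of the ASR
    model's score and an external language-model score."""
    return max(
        hypotheses,
        key=lambda h: (1 - alpha) * h["am_score"] + alpha * lm_score(h["text"]),
    )

# Illustrative N-best list with log-probability-like scores (hypothetical).
nbest = [
    {"text": "recognize speech", "am_score": -4.1},
    {"text": "wreck a nice beach", "am_score": -4.0},
]

def toy_lm(text):
    # Toy stand-in for a real LM log-probability: favour shorter phrases.
    return -len(text.split())

best = rerank_nbest(nbest, toy_lm)  # the LM term flips the ranking
```

With `alpha=0` the original ASR ranking is recovered, so the weight controls how much the external model is trusted.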
arXiv Detail & Related papers (2024-09-27T03:31:32Z)
- Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer [59.57249127943914]
We present a multilingual Audio-Visual Speech Recognition model incorporating several enhancements to improve performance and audio noise robustness.
We increase the amount of audio-visual training data for six distinct languages, generating automatic transcriptions of unlabelled multilingual datasets.
Our proposed model achieves new state-of-the-art performance on the LRS3 dataset, reaching a WER of 0.8%.
arXiv Detail & Related papers (2024-03-14T01:16:32Z)
- Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges [0.2984347156162651]
This article introduces NB-Whisper, an adaptation of OpenAI's Whisper, specifically fine-tuned for Norwegian automatic speech recognition (ASR).
We highlight its key contributions and summarise the results achieved in converting spoken Norwegian into written forms and translating other languages into Norwegian.
arXiv Detail & Related papers (2024-02-02T21:38:12Z)
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
- Annotating Norwegian Language Varieties on Twitter for Part-of-Speech [14.031720101413557]
We present a novel Norwegian Twitter dataset annotated with POS-tags.
We show that models trained on Universal Dependency (UD) data perform worse when evaluated against this dataset.
We also see that performance on dialectal tweets is comparable to the written standards for some models.
arXiv Detail & Related papers (2022-10-12T12:53:30Z)
- Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model [0.0]
We show the process of building a large-scale training set from digital and digitized collections at a national library.
The resulting Bidirectional Representations from Transformers (BERT)-based language model for Norwegian outperforms multilingual BERT (mBERT) models in several token and sequence classification tasks.
arXiv Detail & Related papers (2021-04-19T20:36:24Z)
- NorDial: A Preliminary Corpus of Written Norwegian Dialect Use [4.211128681972148]
We collect a small corpus of tweets and manually annotate them as Bokmål, Nynorsk, any dialect, or a mix.
We perform preliminary experiments with state-of-the-art models, as well as an analysis of the data to expand this corpus in the future.
arXiv Detail & Related papers (2021-04-11T10:56:53Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
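The contrastive task mentioned above asks the model to identify the true quantized latent for a masked time step among a set of distractors. A simplified single-time-step sketch of such an InfoNCE-style objective (plain Python, with cosine similarity and a temperature, both simplifying assumptions rather than the full wav2vec 2.0 training setup):

```python
import math

def contrastive_loss(context, positive, distractors, temperature=0.1):
    """Cross-entropy over similarity logits, with the true latent at
    index 0: low loss when the context vector is closest to it."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    # Negative log-probability assigned to the positive candidate.
    log_z = math.log(sum(math.exp(l) for l in logits))
    return -(logits[0] - log_z)
```

The loss is near zero when the context vector matches the positive latent and large when it matches a distractor instead, which is what drives the representations to become discriminative.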
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
- ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context [58.40112382877868]
We propose a novel CNN-RNN-transducer architecture, which we call ContextNet.
ContextNet features a fully convolutional encoder that incorporates global context information into convolution layers by adding squeeze-and-excitation modules.
We demonstrate that ContextNet achieves a word error rate (WER) of 2.1%/4.6% without external language model (LM), 1.9%/4.1% with LM and 2.9%/7.0% with only 10M parameters on the clean/noisy LibriSpeech test sets.
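The squeeze-and-excitation idea referenced above injects global (utterance-level) context into a convolutional layer by pooling each channel over time, computing per-channel gates, and rescaling the feature map. A minimal framework-free sketch (the bottleneck weights `w1`/`w2` are illustrative stand-ins, not ContextNet's actual parameters):

```python
import math

def squeeze_excite(features, w1, w2):
    """Squeeze-and-excitation over a (channels x time) feature map."""
    # Squeeze: global average-pool each channel over the time axis.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Excite: bottleneck FC -> ReLU -> FC -> sigmoid gates.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: broadcast each channel's gate across every time step.
    return [[g * x for x in ch] for g, ch in zip(gates, features)]
```

Because the gates come from a pool over the whole sequence, every convolution output is modulated by global context, which is otherwise hard for a CNN with limited receptive field to see.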
arXiv Detail & Related papers (2020-05-07T01:03:18Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.