You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
- URL: http://arxiv.org/abs/2405.13379v1
- Date: Wed, 22 May 2024 06:24:55 GMT
- Title: You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
- Authors: Ronald Cumbal, Birger Moell, Jose Lopes, Olof Engwall,
- Abstract summary: We focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services.
We compare the recognition results using Word Error Rate and analyze the linguistic factors that may generate the observed transcription errors.
- Score: 0.5249805590164903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of Automatic Speech Recognition (ASR) systems has constantly increased in state-of-the-art development. However, performance tends to decrease considerably in more challenging conditions (e.g., background noise, multiple speaker social conversations) and with more atypical speakers (e.g., children, non-native speakers or people with speech disorders), which signifies that general improvements do not necessarily transfer to applications that rely on ASR, e.g., educational software for younger students or language learners. In this study, we focus on the gap in performance between recognition results for native and non-native, read and spontaneous, Swedish utterances transcribed by different ASR services. We compare the recognition results using Word Error Rate and analyze the linguistic factors that may generate the observed transcription errors.
Related papers
- Measuring the Accuracy of Automatic Speech Recognition Solutions [4.99320937849508]
Automatic Speech Recognition (ASR) is now a part of many popular applications.
We measured the performance of eleven common ASR services with recordings of Higher Education lectures.
Our results show that accuracy ranges widely between vendors and for the individual audio samples.
We also measured a significant lower quality for streaming ASR, which is used for live events.
arXiv Detail & Related papers (2024-08-29T06:38:55Z) - Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs)
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
arXiv Detail & Related papers (2024-07-31T08:00:41Z) - Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance [7.882996636086014]
It is important that automatic speech recognition (ASR) models and their use is fair and equitable.
The current study seeks to understand the factors underlying this disparity by examining the performance of the current state-of-the-art neural network based ASR system.
arXiv Detail & Related papers (2024-07-19T02:14:17Z) - Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z) - Towards ASR Robust Spoken Language Understanding Through In-Context
Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z) - Investigating the Sensitivity of Automatic Speech Recognition Systems to
Phonetic Variation in L2 Englishes [3.198144010381572]
This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes.
It is demonstrated that the behaviour of the ASR is systematic and consistent across speakers with similar spoken varieties.
arXiv Detail & Related papers (2023-05-12T11:29:13Z) - Language Dependencies in Adversarial Attacks on Speech Recognition
Systems [0.0]
We compare the attackability of a German and an English ASR system.
We investigate if one of the language models is more susceptible to manipulations than the other.
arXiv Detail & Related papers (2022-02-01T13:27:40Z) - A study on native American English speech recognition by Indian
listeners with varying word familiarity level [62.14295630922855]
We have three kinds of responses from each listener while they recognize an utterance.
From these transcriptions, word error rate (WER) is calculated and used as a metric to evaluate the similarity between the recognized and the original sentences.
Speaker nativity wise analysis shows that utterances from speakers of some nativity are more difficult to recognize by Indian listeners compared to few other nativities.
arXiv Detail & Related papers (2021-12-08T07:43:38Z) - Quantifying Bias in Automatic Speech Recognition [28.301997555189462]
This paper quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents.
Based on our findings, we suggest bias mitigation strategies for ASR development.
arXiv Detail & Related papers (2021-03-28T12:52:03Z) - LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition [148.43282526983637]
We develop LRSpeech, a TTS and ASR system for languages with low data cost.
We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech.
We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.
arXiv Detail & Related papers (2020-08-09T08:16:33Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU)
We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.