Related papers: Automatic Speech Recognition Biases in Newcastle English: an Error Analysis

Automatic Speech Recognition Biases in Newcastle English: an Error Analysis

URL: http://arxiv.org/abs/2506.16558v1
Date: Thu, 19 Jun 2025 19:24:12 GMT
Title: Automatic Speech Recognition Biases in Newcastle English: an Error Analysis
Authors: Dana Serditova, Kevin Tang, Jochen Steffens,
Abstract summary: This study investigates ASR performance on Newcastle English, a well-documented regional dialect.<n>A two-stage analysis was conducted: first, a manual error analysis on a subsample identified key phonological, lexical, and morphosyntactic errors behind ASR misrecognitions.<n>Results show that ASR errors directly correlate with regional dialectal features, while social factors play a lesser role in ASR mismatches.
Score: 0.8602553195689513
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Automatic Speech Recognition (ASR) systems struggle with regional dialects due to biased training which favours mainstream varieties. While previous research has identified racial, age, and gender biases in ASR, regional bias remains underexamined. This study investigates ASR performance on Newcastle English, a well-documented regional dialect known to be challenging for ASR. A two-stage analysis was conducted: first, a manual error analysis on a subsample identified key phonological, lexical, and morphosyntactic errors behind ASR misrecognitions; second, a case study focused on the systematic analysis of ASR recognition of the regional pronouns ``yous'' and ``wor''. Results show that ASR errors directly correlate with regional dialectal features, while social factors play a lesser role in ASR mismatches. We advocate for greater dialectal diversity in ASR training data and highlight the value of sociolinguistic analysis in diagnosing and addressing regional biases.

Related papers

WESR: Scaling and Evaluating Word-level Event-Speech Recognition [59.21814194620928]
Speech conveys not only linguistic information but also rich non-verbal vocal events such as laughing and crying.<n>We develop a refined taxonomy of 21 vocal events, with a new categorization into discrete (standalone) versus continuous (mixed with speech) types.<n>Based on the refined taxonomy, we introduce WESR-Bench, an expert-annotated evaluation set (900+ utterances) with a novel position-aware protocol.
arXiv Detail & Related papers (2026-01-08T02:23:21Z)
A Sociophonetic Analysis of Racial Bias in Commercial ASR Systems Using the Pacific Northwest English Corpus [6.361208877327219]
This paper presents a systematic evaluation of racial bias in four major commercial automatic speech recognition (ASR) systems.<n>We analyze transcription accuracy across speakers from four ethnic backgrounds (African American, Caucasian American, ChicanX, and Yakama) and examine how sociophonetic variation contributes to differential system performance.
arXiv Detail & Related papers (2025-10-26T02:19:40Z)
Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection [62.942077348224046]
Speech recognition plays an important role in automatic detection of Alzheimer's disease (AD)<n>Recent studies have revealed a non-linear relationship between word error rates (WER) and AD detection performance.<n>This work presents a series of analyses to explore the effect of ASR transcription errors in BERT-based AD detection systems.
arXiv Detail & Related papers (2024-12-09T09:32:20Z)
Advocating Character Error Rate for Multilingual ASR Evaluation [1.2597747768235845]
We document the limitations of the word error rate (WER) as an evaluation metric and advocate for the character error rate (CER) as the primary metric. We show that CER avoids many of the challenges WER faces and exhibits greater consistency across writing systems. Our findings suggest that CER should be prioritized, or at least supplemented, in multilingual ASR evaluations to account for the varying linguistic characteristics of different languages.
arXiv Detail & Related papers (2024-10-09T19:57:07Z)
Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance [7.882996636086014]
It is important that automatic speech recognition (ASR) models and their use is fair and equitable. The current study seeks to understand the factors underlying this disparity by examining the performance of the current state-of-the-art neural network based ASR system.
arXiv Detail & Related papers (2024-07-19T02:14:17Z)
DeSIQ: Towards an Unbiased, Challenging Benchmark for Social Intelligence Understanding [60.84356161106069]
We study the soundness of Social-IQ, a dataset of multiple-choice questions on videos of complex social interactions. Our analysis reveals that Social-IQ contains substantial biases, which can be exploited by a moderately strong language model. We introduce DeSIQ, a new challenging dataset, constructed by applying simple perturbations to Social-IQ.
arXiv Detail & Related papers (2023-10-24T06:21:34Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes [3.198144010381572]
This work demonstrates a method of probing an ASR system to discover how it handles phonetic variation across a number of L2 Englishes. It is demonstrated that the behaviour of the ASR is systematic and consistent across speakers with similar spoken varieties.
arXiv Detail & Related papers (2023-05-12T11:29:13Z)
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings [53.120885867427305]
Three approaches are evaluated for speaker-attributed automatic speech recognition (SA-ASR) in a meeting scenario. The WD-SOT approach achieves 10.7% relative reduction on averaged speaker-dependent character error rate (SD-CER) The TS-ASR approach also outperforms the FD-SOT approach and brings 16.5% relative average SD-CER reduction.
arXiv Detail & Related papers (2022-03-31T06:39:14Z)
Language Dependencies in Adversarial Attacks on Speech Recognition Systems [0.0]
We compare the attackability of a German and an English ASR system. We investigate if one of the language models is more susceptible to manipulations than the other.
arXiv Detail & Related papers (2022-02-01T13:27:40Z)
Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR) In contrast to conventional unsupervised learning approaches, we adopt the emphmulti-task learning (MTL) framework. Our experiment results show that our method can reduce the WER on the British speech data from 14.55% to 10.36% compared to the baseline model trained with the US English data only.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
Quantifying Bias in Automatic Speech Recognition [28.301997555189462]
This paper quantifies the bias of a Dutch SotA ASR system against gender, age, regional accents and non-native accents. Based on our findings, we suggest bias mitigation strategies for ASR development.
arXiv Detail & Related papers (2021-03-28T12:52:03Z)
Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU) We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.