Unheard in the Digital Age: Rethinking AI Bias and Speech Diversity
- URL: http://arxiv.org/abs/2601.18641v2
- Date: Thu, 29 Jan 2026 10:22:21 GMT
- Title: Unheard in the Digital Age: Rethinking AI Bias and Speech Diversity
- Authors: Onyedikachi Hope Amaechi-Okorie, Branislav Radeljic
- Abstract summary: Speech remains one of the most visible yet overlooked vectors of inclusion and exclusion in contemporary society. This article focuses on the structural biases that shape perceptions of atypical speech and are now being encoded into artificial intelligence.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech remains one of the most visible yet overlooked vectors of inclusion and exclusion in contemporary society. While fluency is often equated with credibility and competence, individuals with atypical speech patterns are routinely marginalized. Given the current state of the debate, this article focuses on the structural biases that shape perceptions of atypical speech and are now being encoded into artificial intelligence. Automated speech recognition (ASR) systems and voice interfaces, trained predominantly on standardized speech, routinely fail to recognize or respond to diverse voices, compounding digital exclusion. As AI technologies increasingly mediate access to opportunity, the study calls for inclusive technological design, anti-bias training to minimize the impact of discriminatory algorithmic decisions, and enforceable policy reform that explicitly recognizes speech diversity as a matter of equity, not merely accessibility. Drawing on interdisciplinary research, the article advocates for a cultural and institutional shift in how we value voice, urging co-created solutions that elevate the rights, representation, and realities of atypical speakers in the digital age. Ultimately, the article reframes speech inclusion as a matter of equity (not accommodation) and advocates for co-created AI systems that reflect the full spectrum of human voices.
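As a concrete illustration of the recognition gap the abstract describes, the sketch below compares word error rate (WER) across speaker groups. It is a minimal, self-contained sketch: the transcripts and group labels are invented placeholders, not data or code from the paper.

```python
# Minimal sketch: quantifying an ASR disparity by comparing word error rate
# (WER) across speaker groups. All samples below are invented placeholders.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / max(len(ref), 1)

# (reference transcript, ASR output, speaker group) -- toy examples
samples = [
    ("turn on the kitchen light", "turn on the kitchen light", "typical"),
    ("set a timer for ten minutes", "set a timer for ten minutes", "typical"),
    ("turn on the kitchen light", "turn on the chicken flight", "atypical"),
    ("set a timer for ten minutes", "set a time for two minutes", "atypical"),
]

by_group: dict[str, list[float]] = {}
for ref, hyp, group in samples:
    by_group.setdefault(group, []).append(wer(ref, hyp))

for group, errs in by_group.items():
    print(f"{group:9s} mean WER = {sum(errs) / len(errs):.2f}")
# A persistent gap between groups is one concrete signal of the
# digital exclusion the article describes.
```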
Related papers
- Do AI Voices Learn Social Nuances? A Case of Politeness and Speech Rate [0.0]
This study investigates whether state-of-the-art text-to-speech systems exhibit the human tendency to reduce speech rate to convey politeness. We prompted 22 synthetic voices from two leading AI platforms to read a fixed script under both "polite and formal" and "casual and informal" conditions. Across both AI platforms, the polite prompt produced slower speech than the casual prompt, with very large effect sizes.
arXiv Detail & Related papers (2025-11-12T07:44:42Z)
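A rough sketch of the comparison the study above reports: speech rates for the same voices under two prompt conditions, summarized with Cohen's d. All numbers below are invented for illustration, not the paper's measurements.

```python
# Sketch: compare speech rate (e.g., syllables per second) under a "polite"
# and a "casual" prompt, summarized with Cohen's d. Values are invented.
import statistics

polite = [4.1, 3.9, 4.3, 4.0, 3.8, 4.2]   # syllables/sec, polite prompt
casual = [4.8, 5.0, 4.7, 5.1, 4.9, 4.6]   # syllables/sec, casual prompt

def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference with a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

print(f"polite - casual: d = {cohens_d(polite, casual):.2f}")
# |d| >= 0.8 is conventionally "large"; the paper reports very large effects.
```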
- Towards Inclusive Communication: A Unified Framework for Generating Spoken Language from Sign, Lip, and Audio [52.859261069569165]
We propose the first unified framework capable of handling diverse combinations of sign language, lip movements, and audio for spoken-language text generation. We focus on three main objectives: (i) designing a unified, modality-agnostic architecture capable of effectively processing heterogeneous inputs; (ii) exploring the underexamined synergy among modalities, particularly the role of lip movements as non-manual cues in sign language comprehension; and (iii) achieving performance on par with or better than state-of-the-art models specialized for individual tasks.
arXiv Detail & Related papers (2025-08-28T06:51:42Z)
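A loose sketch of what objective (i) above, a modality-agnostic front end, might look like: each available stream is projected into a shared token space, and a single encoder handles whatever combination is present. The shapes and module choices are assumptions for illustration, not the paper's architecture.

```python
# Sketch: project each present modality (sign, lip, audio) into a shared
# token space, then run one encoder over the concatenation. Feature sizes
# are illustrative assumptions.
import torch
import torch.nn as nn

SHARED_DIM = 256
projections = nn.ModuleDict({
    "sign": nn.Linear(512, SHARED_DIM),   # e.g. pose/video features
    "lip": nn.Linear(256, SHARED_DIM),    # e.g. lip-region features
    "audio": nn.Linear(80, SHARED_DIM),   # e.g. mel filterbanks
})
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=SHARED_DIM, nhead=4, batch_first=True),
    num_layers=2,
)

def encode(inputs: dict[str, torch.Tensor]) -> torch.Tensor:
    # Only the modalities actually present are projected and concatenated.
    tokens = [projections[name](x) for name, x in inputs.items()]
    return encoder(torch.cat(tokens, dim=1))

# Any combination of modalities goes through the same encoder:
out = encode({"audio": torch.randn(1, 50, 80), "lip": torch.randn(1, 50, 256)})
print(out.shape)   # (1, 100, 256)
```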
- Fairness of Automatic Speech Recognition: Looking Through a Philosophical Lens [0.42970700836450487]
We argue that systematic misrecognition of certain speech varieties constitutes more than a technical limitation. We identify three unique ethical dimensions of speech technologies that differentiate ASR bias from other algorithmic fairness concerns.
arXiv Detail & Related papers (2025-08-10T02:26:47Z)
- Deaf in AI: AI language technologies and the erosion of linguistic rights [0.0]
This paper explores the interplay of AI language technologies, sign language interpreting, and linguistic access. It calls for deaf-led approaches to foster AI systems that remain equitable, inclusive, and trustworthy.
arXiv Detail & Related papers (2025-05-05T09:58:59Z)
- "It's not a representation of me": Examining Accent Bias and Digital Exclusion in Synthetic AI Voice Services [3.8931913630405393]
This study evaluates two synthetic AI voice services (Speechify and ElevenLabs) through a mixed methods approach. Our findings reveal technical performance disparities across five regional, English-language accents. Current speech generation technologies may inadvertently reinforce linguistic privilege and accent-based discrimination.
arXiv Detail & Related papers (2025-04-12T21:31:22Z)
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
In this article, we tackle the challenge of developing ASR systems without paired speech and text corpora. We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling. This innovative model surpasses the performance of previous unsupervised ASR models under the lexicon-free setting.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
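Masked token-infilling, the training signal named in the entry above, can be sketched at the data level: hide a random subset of discrete speech or text tokens and train a model to restore them. The token IDs and mask rate below are illustrative, not the paper's setup.

```python
# Sketch: corrupt a discrete token sequence for a masked-infilling objective.
import random

MASK = -1  # placeholder id for a masked position

def mask_tokens(tokens: list[int], rate: float = 0.15, seed: int = 0):
    """Return (corrupted sequence, {position: original token})."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i in range(len(tokens)):
        if rng.random() < rate:
            targets[i] = tokens[i]
            corrupted[i] = MASK
    return corrupted, targets

seq = [17, 4, 92, 8, 8, 31, 5, 66, 12, 3]   # toy discrete speech/text tokens
corrupted, targets = mask_tokens(seq, rate=0.3)
print(corrupted)   # e.g. [17, -1, 92, 8, -1, 31, 5, 66, -1, 3]
print(targets)     # positions the infilling model is trained to recover
```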
- Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation [59.41186714127256]
Dysarthric speech reconstruction (DSR) aims to improve the quality of dysarthric speech.
A speaker encoder (SE) optimized for speaker verification has been explored to control speaker identity.
We propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA).
arXiv Detail & Related papers (2022-02-18T08:59:36Z)
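One common way to realize an adversarial speaker objective like the ASA named above is a gradient-reversal layer between the encoder and a speaker classifier, pushing features toward speaker invariance. The sketch below shows that generic construction in PyTorch; it is not necessarily the paper's exact formulation.

```python
# Sketch: adversarial speaker objective via gradient reversal (generic).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)          # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None   # flip gradients going back

features = torch.randn(8, 256, requires_grad=True)  # encoder output (toy)
speaker_head = nn.Linear(256, 10)                   # 10 hypothetical speakers
labels = torch.randint(0, 10, (8,))

reversed_feats = GradReverse.apply(features, 1.0)
adv_loss = nn.functional.cross_entropy(speaker_head(reversed_feats), labels)
adv_loss.backward()  # pushes the encoder AWAY from encoding speaker identity
```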
- Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z)
- An Attribute-Aligned Strategy for Learning Speech Representation [57.891727280493015]
We propose an attribute-aligned learning strategy to derive speech representations that can flexibly address these issues via an attribute-selection mechanism.
Specifically, we propose a layered-representation variational autoencoder (LR-VAE), which factorizes speech representation into attribute-sensitive nodes.
Our proposed method achieves competitive performance on identity-free speech emotion recognition (SER) and better performance on emotionless speaker verification (SV).
arXiv Detail & Related papers (2021-06-05T06:19:14Z)
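The attribute-selection idea above can be sketched as factorizing a latent vector into named blocks that downstream tasks keep or drop. The dimensions and block names below are illustrative assumptions, not the paper's LR-VAE specification.

```python
# Sketch: split a latent vector into attribute blocks and let each task
# select only the blocks it should see. Sizes and names are toy choices.
import torch

z = torch.randn(4, 96)   # latent from some encoder (toy batch of 4)
blocks = {"identity": z[:, :32], "emotion": z[:, 32:64], "residual": z[:, 64:]}

def select(keep: list[str]) -> torch.Tensor:
    """Concatenate only the attribute blocks a task should see."""
    return torch.cat([blocks[k] for k in keep], dim=-1)

ser_input = select(["emotion", "residual"])   # identity-free emotion recognition
sv_input = select(["identity", "residual"])   # emotionless speaker verification
print(ser_input.shape, sv_input.shape)        # (4, 64) (4, 64)
```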
- Improving Fairness in Speaker Recognition [4.94706680113206]
We investigate the disparity in performance achieved by state-of-the-art deep speaker recognition systems.
We show that models trained with demographically-balanced training sets exhibit fairer behavior across different groups, while still being accurate.
arXiv Detail & Related papers (2021-04-29T01:08:53Z)
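A minimal sketch of the demographically balanced sampling the entry above credits with fairer behavior: draw the same number of utterances from every group. The data and group labels are toy placeholders.

```python
# Sketch: build a demographically balanced training subset by sampling the
# same number of utterances per group. Utterance IDs/groups are invented.
import random
from collections import defaultdict

utterances = [("u1", "f"), ("u2", "f"), ("u3", "f"), ("u4", "m"),
              ("u5", "m"), ("u6", "m"), ("u7", "m"), ("u8", "m")]

def balanced_sample(data, per_group, seed=0):
    rng = random.Random(seed)
    groups = defaultdict(list)
    for utt, g in data:
        groups[g].append(utt)
    # Same number of utterances from every group, sampled without replacement.
    return {g: rng.sample(utts, per_group) for g, utts in groups.items()}

print(balanced_sample(utterances, per_group=3))
```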
- Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion [60.808838088376675]
We propose a voice conversion (VC) system with explicit prosodic modelling and deep speaker embedding learning.
A prosody corrector takes in phoneme embeddings to infer typical phoneme duration and pitch values.
A conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech.
arXiv Detail & Related papers (2020-11-03T13:08:53Z)
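The two-stage pipeline described in the entry above can be sketched as a dataflow: a prosody corrector predicts typical duration and pitch per phoneme, and a conversion model consumes phoneme embeddings together with those prosody targets. The module internals below are placeholders, not the paper's networks.

```python
# Dataflow sketch of the two-stage pipeline: prosody corrector, then
# conversion model. Linear layers stand in for the real architectures.
import torch
import torch.nn as nn

class ProsodyCorrector(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.head = nn.Linear(dim, 2)     # per-phoneme (duration, pitch)

    def forward(self, phoneme_emb):       # (batch, phonemes, dim)
        return self.head(phoneme_emb)

class ConversionModel(nn.Module):
    def __init__(self, dim=128, mel_bins=80):
        super().__init__()
        self.net = nn.Linear(dim + 2, mel_bins)

    def forward(self, phoneme_emb, prosody):
        return self.net(torch.cat([phoneme_emb, prosody], dim=-1))

emb = torch.randn(1, 12, 128)            # 12 phonemes (toy)
prosody = ProsodyCorrector()(emb)        # typical duration & pitch targets
mel = ConversionModel()(emb, prosody)    # frames of converted speech
print(prosody.shape, mel.shape)          # (1, 12, 2) (1, 12, 80)
```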