The Voice of Equity: A Systematic Evaluation of Bias Mitigation Techniques for Speech-Based Cognitive Impairment Detection Across Architectures and Demographics
- URL: http://arxiv.org/abs/2601.16989v1
- Date: Wed, 07 Jan 2026 11:47:24 GMT
- Title: The Voice of Equity: A Systematic Evaluation of Bias Mitigation Techniques for Speech-Based Cognitive Impairment Detection Across Architectures and Demographics
- Authors: Yasaman Haghbin, Sina Rashidi, Ali Zolnour, Maryam Zolnoori,
- Abstract summary: We present the first comprehensive fairness analysis framework for speech-based cognitive impairment detection. We developed two transformer-based architectures, SpeechCARE-AGF and Whisper-LWF-LoRA, on the multilingual NIA PREPARE Challenge dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-based detection of cognitive impairment offers a scalable, non-invasive screening approach, yet algorithmic bias across demographic and linguistic subgroups remains critically underexplored. We present the first comprehensive fairness analysis framework for speech-based multi-class cognitive impairment detection, systematically evaluating bias mitigation across architectures and demographic subgroups. We developed two transformer-based architectures, SpeechCARE-AGF and Whisper-LWF-LoRA, on the multilingual NIA PREPARE Challenge dataset. Unlike prior work that typically examines single mitigation techniques, we compared pre-processing, in-processing, and post-processing approaches, assessing fairness via Equality of Opportunity and Equalized Odds across gender, age, education, and language. Both models achieved strong performance (F1: SpeechCARE-AGF 70.87, Whisper-LWF-LoRA 71.46) but exhibited substantial fairness disparities. Adults aged 80 and older showed lower sensitivity than younger groups, and Spanish speakers demonstrated a reduced true-positive rate (TPR) relative to English speakers. Mitigation effectiveness varied by architecture: oversampling improved SpeechCARE-AGF for older adults (80+ TPR: 46.19% to 49.97%) but minimally affected Whisper-LWF-LoRA. This study addresses a critical healthcare AI gap by demonstrating that architectural design fundamentally shapes bias patterns and mitigation effectiveness. Adaptive fusion mechanisms enable flexible responses to data interventions, while frequency reweighting offers robust improvements across architectures. Our findings establish that fairness interventions must be tailored to both model architecture and demographic characteristics, providing a systematic framework for developing equitable speech-based screening tools essential for reducing diagnostic disparities in cognitive healthcare.
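The Equality of Opportunity criterion used in the abstract can be made concrete with a short sketch. This is not the authors' code: it computes the per-subgroup true-positive rate (sensitivity) and reports the largest TPR gap across groups, using tiny illustrative labels rather than the paper's data; the paper applies the same idea across gender, age, education, and language subgroups.

```python
# Minimal sketch of Equality of Opportunity as a per-group TPR gap.
# Labels and predictions below are illustrative, not from the paper.

def tpr(y_true, y_pred, group, g):
    """True-positive rate restricted to subgroup g."""
    positives = [(t, p) for t, p, gr in zip(y_true, y_pred, group)
                 if gr == g and t == 1]
    if not positives:
        return float("nan")
    return sum(p == 1 for _, p in positives) / len(positives)

def equal_opportunity_gap(y_true, y_pred, group):
    """Max difference in TPR across subgroups (0 = perfectly fair)."""
    rates = {g: tpr(y_true, y_pred, group, g) for g in set(group)}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical binary screening outcomes for two language subgroups.
y_true = [1, 1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
group  = ["en", "en", "en", "en", "es", "es", "es", "es"]

gap, rates = equal_opportunity_gap(y_true, y_pred, group)
# rates["en"] = 2/3, rates["es"] = 1/2, so the gap is 1/6
```

Equalized Odds extends the same comparison to the false-positive rate as well, requiring both rates to match across subgroups.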
Related papers
- Bias and Fairness in Self-Supervised Acoustic Representations for Cognitive Impairment Detection [31.057972486149268]
Speech-based detection of cognitive impairment (CI) offers a promising non-invasive approach for early diagnosis. This study presents a systematic bias analysis of acoustic-based CI and depression classification using the DementiaBank Pitt Corpus.
arXiv Detail & Related papers (2026-03-03T12:47:31Z) - National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution [1.0486773259892048]
Alzheimer's disease and related dementias affect one in five adults over 60, yet more than half of individuals with cognitive decline remain undiagnosed. SpeechCARE is a multimodal speech processing pipeline that captures subtle speech-related cues associated with cognitive impairment. Its robust preprocessing includes automatic transcription, large language model (LLM)-based anomaly detection, and task identification.
arXiv Detail & Related papers (2025-11-11T11:39:20Z) - Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis [54.53152524778821]
The integration of speech into Large Language Models (LLMs) has substantially expanded their capabilities, but often at the cost of weakening their core textual competence. We propose an analytical framework based on parameter importance estimation, which reveals that fine-tuning for speech introduces a shift in the textual importance distribution. We investigate two mitigation strategies: layer-wise learning rate scheduling and Low-Rank Adaptation (LoRA). Experimental results show that both approaches maintain textual competence better than full fine-tuning, while also improving downstream spoken question answering performance.
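The Low-Rank Adaptation (LoRA) idea mentioned above can be sketched in a few lines. This is a generic illustration, not the paper's setup: a frozen weight matrix W is augmented with a trainable rank-r update B @ A, scaled by the common alpha / r convention, with B zero-initialized so the adapted layer starts out identical to the frozen one.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4  # illustrative shapes, not the paper's

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained.
    return (W + (alpha / r) * (B @ A)) @ x

x = rng.standard_normal(d_in)
out = lora_forward(x)
# With B = 0 at initialization, out equals the frozen layer's output W @ x.
```

The appeal for mitigating textual degradation is the parameter count: here A and B contribute r * (d_in + d_out) = 32 trainable values versus 64 in W, and the frozen weights preserve the pretrained (textual) behavior by construction.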
arXiv Detail & Related papers (2025-09-28T09:04:40Z) - Evaluating and Mitigating Bias in AI-Based Medical Text Generation [35.24191727599811]
AI systems may reflect and amplify human bias, reducing the quality of their performance in historically under-served populations. In this study, we investigate the fairness problem in text generation within the medical field. We propose an algorithm that selectively optimizes underperforming groups to reduce bias.
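One common way to "selectively optimize" underperforming groups is to reweight the training loss by current per-group performance. The sketch below is an assumption-laden illustration of that in-processing family, not the cited paper's exact algorithm: it assigns higher loss weights to groups with lower accuracy, normalized so the average weight stays 1.

```python
# Illustrative group-reweighted loss; the inverse-accuracy weighting rule
# and the example accuracies are assumptions, not the paper's method.

def group_weights(per_group_accuracy, eps=1e-6):
    """Higher weight for lower-accuracy groups, normalized to mean 1."""
    raw = {g: 1.0 / (acc + eps) for g, acc in per_group_accuracy.items()}
    mean = sum(raw.values()) / len(raw)
    return {g: w / mean for g, w in raw.items()}

def weighted_loss(losses, groups, weights):
    """Average per-example losses scaled by each example's group weight."""
    return sum(weights[g] * l for l, g in zip(losses, groups)) / len(losses)

# Hypothetical per-group accuracies (e.g. an under-served age group).
acc = {"80+": 0.46, "<80": 0.72}
w = group_weights(acc)
loss = weighted_loss([0.9, 0.3], ["80+", "<80"], w)
```

Recomputing the weights each epoch turns this into a simple feedback loop: as the lagging group improves, its extra weight shrinks.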
arXiv Detail & Related papers (2025-04-24T06:10:40Z) - $C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction [80.57232374640911]
We propose a model-agnostic strategy called Mask-And-Recover (MAR). MAR integrates both inter- and intra-modality contextual correlations to enable global inference within extraction modules. To better target challenging parts within each sample, we introduce a Fine-grained Confidence Score (FCS) model.
arXiv Detail & Related papers (2025-04-01T13:01:30Z) - Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
Speaker-regularized spectral basis embedding (SBE) features exploit a special regularization term to enforce homogeneity of speaker features during adaptation.
Feature-based learning hidden unit contributions (f-LHUC), conditioned on VR-LH features, are shown to be insensitive to speaker-level data quantity in test-time adaptation.
arXiv Detail & Related papers (2024-07-08T18:20:24Z) - A Systematic Evaluation of Adversarial Attacks against Speech Emotion Recognition Models [6.854732863866882]
Speech emotion recognition (SER) is constantly gaining attention in recent years due to its potential applications in diverse fields.
Recent studies have shown that deep learning models can be vulnerable to adversarial attacks.
arXiv Detail & Related papers (2024-04-29T09:00:32Z) - Automatic Severity Classification of Dysarthric Speech by Using a Self-supervised Model with Multi-task Learning [4.947423926765435]
We propose a novel automatic severity assessment method for dysarthric speech using the self-supervised model in conjunction with multi-task learning.
Wav2vec 2.0 XLS-R is trained for two different tasks: severity classification and auxiliary automatic speech recognition (ASR).
Our model outperforms the traditional baseline methods, with a relative percentage increase of 1.25% for F1-score.
arXiv Detail & Related papers (2022-10-27T12:48:10Z) - Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition [55.25565305101314]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems.
This paper presents a cross-domain and cross-lingual A2A inversion approach that utilizes the parallel audio and ultrasound tongue imaging (UTI) data of the 24-hour TaL corpus in A2A model pre-training.
Experiments conducted on three tasks suggested incorporating the generated articulatory features consistently outperformed the baseline TDNN and Conformer ASR systems.
arXiv Detail & Related papers (2022-06-15T07:20:28Z) - On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition [53.17176024917725]
Scarcity of speaker-level data limits the practical use of data-intensive model based speaker adaptation methods.
This paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods.
arXiv Detail & Related papers (2022-03-28T09:12:24Z) - Recent Progress in the CUHK Dysarthric Speech Recognition System [66.69024814159447]
Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based automatic speech recognition technologies.
This paper presents recent research efforts at the Chinese University of Hong Kong to improve the performance of disordered speech recognition systems.
arXiv Detail & Related papers (2022-01-15T13:02:40Z) - Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition [65.25325641528701]
Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by SVD decomposition of speech spectrum are proposed.
Experiments conducted on the UASpeech corpus suggest the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i-vector adaptation by up to 2.63% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation.
arXiv Detail & Related papers (2022-01-14T16:56:43Z) - Model-Based Approach for Measuring the Fairness in ASR [11.076999352942954]
We introduce mixed-effects Poisson regression to better measure and interpret any WER difference among subgroups of interest.
We demonstrate the validity of the proposed model-based approach on both synthetic and real-world speech data.
arXiv Detail & Related papers (2021-09-19T05:24:01Z)
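The model-based fairness idea above treats recognition errors as Poisson counts with the number of reference words as exposure, so subgroup WER differences become rate ratios. The sketch below is a plain fixed-effects Poisson regression fit by Newton's method on toy counts, an assumption-level simplification: the cited paper's mixed-effects model additionally includes random effects, which this sketch omits.

```python
import numpy as np

def poisson_fit(X, errors, words, iters=50):
    """Fit log E[errors] = log(words) + X @ beta by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    offset = np.log(words)  # word count acts as the Poisson exposure
    for _ in range(iters):
        mu = np.exp(offset + X @ beta)
        grad = X.T @ (errors - mu)
        hess = X.T @ (X * mu[:, None])  # Fisher information
        beta += np.linalg.solve(hess, grad)
    return beta

# Toy utterance-level data: subgroup B errs at twice subgroup A's rate
# (0.10 vs 0.05 errors per word). Columns of X: intercept, is_subgroup_B.
words  = np.array([100.0, 200.0, 100.0, 200.0])
errors = np.array([5.0, 10.0, 10.0, 20.0])
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]], dtype=float)

beta = poisson_fit(X, errors, words)
rate_ratio = np.exp(beta[1])  # subgroup B's error rate relative to A
```

Reading off exp(beta) gives directly interpretable quantities: exp(beta[0]) is subgroup A's baseline error rate per word and the rate ratio quantifies the WER disparity, which is the kind of subgroup comparison the paper formalizes.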
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.