Debiased Automatic Speech Recognition for Dysarthric Speech via Sample
Reweighting with Sample Affinity Test
- URL: http://arxiv.org/abs/2305.13108v3
- Date: Tue, 27 Jun 2023 13:19:19 GMT
- Title: Debiased Automatic Speech Recognition for Dysarthric Speech via Sample
Reweighting with Sample Affinity Test
- Authors: Eungbeom Kim, Yunkee Chae, Jaeheon Sim, Kyogu Lee
- Abstract summary: We present a novel approach, sample reweighting with sample affinity test (Re-SAT).
Re-SAT measures the debiasing helpfulness of each data sample and then mitigates bias by reweighting samples according to that helpfulness.
Experimental results demonstrate that Re-SAT contributes to improved ASR performance on dysarthric speech without performance degradation on healthy speech.
- Score: 11.223191305716071
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic speech recognition systems based on deep learning are mainly
trained under empirical risk minimization (ERM). Since ERM optimizes the
average performance over data samples regardless of group membership, such as
healthy or dysarthric speakers, ASR systems remain unaware of performance
disparities across groups. This results in biased ASR systems with severe
performance gaps between groups. In this study, we aim to improve the ASR
system in terms of group robustness for dysarthric speakers. To achieve our
goal, we present a novel approach, sample reweighting with sample affinity test
(Re-SAT). Re-SAT systematically measures the debiasing helpfulness of each
data sample and then mitigates the bias by reweighting samples according to
that helpfulness. Experimental results demonstrate that Re-SAT contributes to
improved ASR performance on dysarthric speech without performance degradation
on healthy speech.
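To make the bias issue concrete, here is a minimal, hedged sketch of the two objectives (notation ours, not taken from the paper): ERM minimizes the unweighted average loss over all samples, whereas a helpfulness-based reweighting scheme such as Re-SAT can be viewed as minimizing a weighted average, with per-sample weights reflecting measured debiasing helpfulness.

  ERM:                 \theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \ell(x_i, y_i; \theta)
  Reweighted (sketch): \theta^{*} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} w_i \, \ell(x_i, y_i; \theta)

Here \ell is the per-sample ASR training loss and w_i is a weight derived from the sample affinity test's estimate of how much sample i helps close the healthy/dysarthric performance gap; the abstract does not specify how w_i is computed, so the weighted form above is only an illustrative assumption.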
Related papers
- How to Evaluate Automatic Speech Recognition: Comparing Different Performance and Bias Measures [15.722009470067974]
We compare different performance and bias measures, both from the literature and newly proposed, to evaluate end-to-end ASR systems for Dutch. The findings reveal that averaged error rates, the standard in ASR research, are not sufficient on their own and should be supplemented by other measures. The paper ends with recommendations for reporting ASR performance and bias to better represent a system's performance for diverse speaker groups, as well as overall system bias.
arXiv Detail & Related papers (2025-07-08T11:17:13Z) - Phrase-Level Adversarial Training for Mitigating Bias in Neural Network-based Automatic Essay Scoring [0.0]
We propose a model-agnostic phrase-level method to generate an adversarial essay set to address the biases and robustness of AES models.
Experimental results show that the proposed approach significantly improves AES model performance in the presence of adversarial examples and scenarios.
arXiv Detail & Related papers (2024-09-07T11:22:35Z) - Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization [34.51491788470738]
We propose reverse inference optimization (RIO) to enhance the robustness of autoregressive-model-based text-to-speech (TTS) systems.
RIO uses reverse inference as the standard to select exemplars used in RLHF from the speech samples generated by the TTS system itself.
RIO significantly improves the stability of zero-shot TTS performance by reducing the discrepancies between training and inference conditions.
arXiv Detail & Related papers (2024-07-02T13:04:04Z) - Crossmodal ASR Error Correction with Discrete Speech Units [16.58209270191005]
We propose a post-ASR processing approach for ASR Error Correction (AEC).
We explore pre-training and fine-tuning strategies and uncover an ASR domain discrepancy phenomenon.
We propose the incorporation of discrete speech units to align with and enhance the word embeddings for improving AEC quality.
arXiv Detail & Related papers (2024-05-26T19:58:38Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - HypR: A comprehensive study for ASR hypothesis revising with a reference corpus [10.173199736362486]
This study focuses on providing an ASR hypothesis revising (HypR) dataset.
HypR contains several commonly used corpora and provides 50 recognition hypotheses for each speech utterance.
In addition, we implement and compare several classic and representative methods, showing the recent research progress in revising speech recognition results.
arXiv Detail & Related papers (2023-09-18T14:55:21Z) - Listen, Adapt, Better WER: Source-free Single-utterance Test-time
Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z) - Rethinking Sampling Strategies for Unsupervised Person Re-identification [59.47536050785886]
We analyze the reasons for the performance differences between various sampling strategies under the same framework and loss function.
Group sampling is proposed, which gathers samples from the same class into groups.
Experiments on Market-1501, DukeMTMC-reID and MSMT17 show that group sampling achieves performance comparable to state-of-the-art methods.
arXiv Detail & Related papers (2021-07-07T05:39:58Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty
Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as the source domain and TED-LIUM 3 as well as SWITCHBOARD as target domains show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z) - Improving noise robust automatic speech recognition with single-channel
time-domain enhancement network [100.1041336974175]
We show that a single-channel time-domain denoising approach can significantly improve ASR performance.
We show that single-channel noise reduction can still improve ASR performance.
arXiv Detail & Related papers (2020-03-09T09:36:31Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.