Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering
- URL: http://arxiv.org/abs/2306.06083v1
- Date: Tue, 6 Jun 2023 21:13:08 GMT
- Title: Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering
- Authors: Irina-Elena Veliche, Pascale Fung
- Abstract summary: We present a privacy-preserving approach to improve the fairness and robustness of end-to-end ASR.
We extract utterance-level embeddings using a speaker ID model trained on a public dataset.
We use cluster IDs instead of speaker utterance embeddings as extra features during model training.
- Score: 49.069298478971696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The challenge of fairness arises when Automatic Speech Recognition (ASR)
systems do not perform equally well for all sub-groups of the population. In
the past few years there have been many improvements in overall speech
recognition quality, but without any particular focus on advancing equality
and equity for the user groups for whom systems do not perform well. ASR
fairness is therefore also a robustness issue. Meanwhile, data privacy takes
priority in production systems. In this paper, we present a privacy-preserving
approach that improves the fairness and robustness of end-to-end ASR without
using metadata, zip codes, or even speaker or utterance embeddings directly in
training. We extract utterance-level embeddings using a speaker ID model
trained on a public dataset, which we then use in an unsupervised fashion to
create acoustic clusters. We use cluster IDs instead of speaker utterance
embeddings as extra features during model training, which yields improvements
for all demographic groups and in particular for different accents.
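As a concrete illustration of the pipeline described above, here is a minimal sketch in Python using scikit-learn. The embeddings are synthetic stand-ins for the speaker ID model's utterance-level vectors, and the cluster count and feature dimensions are assumed hyperparameters, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for utterance-level embeddings from a speaker ID model
# trained on a public dataset (here: 10k utterances, 256-dim vectors).
embeddings = rng.normal(size=(10_000, 256)).astype(np.float32)

# Unsupervised acoustic clustering; 64 clusters is an assumption.
kmeans = KMeans(n_clusters=64, n_init=10, random_state=0).fit(embeddings)

# The cluster ID, not the raw embedding, becomes the extra feature,
# so no speaker-identifying vector enters ASR training directly.
cluster_ids = kmeans.labels_

# Sketch of how a cluster ID could be injected during training: a
# small learned table maps IDs to vectors appended to acoustic frames.
cluster_table = rng.normal(size=(64, 16)).astype(np.float32)
extra_feature = cluster_table[cluster_ids[0]]
```

Using discrete cluster IDs rather than the embeddings themselves is what makes the approach privacy-preserving: the ASR model only ever sees coarse acoustic groupings, never a speaker-identifying vector.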
Related papers
- Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition [18.90193320368228]
We present accent clustering and mining schemes for fair speech recognition systems.
For accent recognition, we applied three schemes to overcome the limited size of supervised accent data.
Fine-tuning ASR on the mined Indian-accent speech showed 10.0% and 5.3% relative improvements compared to fine-tuning on randomly sampled speech.
arXiv Detail & Related papers (2024-08-05T16:00:07Z)
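A hedged sketch of the mining idea in the paper above: score an unlabeled pool with an accent classifier and keep high-confidence utterances for fine-tuning. The posteriors, target-class index, and threshold are illustrative assumptions, not the paper's actual schemes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical accent-classifier posteriors over an unlabeled pool
# (100k utterances x 8 accent classes).
posteriors = rng.dirichlet(alpha=np.ones(8), size=100_000)

TARGET_ACCENT = 3   # assumed index of the accent to mine (e.g., Indian)
THRESHOLD = 0.9     # assumed confidence cut-off

confidence = posteriors[:, TARGET_ACCENT]
mined = np.flatnonzero(confidence >= THRESHOLD)
print(f"mined {mined.size} utterances for ASR fine-tuning")
```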
- Towards Unsupervised Speech Recognition Without Pronunciation Models [57.222729245842054]
Most languages lack sufficient paired speech and text data to effectively train automatic speech recognition systems.
We propose the removal of reliance on a phoneme lexicon to develop unsupervised ASR systems.
We experimentally demonstrate that an unsupervised speech recognizer can emerge from joint speech-to-speech and text-to-text masked token-infilling.
arXiv Detail & Related papers (2024-06-12T16:30:58Z)
- Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems.
We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains.
STAR exhibits high data efficiency, requiring less than one hour of unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z)
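A rough sketch of the self-training idea behind STAR: keep pseudo-labels whose token-level confidence is high enough for unsupervised adaptation. The actual STAR indicator also combines attention-based scores, which are omitted here, and all thresholds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical decoder output for one utterance: token IDs with
# per-token posterior confidences from a speech foundation model.
tokens = rng.integers(0, 5_000, size=40)
token_conf = rng.uniform(size=40)

# Zero out low-confidence tokens so they contribute nothing to the
# pseudo-label training loss (confidence-only simplification of STAR).
weights = np.where(token_conf >= 0.5, token_conf, 0.0)

# Utterance-level gate: adapt on this utterance only if its
# pseudo-label is trusted on average.
keep_for_adaptation = weights.mean() > 0.4
print("use for adaptation:", keep_for_adaptation)
```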
- Advancing African-Accented Speech Recognition: Epistemic Uncertainty-Driven Data Selection for Generalizable ASR Models [2.4654745083407175]
We propose a new multi-round adaptation process that uses uncertainty to automate the annotation process.
This novel method streamlines data annotation and strategically selects data samples contributing most to model uncertainty.
Our results show that our approach leads to a 27% WER relative average improvement while requiring on average 45% less data than established baselines.
arXiv Detail & Related papers (2023-06-03T13:11:37Z)
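A minimal sketch of one selection round for the uncertainty-driven approach above, assuming predictive entropy as the uncertainty proxy and an illustrative labeling budget; the paper's exact uncertainty estimate and adaptation loop may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-utterance output distributions (10k utterances x
# 50 classes), e.g., averaged over multiple stochastic decoding passes.
probs = rng.dirichlet(alpha=np.ones(50), size=10_000)

# Uncertainty proxy: predictive entropy per utterance.
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

# One adaptation round: route the most uncertain utterances to
# annotation; repeat over multiple rounds as the model improves.
BUDGET = 500  # assumed per-round labeling budget
selected = np.argsort(entropy)[-BUDGET:]
print("utterances selected for annotation:", selected.size)
```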
- Some voices are too common: Building fair speech recognition systems using the Common Voice dataset [2.28438857884398]
We use the French Common Voice dataset to quantify the biases of a pre-trained wav2vec2.0 model toward several demographic groups.
We also run an in-depth analysis of the Common Voice corpus and identify important shortcomings that should be taken into account.
arXiv Detail & Related papers (2023-06-01T11:42:34Z)
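A toy sketch of the kind of per-group evaluation such a bias analysis involves, using the jiwer package; the groups and transcripts are placeholders, not Common Voice data.

```python
from jiwer import wer

# Illustrative references/hypotheses split by demographic group.
data = {
    "group_a": (["bonjour tout le monde", "merci beaucoup"],
                ["bonjour tout le monde", "merci beaucoup"]),
    "group_b": (["bonjour tout le monde", "merci beaucoup"],
                ["bonjour le monde", "merci"]),
}

# A persistent WER gap between groups indicates a fairness problem.
for group, (refs, hyps) in data.items():
    print(group, round(wer(refs, hyps), 3))
```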
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition [6.450618373898492]
We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR.
We present the COWERAGE algorithm for representative subset selection in self-supervised ASR.
arXiv Detail & Related papers (2022-03-18T10:12:24Z)
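A sketch of the general idea described for COWERAGE: choose a fine-tuning subset that covers the spread of early-training word error rates instead of sampling at random. The stratified-binning rule below is an assumption; the published algorithm may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-utterance WERs from an early training checkpoint.
early_wer = rng.uniform(0.0, 1.0, size=20_000)

BUDGET, BINS = 2_000, 10
edges = np.quantile(early_wer, np.linspace(0.0, 1.0, BINS + 1))

subset = []
for i in range(BINS):
    lo, hi = edges[i], edges[i + 1]
    in_bin = (early_wer >= lo) & (
        (early_wer <= hi) if i == BINS - 1 else (early_wer < hi)
    )
    idx = np.flatnonzero(in_bin)
    # Sample evenly from each WER stratum to cover the full range.
    subset.extend(rng.choice(idx, size=BUDGET // BINS, replace=False))

print("selected", len(subset), "utterances for fine-tuning")
```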
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing [102.45426364965887]
We propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks.
WavLM is built based on the HuBERT framework, with an emphasis on both spoken content modeling and speaker identity preservation.
We scale up the training dataset from 60k hours to 94k hours of public audio data, and optimize its training procedure for better representation extraction.
arXiv Detail & Related papers (2021-10-26T17:55:19Z)
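WavLM checkpoints are available through the Hugging Face transformers library; the sketch below runs a frozen model as a frame-level feature extractor, leaving any downstream task head unspecified.

```python
import torch
from transformers import AutoFeatureExtractor, WavLMModel

extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
model = WavLMModel.from_pretrained("microsoft/wavlm-base-plus").eval()

# One second of silence at 16 kHz as a placeholder waveform.
waveform = torch.zeros(16_000)
inputs = extractor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, frames, dim)

print(hidden.shape)
```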
- UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced to enhance unsupervised speaker information extraction.
Experimental results on the SUPERB benchmark show that the proposed system achieves state-of-the-art performance.
We scale up the training dataset to 94 thousand hours of public audio data and achieve further performance improvements.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)