Improving Fairness in Speaker Recognition
- URL: http://arxiv.org/abs/2104.14067v2
- Date: Fri, 30 Apr 2021 20:36:28 GMT
- Title: Improving Fairness in Speaker Recognition
- Authors: Gianni Fenu, Giacomo Medda, Mirko Marras, and Giacomo Meloni
- Abstract summary: We investigate the disparity in performance achieved by state-of-the-art deep speaker recognition systems.
We show that models trained with demographically-balanced training sets exhibit a fairer behavior on different groups, while still being accurate.
- Score: 4.94706680113206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The human voice conveys unique characteristics of an individual, making voice
biometrics a key technology for verifying identities in various industries.
Despite the impressive progress of speaker recognition systems in terms of
accuracy, a number of ethical and legal concerns have been raised, specifically
relating to the fairness of such systems. In this paper, we aim to explore the
disparity in performance achieved by state-of-the-art deep speaker recognition
systems, when different groups of individuals characterized by a common
sensitive attribute (e.g., gender) are considered. In order to mitigate the
unfairness we uncovered by means of an exploratory study, we investigate
whether balancing the representation of the different groups of individuals in
the training set can lead to a more equal treatment of these demographic
groups. Experiments on two state-of-the-art neural architectures and a
large-scale public dataset show that models trained with
demographically-balanced training sets exhibit a fairer behavior on different
groups, while still being accurate. Our study is expected to provide a solid
basis for instilling beyond-accuracy objectives (e.g., fairness) in speaker
recognition.
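The mitigation studied in the paper is to balance how demographic groups are represented in the training data. As a rough illustration of that idea (not the authors' actual pipeline), the sketch below subsamples a VoxCeleb-style metadata table so that every gender group contributes the same number of speakers; the file name and column names are assumptions.

```python
import pandas as pd

def balance_by_attribute(meta: pd.DataFrame,
                         attribute: str = "gender",
                         speaker_col: str = "speaker_id",
                         seed: int = 0) -> pd.DataFrame:
    """Keep the same number of speakers for every value of `attribute`.

    `meta` is assumed to hold one row per utterance, with columns for the
    speaker identifier and the sensitive attribute (hypothetical schema).
    """
    speakers = meta.drop_duplicates(speaker_col)
    per_group = speakers.groupby(attribute)[speaker_col].apply(list)
    n_keep = min(len(s) for s in per_group)          # size of the smallest group

    kept = []
    for spk_list in per_group:
        # Randomly keep n_keep speakers from each demographic group.
        kept.extend(pd.Series(spk_list).sample(n=n_keep, random_state=seed))
    return meta[meta[speaker_col].isin(kept)].reset_index(drop=True)

# Hypothetical usage with VoxCeleb-like metadata (columns are assumptions):
# meta = pd.read_csv("vox_meta.csv")        # speaker_id, gender, wav_path
# balanced = balance_by_attribute(meta, "gender")
# print(balanced.groupby("gender")["speaker_id"].nunique())
```

A balanced subset of this kind can then be used to retrain the speaker embedding model before comparing per-group verification performance.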
Related papers
- Evaluating Speaker Identity Coding in Self-supervised Models and Humans [0.42303492200814446]
Speaker identity plays a significant role in human communication and is being increasingly used in societal applications.
We show that self-supervised representations from different families are significantly better for speaker identification than acoustic representations.
We also show that such a speaker identification task can be used to better understand the nature of acoustic information representation in different layers of these powerful networks.
arXiv Detail & Related papers (2024-06-14T20:07:21Z)
- Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering [49.069298478971696]
We present a privacy-preserving approach to improve fairness and robustness of end-to-end ASR.
We extract utterance level embeddings using a speaker ID model trained on a public dataset.
We use cluster IDs instead of speaker utterance embeddings as extra features during model training; a minimal sketch of this idea follows this entry.
arXiv Detail & Related papers (2023-06-06T21:13:08Z)
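The clustering idea above can be sketched in a few lines: embed each utterance with a speaker-ID model, cluster the embeddings, and hand the downstream model only the cluster index. In the sketch below the embedding extractor is a random placeholder, and the choice of k-means with 50 clusters is an assumption rather than the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

def utterance_embeddings(wav_paths):
    """Placeholder for a speaker-ID model trained on a public dataset;
    random 192-dim vectors are returned here so the sketch runs end to end."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(wav_paths), 192))

wav_paths = [f"utt_{i}.wav" for i in range(1000)]     # hypothetical utterance list
embs = utterance_embeddings(wav_paths)

# Cluster the utterance-level embeddings; the cluster index, not the raw
# speaker embedding, is what gets attached to each utterance.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(embs)
cluster_ids = kmeans.labels_                          # shape: (num_utterances,)

# One-hot (or learned-embedding) cluster IDs can then be concatenated to the
# acoustic features as extra inputs during ASR training.
one_hot = np.eye(50)[cluster_ids]                     # (num_utterances, 50)
```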
- Some voices are too common: Building fair speech recognition systems using the Common Voice dataset [2.28438857884398]
We use the French Common Voice dataset to quantify the biases of a pre-trained wav2vec2.0 model toward several demographic groups.
We also run an in-depth analysis of the Common Voice corpus and identify important shortcomings that should be taken into account.
arXiv Detail & Related papers (2023-06-01T11:42:34Z)
- Benchmark Dataset Dynamics, Bias and Privacy Challenges in Voice Biometrics Research [1.1160256362224619]
We present a longitudinal study of speaker recognition datasets used for training and evaluation from 2012 to 2021.
Our study identifies the most commonly used datasets in the field, examines their usage patterns, and assesses their attributes that affect bias, fairness, and other ethical concerns.
arXiv Detail & Related papers (2023-04-07T23:05:37Z)
- CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptati On (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO presents an improvement in facial expression recognition performance across six datasets with distinct affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z)
- Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities [10.917512121301135]
We report on initial findings with both discovery and mitigation of performance disparities using data from a product-scale AI assistant speech recognition system.
For fairness mitigation, we find that oversampling of underrepresented cohorts, as well as modeling speaker cohort membership by additional input variables, reduces the gap between top- and bottom-performing cohorts; a weighted-sampling sketch follows this entry.
arXiv Detail & Related papers (2022-07-22T21:33:29Z)
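Oversampling of underrepresented cohorts, as reported in the entry above, is commonly implemented with weighted sampling. The PyTorch sketch below draws utterances with probability inversely proportional to their cohort's frequency; the cohort labels and tensor shapes are made-up stand-ins, not the paper's data.

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical training set: one feature vector and one cohort label per utterance.
features = torch.randn(8, 40)
cohorts = ["A", "A", "A", "A", "A", "B", "B", "C"]    # deliberately imbalanced

# Weight each example by the inverse frequency of its cohort so that minority
# cohorts are drawn more often (oversampled) during training.
counts = Counter(cohorts)
weights = [1.0 / counts[c] for c in cohorts]

sampler = WeightedRandomSampler(weights, num_samples=len(cohorts), replacement=True)
loader = DataLoader(TensorDataset(features), batch_size=4, sampler=sampler)

for (batch,) in loader:
    pass  # train the recognition model on the rebalanced batches
```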
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Bias in Automated Speaker Recognition [0.0]
We study bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition.
We show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge.
Most affected are female speakers and non-US nationalities, who experience significant performance degradation; a per-group EER sketch follows this entry.
arXiv Detail & Related papers (2022-01-24T06:48:57Z)
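Disparities like those reported above are typically quantified by scoring verification trials separately per demographic group and comparing equal error rates (EER). The sketch below shows a generic per-group EER computation on synthetic scores; it illustrates the metric, not the challenge's official evaluation protocol.

```python
import numpy as np
from sklearn.metrics import roc_curve

def eer(labels, scores):
    """Equal error rate: the operating point where the false-accept and
    false-reject rates of a verification system are (approximately) equal."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

# Synthetic verification trials per group: 1 = same speaker, 0 = different speaker.
rng = np.random.default_rng(0)
groups = {
    "female": rng.integers(0, 2, 2000),
    "male":   rng.integers(0, 2, 2000),
}
for group, labels in groups.items():
    scores = labels + rng.normal(0.0, 0.8, labels.shape)   # noisy similarity scores
    print(f"{group}: EER = {eer(labels, scores):.3f}")

# A simple fairness summary is the gap between the worst and best per-group EERs.
```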
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem; a classify-and-count sketch follows this entry.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
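Quantification, as used in the entry above, estimates how prevalent each (unobserved) demographic group is in a sample from the noisy outputs of an attribute classifier. The sketch below uses the textbook adjusted classify-and-count estimator as a stand-in for the paper's method; all data are synthetic.

```python
import numpy as np

def adjusted_classify_and_count(pred_val, true_val, pred_target):
    """Estimate the prevalence of an unobserved sensitive group in a target
    sample from an attribute classifier's predictions, correcting the raw
    count with error rates measured on a small labeled validation set."""
    tpr = pred_val[true_val == 1].mean()     # P(pred = 1 | group = 1)
    fpr = pred_val[true_val == 0].mean()     # P(pred = 1 | group = 0)
    raw = pred_target.mean()                 # uncorrected positive rate
    est = (raw - fpr) / max(tpr - fpr, 1e-8)
    return float(np.clip(est, 0.0, 1.0))

# Toy data: a noisy attribute classifier (about 85% accurate) and a target
# sample whose true group prevalence is 0.3; the corrected estimate is ~0.3.
rng = np.random.default_rng(0)
true_val = rng.integers(0, 2, 500)
pred_val = np.where(rng.random(500) < 0.85, true_val, 1 - true_val)
true_target = (rng.random(2000) < 0.3).astype(int)
pred_target = np.where(rng.random(2000) < 0.85, true_target, 1 - true_target)
print(adjusted_classify_and_count(pred_val, true_val, pred_target))
```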
- Disentangled Speech Embeddings using Cross-modal Self-supervision [119.94362407747437]
We develop a self-supervised learning objective that exploits the natural cross-modal synchrony between faces and audio in video.
We construct a two-stream architecture which: (1) shares low-level features common to both representations; and (2) provides a natural mechanism for explicitly disentangling these factors.
arXiv Detail & Related papers (2020-02-20T14:13:12Z)