Benchmark Dataset Dynamics, Bias and Privacy Challenges in Voice
Biometrics Research
- URL: http://arxiv.org/abs/2304.03858v4
- Date: Fri, 18 Aug 2023 08:05:24 GMT
- Authors: Casandra Rusti, Anna Leschanowsky, Carolyn Quinlan, Michaela Pnacek,
Lauriane Gorce, Wiebke Hutiri
- Abstract summary: We present a longitudinal study of speaker recognition datasets used for training and evaluation from 2012 to 2021.
Our study identifies the most commonly used datasets in the field, examines their usage patterns, and assesses their attributes that affect bias, fairness, and other ethical concerns.
- Score: 1.1160256362224619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speaker recognition is a widely used voice-based biometric technology with
applications in various industries, including banking, education, recruitment,
immigration, law enforcement, healthcare, and well-being. However, while
dataset evaluations and audits have improved data practices in face recognition
and other computer vision tasks, the data practices in speaker recognition have
gone largely unquestioned. Our research aims to address this gap by exploring
how dataset usage has evolved over time and what implications this has on bias,
fairness and privacy in speaker recognition systems. Previous studies have
demonstrated the presence of historical, representation, and measurement biases
in popular speaker recognition benchmarks. In this paper, we present a
longitudinal study of speaker recognition datasets used for training and
evaluation from 2012 to 2021. We survey close to 700 papers to investigate
community adoption of datasets and changes in usage over a crucial time period
where speaker recognition approaches transitioned to the widespread adoption of
deep neural networks. Our study identifies the most commonly used datasets in
the field, examines their usage patterns, and assesses their attributes that
affect bias, fairness, and other ethical concerns. Our findings suggest areas
for further research on the ethics and fairness of speaker recognition
technology.
Related papers
- Considerations for Ethical Speech Recognition Datasets [0.799536002595393]
We use automatic speech recognition as a case study and examine the properties that ethical speech datasets should possess towards responsible AI applications.
We showcase diversity issues, inclusion practices, and necessary considerations that can improve trained models.
We argue for the legal and privacy protection of data subjects, targeted data sampling corresponding to user demographics and needs, and appropriate metadata that ensure explainability and accountability in cases of model failure.
arXiv Detail & Related papers (2023-05-03T12:38:14Z)
- Evaluating generative audio systems and their metrics [80.97828572629093]
This paper investigates state-of-the-art approaches side-by-side with (i) a set of previously proposed objective metrics for audio reconstruction, and (ii) a listening study.
Results indicate that currently used objective metrics are insufficient to describe the perceptual quality of current systems.
arXiv Detail & Related papers (2022-08-31T21:48:34Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue, computational method, interaction environment, and sensing approach are speaking activity, support vector machines, and meetings composed of 3-4 persons equipped with microphones and cameras, respectively.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Bias in Automated Speaker Recognition [0.0]
We study bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition.
We show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge.
Most affected are female speakers and non-US nationalities, who experience significant performance degradation.
arXiv Detail & Related papers (2022-01-24T06:48:57Z)
- Improving Fairness in Speaker Recognition [4.94706680113206]
We investigate the disparity in performance achieved by state-of-the-art deep speaker recognition systems.
We show that models trained with demographically balanced training sets exhibit fairer behavior across different groups while remaining accurate.
arXiv Detail & Related papers (2021-04-29T01:08:53Z)
- Deep Gait Recognition: A Survey [15.47582611826366]
Gait recognition is an appealing biometric modality which aims to identify individuals based on the way they walk.
Deep learning has reshaped the research landscape in this area since 2015 through the ability to automatically learn discriminative representations.
We present a comprehensive overview of breakthroughs and recent developments in gait recognition with deep learning.
arXiv Detail & Related papers (2021-02-18T18:49:28Z)
- Few Shot Text-Independent speaker verification using 3D-CNN [0.0]
We propose a novel method to verify the identity of a claimed speaker using very little training data.
Experiments conducted on the VoxCeleb1 dataset show that the proposed model's accuracy, even when trained on very little data, is close to that of the state-of-the-art model for text-independent speaker verification.
arXiv Detail & Related papers (2020-08-25T15:03:29Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
Deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Survey on the Analysis and Modeling of Visual Kinship: A Decade in the Making [66.72253432908693]
Kinship recognition is a challenging problem with many practical applications.
We review the public resources and data challenges that enabled and inspired many to hone-in on the views.
For the tenth anniversary, the demo code is provided for the various kin-based tasks.
arXiv Detail & Related papers (2020-06-29T13:25:45Z)