Related papers: SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis

SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis

URL: http://arxiv.org/abs/2508.07944v1
Date: Mon, 11 Aug 2025 12:58:37 GMT
Title: SCDF: A Speaker Characteristics DeepFake Speech Dataset for Bias Analysis
Authors: Vojtěch Staněk, Karel Srna, Anton Firc, Kamil Malinka,
Abstract summary: Speaker Characteristics Deepfake dataset contains over 237,000 utterances in a balanced representation of both male and female speakers.<n>We show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type.<n>These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems.
Score: 1.2499537119440245
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite growing attention to deepfake speech detection, the aspects of bias and fairness remain underexplored in the speech domain. To address this gap, we introduce the Speaker Characteristics Deepfake (SCDF) dataset: a novel, richly annotated resource enabling systematic evaluation of demographic biases in deepfake speech detection. SCDF contains over 237,000 utterances in a balanced representation of both male and female speakers spanning five languages and a wide age range. We evaluate several state-of-the-art detectors and show that speaker characteristics significantly influence detection performance, revealing disparities across sex, language, age, and synthesizer type. These findings highlight the need for bias-aware development and provide a foundation for building non-discriminatory deepfake detection systems aligned with ethical and regulatory standards.

Related papers

Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings [0.0]
This paper investigates the extent to which demographic information, specifically gender, age, and accent, is present in SimCLR-trained speaker embeddings.<n>We study two debiasing strategies: adversarial training through gradient reversal and a causal bottleneck architecture.
arXiv Detail & Related papers (2026-02-01T18:02:15Z)
Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias [24.932603485660323]
Speech Continuation (SC) is the task of generating a coherent extension of a spoken prompt while preserving semantic context and speaker identity.<n>We present the first systematic evaluation of bias in SC, investigating how gender and phonation type affect continuation behaviour.
arXiv Detail & Related papers (2025-09-26T08:43:25Z)
SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents [52.29009595100625]
Role-playing agents have emerged as a promising paradigm for achieving personalized interaction and emotional resonance.<n>Existing research primarily focuses on the textual modality, neglecting the critical dimension of speech in realistic interactive scenarios.<n>We construct SpeechRole-Data, a large-scale, high-quality dataset that comprises 98 diverse roles and 112k speech-based single-turn and multi-turn conversations.
arXiv Detail & Related papers (2025-08-04T03:18:36Z)
AudioJudge: Understanding What Works in Large Audio Model Based Speech Evaluation [55.607230723223346]
This work presents a systematic study of Large Audio Model (LAM) as a Judge, AudioJudge, investigating whether it can provide a unified evaluation framework that addresses both challenges.<n>We explore AudioJudge across audio characteristic detection tasks, including pronunciation, speaking rate, speaker identification and speech quality, and system-level human preference simulation for automated benchmarking.<n>We introduce a multi-aspect ensemble AudioJudge to enable general-purpose multi-aspect audio evaluation. This method decomposes speech assessment into specialized judges for lexical content, speech quality, and paralinguistic features, achieving up to 0.91 Spearman correlation with human preferences on
arXiv Detail & Related papers (2025-07-17T00:39:18Z)
Anomaly Detection and Localization for Speech Deepfakes via Feature Pyramid Matching [8.466707742593078]
Speech deepfakes are synthetic audio signals that can imitate target speakers' voices.<n>Existing methods for detecting speech deepfakes rely on supervised learning.<n>We introduce a novel interpretable one-class detection framework, which reframes speech deepfake detection as an anomaly detection task.
arXiv Detail & Related papers (2025-03-23T11:15:22Z)
Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark.<n>It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content.<n>It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs) By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
Residual Information in Deep Speaker Embedding Architectures [4.619541348328938]
This paper introduces an analysis over six sets of speaker embeddings extracted with some of the most recent and high-performing DNN architectures. The dataset includes 46 speakers uttering the same set of prompts, recorded in either a professional studio or their home environments. The results show that the discriminative power of the analyzed embeddings is very high, yet across all the analyzed architectures, residual information is still present in the representations.
arXiv Detail & Related papers (2023-02-06T12:37:57Z)
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition [48.33873602050463]
Speaker adaptation techniques play a key role in personalization of ASR systems for such users. Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech. Novel spectrotemporal subspace basis deep embedding features derived using SVD speech spectrum.
arXiv Detail & Related papers (2022-02-21T15:11:36Z)
Bias in Automated Speaker Recognition [0.0]
We study bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. We show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge. Most affected are female speakers and non-US nationalities, who experience significant performance degradation.
arXiv Detail & Related papers (2022-01-24T06:48:57Z)
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition [65.25325641528701]
Motivated by the spectro-temporal level differences between disordered and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis embedding deep features derived by SVD decomposition of speech spectrum are proposed. Experiments conducted on the UASpeech corpus suggest the proposed spectro-temporal deep feature adapted systems consistently outperformed baseline i- adaptation by up to 263% absolute (8.6% relative) reduction in word error rate (WER) with or without data augmentation.
arXiv Detail & Related papers (2022-01-14T16:56:43Z)
Improving Fairness in Speaker Recognition [4.94706680113206]
We investigate the disparity in performance achieved by state-of-the-art deep speaker recognition systems. We show that models trained with demographically-balanced training sets exhibit a fairer behavior on different groups, while still being accurate.
arXiv Detail & Related papers (2021-04-29T01:08:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.